# Introduction to correlation

- Introduction
- Pearson’s correlation coefficient
- Spearman’s correlation coefficient
- Kendall’s correlation coefficient

Correlation is one of the most commonly used statistical techniques for identifying the relationship between two continuous variables.

It measures how two variables move with respect to each other.

The correlation coefficient ranges from -1 to 1. A value of 1 means the two variables move in exactly the same direction: if one variable increases, the other also increases, and if one decreases, the other also decreases. A value of -1 means they move in exactly opposite directions, and a value near 0 means there is little or no linear relationship.
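To make the ends of that range concrete, here is a minimal sketch on made-up data. The names `x`, `y_up`, `y_down` and `y_noise` are illustrative only and do not appear elsewhere in this article.

```r
# Illustrative data only; the seed is arbitrary
set.seed(42)
x <- 1:20
y_up <- 2 * x + 3         # rises exactly linearly with x
y_down <- -0.5 * x + 10   # falls exactly linearly with x
y_noise <- rnorm(20)      # generated independently of x

cor(x, y_up)     # 1, up to floating-point rounding
cor(x, y_down)   # -1, up to floating-point rounding
cor(x, y_noise)  # somewhere between -1 and 1; depends on the random draw
```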

The following image (sourced from Wikipedia) shows various examples of correlation coefficients.

### Pearson’s Correlation Coefficient

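The formula for calculating the Pearson correlation coefficient is as follows:

$$
r_{xy} = \frac{\operatorname{cov}(x, y)}{s_x \, s_y}
$$

Here $s_x$ and $s_y$ are the standard deviations of $x$ and $y$. This shows the relationship between correlation and covariance: correlation can be explained as normalized covariance. The correlation value is confined to the range -1 to 1, while the value of the covariance depends on the units of the data. The correlation coefficient can also be written as follows:

$$
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$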
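The idea of correlation as normalized covariance can be checked numerically. The sketch below uses made-up vectors `x` and `y` (not part of the original example) and compares two hand-computed versions with R's built-in `cor()`. It also shows that rescaling `x` changes the covariance but not the correlation.

```r
# Illustrative data only; the seed is arbitrary
set.seed(1)
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)

# Covariance divided by the product of the standard deviations
r_from_cov <- cov(x, y) / (sd(x) * sd(y))

# The same quantity written with sums of deviations from the means
dx <- x - mean(x)
dy <- y - mean(y)
r_from_sums <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

all.equal(r_from_cov, cor(x, y))   # TRUE
all.equal(r_from_sums, cor(x, y))  # TRUE

# Changing the units of x (multiplying by 100) scales the covariance...
cov(100 * x, y) / cov(x, y)        # 100
# ...but leaves the correlation unchanged
all.equal(cor(100 * x, y), cor(x, y))  # TRUE
```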

The assumption behind the Pearson correlation coefficient is that there is a linear relationship between the two continuous variables.

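Let’s synthesize some data and calculate the correlation coefficient:

```
# v1: absolute values of 100 standard-normal draws (all non-negative)
v1 <- abs(rnorm(100))
# v2: natural log of the square of v1, a monotonic but non-linear function of v1
v2 <- log(v1 * v1)
```

Here v1 holds the absolute values of 100 draws from a standard normal distribution, while v2 stores the natural logarithm of the square of v1.

Because log(v1 * v1) is strictly increasing for positive v1, we know that v2 always increases as v1 increases.

Let’s see what the correlation coefficient between v1 and v2 is: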

```
cor(v1,v2)
```

```
## [1] 0.8188019
```

Surprisingly, it is not equal to 1, contrary to what we might have expected.

The reason is that the relationship between v1 and v2 is not linear: v2 always rises with v1, but along a curve rather than a straight line.

Always keep in mind: “It is dangerous to trust only the correlation coefficient to conclude the relationship between two variables.”
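One way to confirm that non-linearity is the cause is to remove it. Since log(v * v) equals 2 * log(v) for positive v, the second variable is an exact linear function of the log of the first. The sketch below regenerates data the same way under different names (`w1` and `w2`, chosen here so the article's v1 and v2 are left untouched) with an arbitrary seed.

```r
# Same construction as the article's example, with illustrative names
set.seed(10)
w1 <- abs(rnorm(100))
w2 <- log(w1 * w1)

cor(w1, w2)        # below 1: monotonic, but curved
cor(log(w1), w2)   # 1 up to rounding, because w2 equals 2 * log(w1)
```

Plotting the two variables against each other, for example with `plot(w1, w2)`, is another quick check that a single coefficient cannot replace.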

#### Properties of Pearson Correlation Coefficient

- Pearson’s correlation coefficient is symmetric, i.e. cor(x,y) is equal to cor(y,x)
- Pearson’s correlation coefficient is unaffected by linear transformations with a positive slope, i.e. cor(x,y) is equal to cor(10*x,y); multiplying one variable by a negative constant flips the sign of the coefficient but not its magnitude
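Both properties are easy to check on made-up data. In this sketch the vectors `x` and `y` are illustrative; the last line shows the one caveat worth remembering, namely that a negative multiplier flips the sign.

```r
# Illustrative data only; the seed is arbitrary
set.seed(7)
x <- rnorm(30)
y <- x + rnorm(30)

# Symmetry
all.equal(cor(x, y), cor(y, x))            # TRUE

# Scaling by a positive constant and shifting leaves the coefficient unchanged
all.equal(cor(x, y), cor(10 * x + 5, y))   # TRUE

# Scaling by a negative constant flips the sign
all.equal(cor(-10 * x, y), -cor(x, y))     # TRUE
```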

### Spearman’s correlation coefficient

In the example discussed above, we saw that although the two variables move together, Pearson’s correlation coefficient was still not 1.

To overcome this, we can use Spearman’s correlation coefficient, which checks for a monotonic relationship between two continuous variables:

```
cor(v1,v2,method = "spearman")
```

```
## [1] 1
```

As you can see, the Spearman coefficient is exactly 1.

Spearman’s coefficient is nothing but “Pearson correlation on ranked variables”: if you rank v1 and v2 (both in ascending order, or both in descending order) and calculate Pearson’s correlation on the ranks, you get the same value as Spearman’s correlation coefficient.
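The equivalence can be verified directly with `rank()`. The vectors `a` and `b` below are made up for this sketch; any pair of numeric vectors of equal length would do.

```r
# Illustrative data only: b is a noisy, non-linear function of a
set.seed(123)
a <- rnorm(40)
b <- a^3 + rnorm(40, sd = 0.5)

spearman_builtin <- cor(a, b, method = "spearman")
spearman_by_rank <- cor(rank(a), rank(b))   # Pearson on the ranks

all.equal(spearman_builtin, spearman_by_rank)  # TRUE
```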

### Kendall’s correlation coefficient

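In 1938, Kendall proposed a correlation coefficient based on ranks. Kendall’s correlation coefficient measures the ordinal association between two variables and is often used for discrete or ordinal data.

Unlike Spearman’s coefficient, which is computed from the ranks themselves, it looks at every pair of observations and checks whether the two variables order that pair the same way (a concordant pair) or in opposite ways (a discordant pair).

When there are no ties, the formula can be given as:

$$
\tau = \frac{C - D}{n(n-1)/2}
$$

where $C$ is the number of concordant pairs, $D$ is the number of discordant pairs, and $n$ is the number of observations.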
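As a rough sketch of the pair-counting idea behind Kendall’s coefficient, the snippet below counts concordant and discordant pairs by hand, divides their difference by the total number of pairs, and compares the result with `cor(..., method = "kendall")`. The vectors `p` and `q` and the helper names are illustrative, and the comparison assumes the data contain no ties.

```r
# Illustrative data only; continuous draws, so ties are not expected
set.seed(2024)
p <- runif(15)
q <- runif(15)
n <- length(p)

# All index pairs (i, j) with i < j, one pair per column
idx <- combn(n, 2)

# +1 when a pair is ordered the same way in p and q, -1 when ordered oppositely
pair_sign <- sign(p[idx[1, ]] - p[idx[2, ]]) * sign(q[idx[1, ]] - q[idx[2, ]])

concordant <- sum(pair_sign > 0)
discordant <- sum(pair_sign < 0)
tau_manual <- (concordant - discordant) / choose(n, 2)

all.equal(tau_manual, cor(p, q, method = "kendall"))  # TRUE
```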

Let’s see an example.

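First of all, let’s create vectors v3 and v4, which contain random real numbers drawn uniformly between 0 and 100 (runif() returns real values, not integers):

```
# 100 independent uniform draws between 0 and 100 for each vector
v3 <- runif(100, 0, 100)
v4 <- runif(100, 0, 100)
```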

Let’s see all three correlations. Since v3 and v4 are generated independently, each should be close to zero:

```
cor(v3,v4,method="pearson")
```

```
## [1] -0.02115707
```

```
cor(v3,v4,method="spearman")
```

```
## [1] -0.01327333
```

```
cor(v3,v4,method="kendall")
```

```
## [1] -0.009292929
```
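The three calls above can also be collapsed into one. This is only a convenience sketch: the vectors `u` and `w` are regenerated here under different names with an arbitrary seed, so the values will not match the output shown above.

```r
# Illustrative data only, built the same way as v3 and v4
set.seed(99)
u <- runif(100, 0, 100)
w <- runif(100, 0, 100)

methods <- c("pearson", "spearman", "kendall")

# Named numeric vector with one coefficient per method
all_cors <- sapply(methods, function(m) cor(u, w, method = m))
all_cors
```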

This is where we’ll end this article, with a few images that make one point: correlation doesn’t mean causation.

#### analyticsfreak

Thank you for your article.

I have a question:

Always – keep in mind “It is dangerous to trust only correlation coefficient to conclude relationship between two variables”. Which other techniques or aspects could one use along with correlation to identify relationships?