# A few interesting questions related to correlation

Correlation is one of the basic building blocks of statistics. We all know that "correlation measures how two variables move together" and that "correlation doesn't imply causation", but there are several less obvious questions that we would like to ask and usually end up ignoring.

This article is a collection of such interesting questions. Let's see:

### Does zero (Pearson) correlation imply independence?

What do you do when you see two variables with a correlation of exactly 0, or close to it? Discard the variable? Is that the right thing to do?

Probably not!

Zero correlation doesn't imply independence; it only means that there is no *linear* relationship between the variables.

Let's look at an example. Suppose $x = \{-2, -1, 0, 1, 2\}$ and $y = x^2$. What is the correlation between x and y? It is exactly zero, but does that mean x and y are independent? No! y is completely determined by x.
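The example can be checked directly in R. This is a minimal sketch using base R's `cor()`; the only data are the five values from the text.

```r
x <- c(-2, -1, 0, 1, 2)
y <- x^2   # y is a deterministic, non-linear (quadratic) function of x

# Pearson correlation: the contributions of the negative and positive x values cancel
r <- cor(x, y)
print(r)   # 0, up to floating-point rounding

# Yet knowing x pins down y exactly, so the two are far from independent
print(data.frame(x = x, y = y))
```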

The reason: Pearson correlation only captures linear relationships between variables, and the relationship here is quadratic.

### If correlation doesn't imply causation, should we not study it at all?

Highly correlated variables are very useful for making good predictions and building good predictive models. The problem arises when you try to explain causation through your model's results.

Say there is a city 'A' where the rate of children falling sick is higher. There is nothing wrong with that statement, but be careful when you try to attribute that sickness rate to a particular cause.
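A small simulation can make this concrete. It assumes a hypothetical hidden factor `z` (think of something like air pollution in city 'A') that drives two measured variables `x` and `y`, while neither of them affects the other. The variable names and effect sizes are invented purely for illustration.

```r
set.seed(42)
n <- 10000

z <- rnorm(n)       # hypothetical hidden common cause (confounder)
x <- z + rnorm(n)   # x depends on z only
y <- z + rnorm(n)   # y depends on z only, never on x

# x and y are clearly correlated (0.5 in theory), so x is useful for predicting y
r_xy <- cor(x, y)

# Remove the part of x and y explained by z and correlate what is left
# (the partial correlation of x and y given z)
r_partial <- cor(resid(lm(x ~ z)), resid(lm(y ~ z)))

print(c(r_xy = r_xy, r_partial = r_partial))
```

Predicting `y` from `x` works reasonably well here, but concluding that `x` causes `y` would be wrong: once the shared dependence on `z` is removed, almost no correlation is left.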

### How does the regression coefficient differ from the correlation coefficient?

Correlation describes the relationship between two variables on a bounded scale from −1 to +1, whereas a regression coefficient describes the steepness (slope) of the relationship and can range from −∞ to +∞.
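One way to see the difference in ranges: rescaling y (for example, converting metres to centimetres) multiplies the slope by the same factor but leaves the correlation untouched. A minimal sketch with simulated data; the numbers are arbitrary.

```r
set.seed(7)
x <- rnorm(200)
y <- 2 * x + rnorm(200)

y_cm <- 100 * y   # the same data expressed in a different unit

slope    <- coef(lm(y ~ x))[["x"]]
slope_cm <- coef(lm(y_cm ~ x))[["x"]]

# The slope scales with the unit of y, so it is unbounded ...
print(c(slope = slope, slope_cm = slope_cm))
# ... while the correlation is unit-free and stays within [-1, 1]
print(c(r = cor(x, y), r_cm = cor(x, y_cm)))
```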

Assume we are talking about simple linear regression:

$$Y_i = a + b_1 X_i$$

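We know that the slope (beta) coefficient can be calculated as:

$$b_1 = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)} = r_{XY}\,\frac{SD(Y)}{SD(X)}$$

where $r_{XY}$ is the Pearson correlation between X and Y.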

From this we can see that the correlation and the beta coefficient are equal only if SD(Y) = SD(X) (leaving aside the trivial case where both are zero).

In other words, the correlation and the beta coefficient are the same if both Y and X are standardized before fitting the model.
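Both claims can be verified numerically. This sketch uses simulated data with arbitrary parameters, base R's `lm()` for the fit and `scale()` for standardization.

```r
set.seed(1)
x <- rnorm(500, mean = 10, sd = 2)
y <- 3 + 1.5 * x + rnorm(500, sd = 4)

r <- cor(x, y)

# On the raw data, the slope equals r * SD(Y) / SD(X)
b_raw     <- coef(lm(y ~ x))[["x"]]
b_formula <- r * sd(y) / sd(x)

# After standardizing both variables (mean 0, SD 1), the slope equals r itself
zx <- as.vector(scale(x))
zy <- as.vector(scale(y))
b_std <- coef(lm(zy ~ zx))[["zx"]]

print(c(r = r, b_raw = b_raw, b_formula = b_formula, b_std = b_std))
```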

### Can I apply Pearson or Spearman correlation to non-normal data?

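Neither Pearson's nor Spearman's correlation coefficient assumes that the data are normally distributed, so yes, you can compute either coefficient on non-normal data. Keep in mind, though, that the standard significance test for Pearson's coefficient relies on approximate bivariate normality, even though the coefficient itself does not.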
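As an illustration, both coefficients can be computed on clearly skewed data. The sketch below uses simulated exponential data (an arbitrary choice) and a relationship that is monotonic but not linear: Spearman's coefficient, which works on ranks, comes out as exactly 1, while Pearson's is lower because it only measures the linear part of the relationship.

```r
set.seed(2016)
x <- rexp(1000, rate = 1)   # strongly right-skewed, clearly non-normal
y <- x^3                    # monotonic, but not linear, function of x

pearson  <- cor(x, y, method = "pearson")
spearman <- cor(x, y, method = "spearman")

print(c(pearson = pearson, spearman = spearman))
```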

### Why is the correlation between x and (x - y) about 0.7? Interesting!

Take two independent variables x and y with the same variance and compute the Pearson correlation between x and (x - y): surprisingly, you will get roughly 0.7. If you don't believe me, let's run a few iterations:

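```r
# Repeat the experiment 10 times with fresh random draws
for (i in 1:10) {
  x <- rnorm(1000000, mean = 10, sd = 2)  # x ~ N(10, 2^2)
  y <- rnorm(1000000, mean = 10, sd = 2)  # y ~ N(10, 2^2), independent of x
  print(cor(x, x - y))                    # close to 1/sqrt(2), about 0.7071
}
```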

```
## [1] 0.70716
## [1] 0.7070132
## [1] 0.7070345
## [1] 0.7067792
## [1] 0.7071491
## [1] 0.7079251
## [1] 0.7066222
## [1] 0.7066606
## [1] 0.707397
## [1] 0.7062416
```
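The 0.7 figure is not universal: it relies on x and y being independent and having equal spread, as in the simulation above. A sketch that keeps them independent but varies the standard deviation of y (values chosen arbitrarily) and compares the simulated correlation with $\sigma_x / \sqrt{\sigma_x^2 + \sigma_y^2}$:

```r
set.seed(123)
n    <- 1e5
sd_x <- 2

# Try a few standard deviations for y, keeping x and y independent
results <- sapply(c(1, 2, 4), function(sd_y) {
  x <- rnorm(n, mean = 10, sd = sd_x)
  y <- rnorm(n, mean = 10, sd = sd_y)
  c(sd_y        = sd_y,
    simulated   = cor(x, x - y),
    theoretical = sd_x / sqrt(sd_x^2 + sd_y^2))
})

print(t(results))   # one row per sd_y; sd_y = 2 is the equal-variance case
```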

The answer to this is explained pretty well on Stack Exchange.
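In short, the value follows from two variance identities. Assuming x and y are independent (so $\operatorname{Cov}(x, y) = 0$) with standard deviations $\sigma_x$ and $\sigma_y$:

$$
\operatorname{Cov}(x, x - y) = \operatorname{Var}(x) - \operatorname{Cov}(x, y) = \sigma_x^2,
\qquad
\operatorname{Var}(x - y) = \sigma_x^2 + \sigma_y^2
$$

$$
\operatorname{Cor}(x, x - y)
= \frac{\sigma_x^2}{\sigma_x \sqrt{\sigma_x^2 + \sigma_y^2}}
= \frac{\sigma_x}{\sqrt{\sigma_x^2 + \sigma_y^2}}
$$

With equal spreads, as in the simulation ($\sigma_x = \sigma_y = 2$), this reduces to $1/\sqrt{2} \approx 0.7071$, matching the printed values.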

#### analyticsfreak

Published July 22, 2016.