Predictive Hacks

Why Correlation is not enough!


It is quite common to report the correlation between two variables in data analysis. However, we should always show the scatter plot alongside the correlation coefficient. This is because the (Pearson) correlation is quite sensitive to outliers, and it only measures linear association, so it cannot capture clearly non-linear patterns such as a parabola. Hence, although a high correlation suggests a strong linear relationship between two variables, we need to be cautious: on its own, this single number can be misleading.

A classic example of this is Anscombe's quartet, which comprises four datasets that have nearly identical simple descriptive statistics, yet have very different distributions and look very different when graphed.

Below you can find the four datasets.

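I x | I y | II x | II y | III x | III y | IV x | IV y
10.0 | 8.04 | 10.0 | 9.14 | 10.0 | 7.46 | 8.0 | 6.58
8.0 | 6.95 | 8.0 | 8.14 | 8.0 | 6.77 | 8.0 | 5.76
13.0 | 7.58 | 13.0 | 8.74 | 13.0 | 12.74 | 8.0 | 7.71
9.0 | 8.81 | 9.0 | 8.77 | 9.0 | 7.11 | 8.0 | 8.84
11.0 | 8.33 | 11.0 | 9.26 | 11.0 | 7.81 | 8.0 | 8.47
14.0 | 9.96 | 14.0 | 8.10 | 14.0 | 8.84 | 8.0 | 7.04
6.0 | 7.24 | 6.0 | 6.13 | 6.0 | 6.08 | 8.0 | 5.25
4.0 | 4.26 | 4.0 | 3.10 | 4.0 | 5.39 | 19.0 | 12.50
12.0 | 10.84 | 12.0 | 9.13 | 12.0 | 8.15 | 8.0 | 5.56
7.0 | 4.82 | 7.0 | 7.26 | 7.0 | 6.42 | 8.0 | 7.91
5.0 | 5.68 | 5.0 | 4.74 | 5.0 | 5.73 | 8.0 | 6.89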

For all datasets:

Property | Value | Accuracy
Mean of x | 9 | exact
Sample variance of x | 11 | exact
Mean of y | 7.50 | to 2 decimal places
Sample variance of y | 4.125 | ±0.003
Correlation between x and y | 0.816 | to 3 decimal places
Linear regression line | y = 3.00 + 0.500x | to 2 and 3 decimal places, respectively
Coefficient of determination of the linear regression | 0.67 | to 2 decimal places

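But their scatter plots are totally different! Dataset I looks like a noisy linear relationship, II follows a smooth curve (a parabola), III lies on a perfect straight line except for a single outlier, and in IV every point has x = 8 except one point at x = 19, which on its own determines the fitted line.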
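A scatter plot is the real check, but if you want a quick numeric complement, the residuals around the shared fitted line y = 3 + 0.5x (taken from the summary table) already expose the different shapes. The sketch below uses only plain Python; the `residuals` helper and its default parameters are our own, hypothetical choices for illustration.

```python
# Anscombe's quartet, copied from the data table in this post.
x_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def residuals(x, y, intercept=3.0, slope=0.5):
    """Return (x, observed y - fitted y) pairs, sorted by x, for y = intercept + slope * x."""
    return [(xi, round(yi - (intercept + slope * xi), 2)) for xi, yi in sorted(zip(x, y))]


for name, (x, y) in quartet.items():
    res = residuals(x, y)
    # The point that deviates most from the shared line.
    worst_x, worst_r = max(res, key=lambda pair: abs(pair[1]))
    print(f"{name:>3}: {len(set(x))} distinct x values, "
          f"largest |residual| = {abs(worst_r):.2f} at x = {worst_x}")
    print("     residuals by x:", [r for _, r in res])
```

Read along x, the residuals of II form an arch (negative at both ends, positive in the middle), III has one large residual at x = 13, and IV has only two distinct x values, with the point at x = 19 sitting exactly on the line because it is the only observation there. That last case is why an influential point can hide behind a "good" correlation.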
