Predictive Hacks

# Why Correlation is not enough!

It is quite common to report the correlation between two variables in data analysis. However, we should always show the scatter plot alongside the correlation coefficient, because the (Pearson) correlation is sensitive to outliers and cannot capture non-linear patterns such as parabolic ones. Hence, although a high correlation indicates a strong linear relationship between two variables, we need to be cautious: on its own, this measure can be misleading.
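To make these two failure modes concrete, here is a minimal pure-Python sketch. The `pearson` helper and the toy data are illustrative assumptions, not part of the original post. On Python 3.10+, `statistics.correlation` computes the same quantity.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# 1) A perfect parabola: y is fully determined by x, yet r is 0.
x_par = [-3, -2, -1, 0, 1, 2, 3]
y_par = [v ** 2 for v in x_par]
print(pearson(x_par, y_par))  # 0.0

# 2) A perfect straight line, then the same line plus a single outlier.
x_lin = [1, 2, 3, 4, 5]
y_lin = [1, 2, 3, 4, 5]
print(pearson(x_lin, y_lin))  # 1.0
print(pearson(x_lin + [6], y_lin + [-20]))  # about -0.53: one point flips the sign
```

A scatter plot would reveal both situations at a glance, while the single number r hides them.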

A classic example is Anscombe's quartet, which comprises four datasets that have nearly identical simple descriptive statistics, yet have very different distributions and look very different when graphed.

Below you can find the four datasets.

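| I: x | I: y | II: x | II: y | III: x | III: y | IV: x | IV: y |
|---|---|---|---|---|---|---|---|
| 10.0 | 8.04 | 10.0 | 9.14 | 10.0 | 7.46 | 8.0 | 6.58 |
| 8.0 | 6.95 | 8.0 | 8.14 | 8.0 | 6.77 | 8.0 | 5.76 |
| 13.0 | 7.58 | 13.0 | 8.74 | 13.0 | 12.74 | 8.0 | 7.71 |
| 9.0 | 8.81 | 9.0 | 8.77 | 9.0 | 7.11 | 8.0 | 8.84 |
| 11.0 | 8.33 | 11.0 | 9.26 | 11.0 | 7.81 | 8.0 | 8.47 |
| 14.0 | 9.96 | 14.0 | 8.10 | 14.0 | 8.84 | 8.0 | 7.04 |
| 6.0 | 7.24 | 6.0 | 6.13 | 6.0 | 6.08 | 8.0 | 5.25 |
| 4.0 | 4.26 | 4.0 | 3.10 | 4.0 | 5.39 | 19.0 | 12.50 |
| 12.0 | 10.84 | 12.0 | 9.13 | 12.0 | 8.15 | 8.0 | 5.56 |
| 7.0 | 4.82 | 7.0 | 7.26 | 7.0 | 6.42 | 8.0 | 7.91 |
| 5.0 | 5.68 | 5.0 | 4.74 | 5.0 | 5.73 | 8.0 | 6.89 |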

For all datasets:

| Property | Value | Accuracy |
|---|---|---|
| Mean of x | 9 | exact |
| Sample variance of x | 11 | exact |
| Mean of y | 7.50 | to 2 decimal places |
| Sample variance of y | 4.125 | ±0.003 |
| Correlation between x and y | 0.816 | to 3 decimal places |
| Linear regression line | y = 3.00 + 0.500x | to 2 and 3 decimal places, respectively |
| Coefficient of determination of the linear regression | 0.67 | to 2 decimal places |
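The rows of this table are not independent of each other. Assuming an ordinary least-squares fit with an intercept, the slope equals the correlation scaled by the ratio of the standard deviations, and the coefficient of determination equals the squared correlation, so the last two rows follow from the first five:

$$
\hat\beta_1 = r\,\frac{s_y}{s_x} = 0.816 \times \sqrt{\frac{4.125}{11}} \approx 0.816 \times 0.6124 \approx 0.500,
\qquad
\hat\beta_0 = \bar{y} - \hat\beta_1\,\bar{x} = 7.50 - 0.500 \times 9 = 3.00,
\qquad
R^2 = r^2 = 0.816^2 \approx 0.67.
$$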

But with totally different scatter plots!
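To check the summary statistics against the data, here is a minimal sketch that uses only Python's standard library (`statistics.mean`, and `statistics.variance`, which returns the sample variance with an n - 1 denominator). The values are transcribed from the four datasets listed above; `QUARTET` and `summarize` are illustrative names, not from the original post.

```python
from statistics import mean, variance

# Anscombe's quartet, transcribed from the data table.
# Datasets I-III share the same x values; dataset IV has its own.
X_123 = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 19.0, 8.0, 8.0, 8.0],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summarize(x, y):
    """Return the statistics listed in the summary table for one dataset."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / (sxx * syy) ** 0.5  # Pearson correlation
    slope = sxy / sxx             # ordinary least-squares slope
    intercept = my - slope * mx   # least-squares line passes through the means
    return {
        "mean_x": mx, "var_x": variance(x),  # sample (n - 1) variance
        "mean_y": my, "var_y": variance(y),
        "r": r, "slope": slope, "intercept": intercept, "r2": r ** 2,
    }

for name, (x, y) in QUARTET.items():
    s = summarize(x, y)
    print(f"{name:>3}: mean_x={s['mean_x']:.2f} var_x={s['var_x']:.2f} "
          f"mean_y={s['mean_y']:.2f} var_y={s['var_y']:.3f} r={s['r']:.3f} "
          f"y={s['intercept']:.2f}+{s['slope']:.3f}x R2={s['r2']:.2f}")
```

The four printed lines agree within the stated accuracy, which is exactly why the numbers alone cannot tell the datasets apart. To see the different shapes, each `(x, y)` pair can be passed to a scatter-plotting function such as matplotlib's `plt.scatter`.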
