Predictive Hacks

# Why Correlation is not enough!

It is quite common to report the correlation between two variables in data analysis. However, we should always show the scatter plot alongside the correlation coefficient. The reason is that correlation is quite sensitive to outliers and cannot capture nonlinear patterns such as a parabola. Hence, although a high correlation often indicates a strong linear relationship between two variables, we need to be cautious, because this measure can be misleading.
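Both weaknesses can be illustrated with a few lines of Python. The sketch below is only an illustration: the helper `pearson_r` and the toy data are made up for this example and are not part of any dataset discussed in this post.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed directly from its definition."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# 1) A perfect parabola: y is fully determined by x,
#    yet the linear correlation is zero.
x_par = [-3, -2, -1, 0, 1, 2, 3]
y_par = [x ** 2 for x in x_par]
r_parabola = pearson_r(x_par, y_par)

# 2) Five toy points with no linear trend at all...
x_base = [1, 2, 3, 4, 5]
y_base = [3, 1, 4, 1, 3]
r_base = pearson_r(x_base, y_base)

# ...and the same points plus a single extreme observation,
# which is enough to push the correlation close to 1.
r_outlier = pearson_r(x_base + [20], y_base + [20])

print(f"parabola:        r = {r_parabola:.3f}")
print(f"no outlier:      r = {r_base:.3f}")
print(f"with one outlier: r = {r_outlier:.3f}")
```

In the first case the relationship is perfectly deterministic but not linear, so the coefficient misses it. In the second case one point out of six is responsible for almost all of the reported correlation.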

A great example is Anscombe’s quartet, which comprises four datasets that have nearly identical simple descriptive statistics, yet have very different distributions and appear very different when graphed.

Below you can find the four datasets.

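| x (I) | y (I) | x (II) | y (II) | x (III) | y (III) | x (IV) | y (IV) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 10.0 | 8.04 | 10.0 | 9.14 | 10.0 | 7.46 | 8.0 | 6.58 |
| 8.0 | 6.95 | 8.0 | 8.14 | 8.0 | 6.77 | 8.0 | 5.76 |
| 13.0 | 7.58 | 13.0 | 8.74 | 13.0 | 12.74 | 8.0 | 7.71 |
| 9.0 | 8.81 | 9.0 | 8.77 | 9.0 | 7.11 | 8.0 | 8.84 |
| 11.0 | 8.33 | 11.0 | 9.26 | 11.0 | 7.81 | 8.0 | 8.47 |
| 14.0 | 9.96 | 14.0 | 8.10 | 14.0 | 8.84 | 8.0 | 7.04 |
| 6.0 | 7.24 | 6.0 | 6.13 | 6.0 | 6.08 | 8.0 | 5.25 |
| 4.0 | 4.26 | 4.0 | 3.10 | 4.0 | 5.39 | 19.0 | 12.50 |
| 12.0 | 10.84 | 12.0 | 9.13 | 12.0 | 8.15 | 8.0 | 5.56 |
| 7.0 | 4.82 | 7.0 | 7.26 | 7.0 | 6.42 | 8.0 | 7.91 |
| 5.0 | 5.68 | 5.0 | 4.74 | 5.0 | 5.73 | 8.0 | 6.89 |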

For all datasets:

| Property | Value | Accuracy |
| --- | --- | --- |
| Mean of x | 9 | exact |
| Sample variance of x | 11 | exact |
| Mean of y | 7.50 | to 2 decimal places |
| Sample variance of y | 4.125 | ±0.003 |
| Correlation between x and y | 0.816 | to 3 decimal places |
| Linear regression line | y = 3.00 + 0.500x | to 2 and 3 decimal places, respectively |
| Coefficient of determination of the linear regression | 0.67 | to 2 decimal places |

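Yet their scatter plots look totally different! Dataset I shows a roughly linear trend with noise, dataset II follows a smooth curve, dataset III is a tight line with a single outlier, and in dataset IV every x value is identical except for one high-leverage point.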
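If you want to check this yourself, the following sketch recomputes the summary statistics from the four datasets and, optionally, draws the scatter plots. The names `QUARTET`, `summarize` and `plot_quartet` are chosen for this example only, and the plotting part assumes that matplotlib is installed.

```python
from statistics import mean, variance

# Anscombe's quartet, transcribed from the table of the four datasets.
X_SHARED = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
QUARTET = {
    "I": (X_SHARED,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8.0] * 7 + [19.0] + [8.0] * 3,
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the summary statistics of one dataset as a dict."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    vx, vy = variance(xs), variance(ys)  # sample variance (n - 1 denominator)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    r = cov / (vx * vy) ** 0.5           # Pearson correlation
    slope = cov / vx                     # least-squares regression of y on x
    intercept = my - slope * mx
    return {
        "mean_x": mx, "var_x": vx,
        "mean_y": my, "var_y": vy,
        "r": r, "slope": slope, "intercept": intercept,
        "r2": r ** 2,                    # coefficient of determination
    }


def plot_quartet():
    """Draw one scatter plot per dataset (assumes matplotlib is installed)."""
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(2, 2, sharex=True, sharey=True, figsize=(8, 6))
    for ax, (name, (xs, ys)) in zip(axes.flat, QUARTET.items()):
        ax.scatter(xs, ys)
        ax.set_title(f"Dataset {name}")
    plt.show()


for name, (xs, ys) in QUARTET.items():
    s = summarize(xs, ys)
    print(f"{name:>3}: mean_y={s['mean_y']:.2f}  var_y={s['var_y']:.3f}  "
          f"r={s['r']:.3f}  y = {s['intercept']:.2f} + {s['slope']:.3f}x")
```

Calling `plot_quartet()` afterwards shows the four panels side by side on shared axes, which makes the differences hard to miss even though the printed statistics are nearly identical.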
