Predictive Hacks

Rank Correlation with R


Rank correlation is a measure of the relationship between the rankings of two variables or two rankings of the same variable. In this post, we look at two such measures: Spearman’s rho and Kendall’s tau.

Kendall’s tau correlation: It is a non-parametric measure of the strength of dependence between two variables. If we consider two samples, \(a\) and \(b\), each of size \(n\), the total number of pairs of observations \((a_i, b_i)\), \((a_j, b_j)\) is \(n(n-1)/2\). The following formula is used to calculate the value of the Kendall rank correlation:

\(\tau=\frac{n_c-n_d}{n(n-1)/2}\)

Where:

\(n_c=\) number of concordant pairs, i.e. pairs ordered in the same way in both samples.

\(n_d=\) number of discordant pairs, i.e. pairs ordered in opposite ways in the two samples.
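To make the counting concrete, here is a minimal sketch that applies the formula above to a small toy data set. The vectors `a` and `b` are made-up values with no ties, chosen only for illustration; the result is compared with R's built-in `cor()`.

```r
# Hypothetical toy data (no ties), used only to illustrate the formula
a <- c(1, 2, 3, 4, 5)
b <- c(2, 1, 4, 3, 5)
n <- length(a)

# All n(n-1)/2 index pairs (i, j) with i < j, one pair per column
pairs <- combn(n, 2)
i <- pairs[1, ]
j <- pairs[2, ]

# +1 if a pair is ordered the same way in a and b (concordant),
# -1 if it is ordered in opposite ways (discordant)
s <- sign(a[i] - a[j]) * sign(b[i] - b[j])

n_c <- sum(s > 0)  # number of concordant pairs
n_d <- sum(s < 0)  # number of discordant pairs

tau_manual <- (n_c - n_d) / (n * (n - 1) / 2)

tau_manual
cor(a, b, method = "kendall")
```

Here 8 of the 10 pairs are concordant and 2 are discordant, so both lines give 0.6. With tied values the built-in function applies a tie correction, so the two results can differ.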

Spearman’s rho correlation: It is a non-parametric test that is used to measure the degree of association between two variables.  The Spearman rank correlation test does not carry any assumptions about the distribution of the data and is the appropriate correlation analysis when the variables are measured on a scale that is at least ordinal.

The following formula is used to calculate the Spearman rank correlation when there are no tied ranks:

\(\rho=1-\frac{6\sum d_i^2 }{n(n^2-1)}\)

where:

\(d_i=\) the difference between the two ranks of the \(i\)-th observation.

\(n=\) the number of observations.
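As a quick check of the formula, the sketch below computes rho from the rank differences on a small toy data set and compares it with `cor()`. The vectors `x` and `y` are made-up values without ties, chosen only for illustration.

```r
# Hypothetical toy data (no ties), used only to illustrate the formula
x <- c(10, 20, 30, 40, 50)
y <- c(12, 9, 31, 25, 47)
n <- length(x)

# Difference between the rank of each observation in x and in y
d <- rank(x) - rank(y)

rho_manual <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

rho_manual
cor(x, y, method = "spearman")

# Spearman's rho is also the Pearson correlation of the ranks
cor(rank(x), rank(y), method = "pearson")
```

The squared rank differences sum to 4, so all three lines give 0.8.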

Example Using R

Let’s generate 1000 observations from a bivariate normal distribution and compute the usual Pearson correlation as well as the two rank correlations, Kendall’s and Spearman’s.

library(MASS)

# generate a bivariate normal distribution
# with a variance covariance matrix called Sigma and mean 0

Sigma <- matrix(c(9,3,3,2),2,2)

set.seed(5)
BiNorms<-mvrnorm(n = 1000, c(0,0), Sigma)

plot(BiNorms[,1], BiNorms[,2], main="Scatterplot of Two Normals", 
   xlab="Normal V1 ", ylab="Normal V2 ", pch=19)

# compare the three correlation coefficients side by side
data.frame(Coefficient = c("Pearson", "Kendall", "Spearman"),
           Value = c(cor(BiNorms[,1], BiNorms[,2], method = "pearson"),
                     cor(BiNorms[,1], BiNorms[,2], method = "kendall"),
                     cor(BiNorms[,1], BiNorms[,2], method = "spearman")))
  Coefficient    Value
1     Pearson 0.7198969
2     Kendall 0.5202082
3    Spearman 0.7120486

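As we can see, in this example Spearman’s correlation is almost identical to Pearson’s, while Kendall’s is much lower. This does not mean a weaker relationship: tau is measured on a different scale than the other two coefficients, so the values are not directly comparable.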
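The gap between the coefficients is what theory predicts for normal data. For a bivariate normal distribution with Pearson correlation \(r\), the population values are \(\tau = \frac{2}{\pi}\arcsin(r)\) and \(\rho = \frac{6}{\pi}\arcsin(r/2)\). This is a standard result that is not part of the original example. The sketch below evaluates both for the `Sigma` matrix used above; the names `r`, `tau_theory` and `rho_theory` are introduced here for illustration.

```r
# Same variance-covariance matrix as in the example above
Sigma <- matrix(c(9, 3, 3, 2), 2, 2)

# Population Pearson correlation implied by Sigma
r <- Sigma[1, 2] / sqrt(Sigma[1, 1] * Sigma[2, 2])

# Population rank correlations under bivariate normality
tau_theory <- (2 / pi) * asin(r)
rho_theory <- (6 / pi) * asin(r / 2)

round(c(pearson = r, kendall = tau_theory, spearman = rho_theory), 3)
```

The implied Pearson correlation is \(1/\sqrt{2} \approx 0.707\), which gives a theoretical tau of exactly 0.5, close to the sample value of 0.52 reported above.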

We can also run a hypothesis test in R for the correlation coefficient, with the null hypothesis that there is no correlation, i.e. the coefficient is 0.

In the example below, the alternative hypothesis is that Kendall’s tau is positive.

cor.test(BiNorms[,1], BiNorms[,2], method = "kendall", alternative = "greater")
data:  BiNorms[, 1] and BiNorms[, 2]
z = 24.633, p-value < 2.2e-16
alternative hypothesis: true tau is greater than 0
sample estimates:
      tau 
0.5202082 

Kendall’s Tau – Interpretation

The value of τ is affected by the number of observations, so it is difficult to apply rules of thumb for interpreting it. However, we can say that:

  • τ=-1 indicates a perfect negative monotonic relation between two variables
  • τ=0 indicates no monotonic relation between two variables
  • τ=1 indicates a perfect positive monotonic relation between two variables
  • |τ| between 0.07 and 0.2 indicates a weak association
  • |τ| between 0.21 and 0.35 indicates a medium association
  • |τ| between 0.36 and 1 indicates a strong association
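The rules of thumb above can be wrapped in a small helper. This is only a sketch: `interpret_tau` is a hypothetical function, not part of base R. The label for values below 0.07 and the treatment of values that fall between the listed ranges (for example 0.205, which is assigned to the lower category) are assumptions.

```r
# Hypothetical helper: label the strength of association for Kendall's tau
# using the rule-of-thumb cut-offs listed above
interpret_tau <- function(tau) {
  stopifnot(is.numeric(tau), all(abs(tau) <= 1))
  a <- abs(tau)
  ifelse(a >= 0.36, "strong",
  ifelse(a >= 0.21, "medium",
  ifelse(a >= 0.07, "weak",
         "below weak range")))  # assumption: the list gives no label here
}

# The first value is the tau estimated in the example above
interpret_tau(c(0.5202082, -0.25, 0.1, 0.01))
```

For these four inputs the function returns "strong", "medium", "weak" and "below weak range".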
