Predictive Hacks

Rank Correlation with R


Rank correlation is a measure of the relationship between the rankings of two variables, or between two rankings of the same variable. In this post, we will talk about the Spearman’s rho and Kendall’s tau coefficients.

Kendall’s tau correlation: It is a non-parametric measure of the strength of dependence between two variables. If we consider two samples, \(a\) and \(b\), each of size \(n\), the total number of pairs of observations \((a_i, b_i)\) and \((a_j, b_j)\) is \(n(n-1)/2\). The following formula is used to calculate the value of the Kendall rank correlation:

\(\tau=\frac{n_c-n_d}{n(n-1)/2}\)

Where:

\(n_c=\) the number of concordant pairs, i.e. pairs that are ordered in the same way in both samples.

\(n_d=\) the number of discordant pairs, i.e. pairs that are ordered in opposite ways in the two samples.
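
To make the formula concrete, here is a minimal sketch that counts the concordant and discordant pairs directly and compares the result with R's built-in `cor()`. The helper name `kendall_tau_by_hand` is hypothetical (it is not part of any package), and the sketch assumes there are no ties in the data.

```r
# Hypothetical helper: Kendall's tau straight from the definition.
# Assumes no ties in x or y.
kendall_tau_by_hand <- function(x, y) {
  n   <- length(x)
  idx <- combn(n, 2)                  # all n(n-1)/2 pairs (i, j) with i < j
  dx  <- x[idx[1, ]] - x[idx[2, ]]    # ordering of each pair in x
  dy  <- y[idx[1, ]] - y[idx[2, ]]    # ordering of each pair in y
  s   <- sign(dx) * sign(dy)          # +1 concordant, -1 discordant
  n_c <- sum(s > 0)
  n_d <- sum(s < 0)
  (n_c - n_d) / (n * (n - 1) / 2)
}

set.seed(1)
x <- rnorm(30)
y <- x + rnorm(30)

tau_manual  <- kendall_tau_by_hand(x, y)
tau_builtin <- cor(x, y, method = "kendall")
all.equal(tau_manual, tau_builtin)
```

Looping over all pairs is quadratic in \(n\), so this is only meant as an illustration on small samples.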

Spearman’s rho correlation: It is a non-parametric measure of the degree of association between two variables. The Spearman rank correlation does not carry any assumptions about the distribution of the data and is the appropriate correlation analysis when the variables are measured on a scale that is at least ordinal.

The following formula is used to calculate the Spearman rank correlation:

\(\rho=1-\frac{6\sum d_i^2 }{n(n^2-1)}\)

where:

\(d_i=\) the difference between the two ranks of the \(i\)-th observation (this formula is exact when there are no tied ranks).

\(n=\) the number of observations.
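
As a quick check of the formula, the sketch below computes \(\rho\) from the rank differences and compares it with `cor()`. The helper name `spearman_rho_by_hand` is hypothetical, and the sketch assumes continuous data without ties.

```r
# Hypothetical helper: Spearman's rho from the rank-difference formula.
# Assumes no ties in x or y.
spearman_rho_by_hand <- function(x, y) {
  n <- length(x)
  d <- rank(x) - rank(y)              # d_i: rank difference per observation
  1 - 6 * sum(d^2) / (n * (n^2 - 1))
}

set.seed(1)
x <- rnorm(30)
y <- x + rnorm(30)

rho_manual  <- spearman_rho_by_hand(x, y)
rho_builtin <- cor(x, y, method = "spearman")
all.equal(rho_manual, rho_builtin)
```

Equivalently, without ties Spearman's rho is just the Pearson correlation of the ranks, i.e. `cor(rank(x), rank(y))`.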

Example Using R

Let’s generate 1000 observations from a bivariate normal distribution and compute the usual Pearson correlation as well as the Kendall and Spearman rank correlations.

library(MASS)

# generate a bivariate normal distribution
# with a variance covariance matrix called Sigma and mean 0

Sigma <- matrix(c(9,3,3,2),2,2)

set.seed(5)
BiNorms<-mvrnorm(n = 1000, c(0,0), Sigma)

plot(BiNorms[,1], BiNorms[,2], main="Scatterplot of Two Normals", 
   xlab="Normal V1 ", ylab="Normal V2 ", pch=19)

data.frame(Coefficient=c("Pearson", "Kendall", "Spearman"),
Value=c(cor(BiNorms[,1], BiNorms[,2], method = "pearson"),
  cor(BiNorms[,1], BiNorms[,2], method = "kendall"),
  cor(BiNorms[,1], BiNorms[,2], method = "spearman")))
  Coefficient    Value
1     Pearson 0.7198969
2     Kendall 0.5202082
3    Spearman 0.7120486

As we can see, in this example Spearman’s correlation is almost identical to Pearson’s, while Kendall’s is noticeably lower. Kendall’s tau is measured on a different scale, so a lower value does not by itself indicate a weaker relationship.

We can also run a hypothesis test in R for the correlation coefficient, with the null hypothesis that there is no correlation, i.e. that the coefficient is 0.

In the example below, we set the alternative hypothesis to be that Kendall’s correlation is positive.
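
One way to see why Kendall’s value comes out lower is to look at the population values implied by `Sigma`. For a bivariate normal distribution with Pearson correlation \(r\), the standard closed forms are \(\tau = \frac{2}{\pi}\arcsin(r)\) and \(\rho_s = \frac{6}{\pi}\arcsin(r/2)\). These relations are textbook results that hold under bivariate normality and are not part of the original example, so treat the sketch below as a sanity check rather than a general rule. The variable names are chosen here for illustration.

```r
# Population values implied by the covariance matrix used in this post.
Sigma <- matrix(c(9, 3, 3, 2), 2, 2)

# Pearson correlation implied by Sigma: cov / (sd1 * sd2)
r_pop <- Sigma[1, 2] / sqrt(Sigma[1, 1] * Sigma[2, 2])

# Closed forms that hold for a bivariate normal distribution
tau_pop      <- (2 / pi) * asin(r_pop)       # Kendall's tau
spearman_pop <- (6 / pi) * asin(r_pop / 2)   # Spearman's rho

round(c(Pearson = r_pop, Kendall = tau_pop, Spearman = spearman_pop), 4)
```

Here the population Kendall’s tau is exactly 0.5 even though the Pearson correlation is about 0.71, which is in line with the sample estimates in the table.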

cor.test(BiNorms[,1], BiNorms[,2], method = "kendall", alternative = "greater")
data:  BiNorms[, 1] and BiNorms[, 2]
z = 24.633, p-value < 2.2e-16
alternative hypothesis: true tau is greater than 0
sample estimates:
      tau 
0.5202082 
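
If you want to use the test results programmatically instead of reading the printed output, `cor.test()` returns a list of class `"htest"` whose components can be accessed by name. The sketch below uses a small simulated data set (an assumption made here so that the example is self-contained; it is not the `BiNorms` data from above).

```r
# Simulated data for illustration only (not the BiNorms sample above)
set.seed(5)
x <- rnorm(200)
y <- 0.7 * x + rnorm(200)

res <- cor.test(x, y, method = "kendall", alternative = "greater")

res$estimate      # named numeric vector holding the tau estimate
res$p.value       # p-value of the one-sided test
res$alternative   # "greater"

# The estimate is the same value that cor() returns
all.equal(unname(res$estimate), cor(x, y, method = "kendall"))
```

Leaving out `alternative` gives the default two-sided test, and `method = "spearman"` or `method = "pearson"` works in the same way.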
