Predictive Hacks

# Rank Correlation with R

Rank correlation measures the relationship between the rankings of two variables, or between two rankings of the same variable. In this post, we will talk about Spearman’s rho and Kendall’s tau coefficients.

Kendall’s tau correlation: It is a non-parametric measure of the strength of dependence between two variables. If we consider two samples, $$a$$ and $$b$$, each of size $$n$$, the total number of pairings of observations $$(a_i, b_i)$$ and $$(a_j, b_j)$$ is $$n(n-1)/2$$. The following formula is used to calculate the value of the Kendall rank correlation:

$$\tau=\frac{n_c-n_d}{n(n-1)/2}$$

Where:

$$n_c=$$ number of concordant pairs, i.e. pairs ordered in the same way in both samples.

$$n_d=$$ number of discordant pairs, i.e. pairs ordered differently in the two samples.
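To make the counting concrete, here is a minimal sketch that applies the formula by hand to a small toy data set and compares the result with R's built-in `cor()`. The vectors `a` and `b` are made-up values with no ties, chosen only for illustration.

```r
# Toy data with no ties (hypothetical values, for illustration only)
a <- c(1, 2, 3, 4, 5)
b <- c(2, 1, 4, 3, 5)

n <- length(a)
pairs <- combn(n, 2)  # each column holds one pair of indices (i, j), i < j

# For each pair, multiply the signs of the differences in a and in b:
# +1 means both samples order the pair the same way (concordant),
# -1 means they order it differently (discordant)
s <- sign(a[pairs[1, ]] - a[pairs[2, ]]) * sign(b[pairs[1, ]] - b[pairs[2, ]])

n_c <- sum(s > 0)
n_d <- sum(s < 0)
tau_manual <- (n_c - n_d) / (n * (n - 1) / 2)

tau_manual
cor(a, b, method = "kendall")
```

Here 8 of the 10 pairs are concordant and 2 are discordant, so both lines return 0.6. With tied values the two results can differ, because the simple formula above does not adjust for ties.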

Spearman’s rho correlation: It is a non-parametric test that is used to measure the degree of association between two variables.  The Spearman rank correlation test does not carry any assumptions about the distribution of the data and is the appropriate correlation analysis when the variables are measured on a scale that is at least ordinal.

When there are no tied ranks, the following formula is used to calculate the Spearman rank correlation:

$$\rho=1-\frac{6\sum d_i^2 }{n(n^2-1)}$$

where:

$$d_i=$$ the difference between the two ranks of the $$i$$-th observation.

$$n=$$ the number of observations.
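As a quick check of the formula, the sketch below computes Spearman's rho from the rank differences and compares it with `cor()`. The vectors `x` and `y` are hypothetical values without ties; the shortcut formula is only exact in that case.

```r
# Toy data with no ties (hypothetical values, for illustration only)
x <- c(10, 20, 30, 40, 50)
y <- c(12, 9, 25, 31, 28)

n <- length(x)
d <- rank(x) - rank(y)  # rank difference for each observation

rho_manual <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

rho_manual
cor(x, y, method = "spearman")
```

The squared rank differences sum to 4, so both lines return 0.8.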

## Example Using R

Let’s generate 1000 observations from a bivariate normal distribution and compute the usual Pearson correlation as well as the Kendall and Spearman rank correlations.

library(MASS)

# generate a bivariate normal distribution
# with a variance covariance matrix called Sigma and mean 0

Sigma <- matrix(c(9,3,3,2),2,2)

set.seed(5)
BiNorms<-mvrnorm(n = 1000, c(0,0), Sigma)

plot(BiNorms[,1], BiNorms[,2], main="Scatterplot of Two Normals",
     xlab="Normal V1 ", ylab="Normal V2 ", pch=19)


data.frame(Coefficient=c("Pearson", "Kendall", "Spearman"),
           Value=c(cor(BiNorms[,1], BiNorms[,2], method = "pearson"),
                   cor(BiNorms[,1], BiNorms[,2], method = "kendall"),
                   cor(BiNorms[,1], BiNorms[,2], method = "spearman")))

  Coefficient    Value
1     Pearson 0.7198969
2     Kendall 0.5202082
3    Spearman 0.7120486

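As we can see, in this example Spearman’s correlation was almost identical to Pearson’s, but Kendall’s was much lower. This does not mean the association is weaker: Kendall’s tau is on a different scale, and for bivariate normal data it is typically smaller in magnitude than both Pearson’s and Spearman’s coefficients.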
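The sample values can be compared with what the population would give. For a bivariate normal distribution with Pearson correlation $$r$$, two standard results are $$\tau = \frac{2}{\pi}\arcsin(r)$$ and $$\rho = \frac{6}{\pi}\arcsin(r/2)$$. The sketch below applies them to the `Sigma` matrix used above; the variable names `tau_implied` and `rho_implied` are introduced here for illustration only.

```r
# Covariance matrix from the example above
Sigma <- matrix(c(9, 3, 3, 2), 2, 2)

# Population Pearson correlation implied by Sigma
r <- Sigma[1, 2] / sqrt(Sigma[1, 1] * Sigma[2, 2])

# Rank correlations implied by r under bivariate normality
tau_implied <- 2 / pi * asin(r)
rho_implied <- 6 / pi * asin(r / 2)

round(c(pearson = r, kendall = tau_implied, spearman = rho_implied), 3)
```

Under these assumptions the population Kendall's tau is 0.5 and Spearman's rho is about 0.69, against a Pearson correlation of about 0.707, which is in line with the pattern seen in the sample estimates.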

We can also run a hypothesis test in R for the correlation coefficient, with the null hypothesis that there is no correlation, i.e. that the coefficient is 0.

In the example below, the alternative hypothesis is that Kendall’s correlation is positive.

cor.test(BiNorms[,1], BiNorms[,2], method="kendall", alternative = "greater")

data:  BiNorms[, 1] and BiNorms[, 2]
z = 24.633, p-value < 2.2e-16
alternative hypothesis: true tau is greater than 0
sample estimates:
tau
0.5202082 
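The object returned by `cor.test()` is a list, so the estimate and p-value can be extracted instead of being read off the printed output. A minimal sketch on a small made-up data set (the vectors `x` and `y` are hypothetical, without ties):

```r
# Toy data with no ties (hypothetical values, for illustration only)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2, 1, 4, 3, 6, 5, 8, 7)

# One-sided test: H0 tau = 0 against H1 tau > 0
res <- cor.test(x, y, method = "kendall", alternative = "greater")

res$estimate     # the sample tau
res$p.value      # p-value of the one-sided test
res$alternative  # "greater"
```

The same components are available for `method = "pearson"` and `method = "spearman"`.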

## Kendall’s Tau – Interpretation

τ is affected by the number of observations, and it is difficult to apply rules of thumb for interpreting it. However, we can say that:

• τ=-1 indicates a perfect negative monotonic relation between two variables
• τ=0 indicates no monotonic relation between two variables
• τ=1 indicates a perfect positive monotonic relation between two variables
• |τ| between 0.07 and 0.2 indicates a weak association
• |τ| between 0.21 and 0.35 indicates a medium association
• |τ| between 0.36 and 1 indicates a strong association
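These rules of thumb can be wrapped in a small helper. The function `interpret_tau()` below is a hypothetical sketch, not part of any package. It assumes that values falling in the gaps between the listed ranges (for example 0.205) belong to the next category up, and it uses its own label for values below 0.07, which the list above does not name.

```r
# Hypothetical helper: map Kendall's tau to the rule-of-thumb labels above
interpret_tau <- function(tau) {
  stopifnot(is.numeric(tau), all(abs(tau) <= 1))
  a <- abs(tau)
  # Assumption: gaps between the listed ranges go to the next category up
  ifelse(a < 0.07, "below the weak range",
    ifelse(a <= 0.20, "weak",
      ifelse(a <= 0.35, "medium", "strong")))
}

interpret_tau(c(0.01, -0.1, 0.3, 0.5202082))
```

Applied to the estimate from the example, 0.5202082, the helper returns "strong".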