Predictive Hacks

# ANOVA vs Multiple Comparisons

When we run an ANOVA, we analyze the differences among group means in a sample. In its simplest form, ANOVA provides a statistical test of whether two or more population means are equal, and therefore generalizes the t-test beyond two means.

## ANOVA Null and Alternative Hypothesis

The null hypothesis in ANOVA is that there is no difference between means and the alternative is that the means are not all equal.

$$H_0: \mu_1 = \mu_2 = \dots = \mu_K$$
$$H_1: \text{the } \mu_k \text{ are not all equal}$$

This means that ANOVA on its own does not compare the groups pairwise. It only tells us whether the group means can all be considered equal, not which groups differ from which.
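For reference, the decision rests on a single statistic that compares between-group to within-group variability. The sketch below uses standard one-way ANOVA notation that is not defined in the post itself and is assumed here: $K$ groups, $n_k$ observations in group $k$, $N$ observations in total, group means $\bar{y}_k$ and overall mean $\bar{y}$.

$$F = \frac{\sum_{k=1}^{K} n_k (\bar{y}_k - \bar{y})^2 / (K-1)}{\sum_{k=1}^{K} \sum_{i=1}^{n_k} (y_{ik} - \bar{y}_k)^2 / (N-K)}$$

Under $H_0$, and assuming normal errors with equal variances, $F$ follows an F distribution with $K-1$ and $N-K$ degrees of freedom. Because all groups enter through one ratio, a large $F$ says that some means differ without saying which ones.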

## Tukey’s HSD

What if we want to compare all the groups pairwise? In this case, we can apply Tukey’s HSD (honestly significant difference), which is a single-step multiple comparison procedure and statistical test. It can be used to find means that are significantly different from each other.

## Example of ANOVA vs Tukey’s HSD

Let’s assume that we are dealing with the following 4 groups:

• Group “a”: 100 observations from the Normal Distribution with mean 10 and standard deviation 5
• Group “b”: 100 observations from the Normal Distribution with mean 10 and standard deviation 5
• Group “c”: 100 observations from the Normal Distribution with mean 11 and standard deviation 6
• Group “d”: 100 observations from the Normal Distribution with mean 11 and standard deviation 6

We expect the ANOVA to reject the null hypothesis, but we would also like to find that Group a and Group b are not statistically different from each other, and likewise Group c and Group d.

Let’s work in R:

library(multcomp)
library(tidyverse)

# Create the four groups
set.seed(10)
df1 <- data.frame(Var="a", Value=rnorm(100,10,5))
df2 <- data.frame(Var="b", Value=rnorm(100,10,5))
df3 <- data.frame(Var="c", Value=rnorm(100,11,6))
df4 <- data.frame(Var="d", Value=rnorm(100,11,6))

# merge them in one data frame
df<-rbind(df1,df2,df3,df4)

# convert Var to a factor
df$Var<-as.factor(df$Var)

df%>%ggplot(aes(x=Value, fill=Var))+geom_density(alpha=0.5)



### ANOVA

# ANOVA
model1<-lm(Value~Var, data=df)
anova(model1)



Output:

Analysis of Variance Table

Response: Value
           Df  Sum Sq Mean Sq F value    Pr(>F)
Var         3   565.7 188.565   6.351 0.0003257 ***
Residuals 396 11757.5  29.691
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Clearly, we reject the null hypothesis, since the p-value (0.0003257) is far below the usual 0.05 significance level.
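As a quick sanity check, the F value and p-value can be recomputed from the mean squares and degrees of freedom printed in the ANOVA table. This is only a sketch: the numbers are copied by hand from the table, and the variable names are made up for illustration.

```r
# Values copied from the ANOVA table
ms_between <- 188.565   # Mean Sq for Var
ms_within  <- 29.691    # Mean Sq for Residuals
df_between <- 3         # K - 1 = 4 - 1
df_within  <- 396       # N - K = 400 - 4

# F statistic: ratio of the two mean squares
f_value <- ms_between / ms_within

# Upper tail of the F distribution gives the p-value
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

print(round(f_value, 3))   # 6.351, as in the table
print(signif(p_value, 3))
```

Up to rounding of the copied inputs, the p-value should agree with the `Pr(>F)` column.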

### Tukey’s HSD

Let’s apply Tukey’s HSD test to compare all pairs of group means.

# Tukey multiple comparisons
summary(glht(model1, mcp(Var="Tukey")))



Output:

	 Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts

Fit: lm(formula = Value ~ Var, data = df)

Linear Hypotheses:
           Estimate Std. Error t value Pr(>|t|)
b - a == 0   0.2079     0.7706   0.270  0.99312
c - a == 0   1.8553     0.7706   2.408  0.07727 .
d - a == 0   2.8758     0.7706   3.732  0.00129 **
c - b == 0   1.6473     0.7706   2.138  0.14298
d - b == 0   2.6678     0.7706   3.462  0.00329 **
d - c == 0   1.0205     0.7706   1.324  0.54795
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)
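The Tukey table can be cross-checked by hand. With equal group sizes, Tukey’s HSD declares two means different at the 5% level when their absolute difference exceeds a single threshold built from the studentized range distribution. The sketch below assumes the balanced design of this example (100 observations per group) and reuses numbers copied from the outputs; the variable names are illustrative only.

```r
# Values taken from the ANOVA and Tukey outputs
ms_within <- 29.691   # residual mean square
df_within <- 396      # residual degrees of freedom
n_per_grp <- 100      # observations per group (balanced design)
k_groups  <- 4        # number of groups

# 5% critical value of the studentized range distribution
q_crit <- qtukey(0.95, nmeans = k_groups, df = df_within)

# Honestly significant difference: the smallest |mean difference|
# that is declared significant at the 5% level
hsd <- q_crit * sqrt(ms_within / n_per_grp)

# Estimated differences copied from the Tukey table
diffs <- c("b - a" = 0.2079, "c - a" = 1.8553, "d - a" = 2.8758,
           "c - b" = 1.6473, "d - b" = 2.6678, "d - c" = 1.0205)

print(hsd)
print(abs(diffs) > hsd)   # TRUE only for "d - a" and "d - b"
```

Only the two pairs flagged with `**` in the table clear the threshold, which matches the adjusted p-values.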


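As the Tukey output shows, the differences c vs a and c vs b are not statistically significant at the 5% level (adjusted p-values 0.077 and 0.143), although these groups come from different distributions. The reason is the “issue” with multiple comparisons: the p-values are adjusted for all six pairwise tests at once, which makes each individual comparison more conservative. Let’s compare these pairs with plain t-tests.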

### t-test a vs c

t.test(df%>%filter(Var=="a")%>%pull(), df%>%filter(Var=="c")%>%pull())


Output:

	Welch Two Sample t-test

data:  df %>% filter(Var == "a") %>% pull() and df %>% filter(Var == "c") %>% pull()
t = -2.4743, df = 189.47, p-value = 0.01423
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.3343125 -0.3761991
sample estimates:
mean of x mean of y
9.317255 11.172511


### t-test b vs c

t.test(df%>%filter(Var=="b")%>%pull(), df%>%filter(Var=="c")%>%pull())



Output:

	Welch Two Sample t-test

data:  df %>% filter(Var == "b") %>% pull() and df %>% filter(Var == "c") %>% pull()
t = -2.1711, df = 191.53, p-value = 0.03115
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.1439117 -0.1507362
sample estimates:
mean of x mean of y
9.525187 11.172511


As we can see, in both cases the difference in means between the two groups is found to be statistically significant at the 5% level (p-values 0.014 and 0.031) if we ignore the multiple comparisons.
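To see why ignoring the multiple comparisons is risky, consider how quickly the chance of at least one false positive grows with the number of tests. The sketch below makes a simplifying assumption that the six pairwise tests are independent, which is not strictly true here because each group appears in several pairs, so the figure should be read as a rough guide.

```r
alpha   <- 0.05
k       <- 4                # number of groups
n_pairs <- choose(k, 2)     # 6 pairwise comparisons

# Chance of at least one false positive across all pairs,
# if every test is run at level alpha and the tests were independent
fwer_unadjusted <- 1 - (1 - alpha)^n_pairs

# Bonferroni-style fix: run each test at alpha / n_pairs instead
alpha_bonf <- alpha / n_pairs
fwer_bonf  <- 1 - (1 - alpha_bonf)^n_pairs

print(round(fwer_unadjusted, 3))   # 0.265
print(fwer_bonf < alpha)           # TRUE
```

So with four groups, unadjusted pairwise t-tests would flag at least one spurious difference roughly a quarter of the time even if all the means were equal.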

## Discussion

When we are dealing with several groups and want to compare them pairwise, Tukey’s HSD is a good option. Another approach is to apply p-value adjustments to the individual pairwise tests.
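As a minimal sketch of the p-value adjustment route, base R’s `pairwise.t.test` can run every pairwise t-test and adjust the results in one call. The choice of the Holm method here is an assumption made for illustration, not something the post prescribes, and the data are regenerated with the same recipe as the example so that the block runs on its own.

```r
# Recreate the four groups with the same recipe as in the example
set.seed(10)
df <- data.frame(
  Var   = factor(rep(c("a", "b", "c", "d"), each = 100)),
  Value = c(rnorm(100, 10, 5), rnorm(100, 10, 5),
            rnorm(100, 11, 6), rnorm(100, 11, 6))
)

# Unadjusted Welch t-tests for every pair of groups
raw <- pairwise.t.test(df$Value, df$Var,
                       p.adjust.method = "none", pool.sd = FALSE)

# The same tests with Holm-adjusted p-values
holm <- pairwise.t.test(df$Value, df$Var,
                        p.adjust.method = "holm", pool.sd = FALSE)

print(raw$p.value)
print(holm$p.value)
```

The adjusted p-values are never smaller than the raw ones, so some pairs that look significant in isolation may no longer be so after the adjustment.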

You can also have a look at how to account for multiple comparisons in A/B/n testing.