When we run an ANOVA, we analyze the differences among group means in a sample. In its simplest form, ANOVA provides a statistical test of whether two or more population means are equal, and therefore generalizes the t-test beyond two means.
ANOVA Null and Alternative Hypotheses
The null hypothesis in ANOVA is that there is no difference between means and the alternative is that the means are not all equal.
\(H_0: \mu_1 = \mu_2 = \dots = \mu_K\)
\(H_1: \text{the } \mu_k \text{ are not all equal}\)
This means that, when we are dealing with many groups, ANOVA on its own does not compare them pairwise. It only tells us whether the group means can all be considered equal or not.
What if we want to compare all the groups pairwise? In this case, we can apply Tukey’s HSD (honestly significant difference) test, which is a single-step multiple comparison procedure. It can be used to find the means that are significantly different from each other.
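For reference, here is a sketch of the statistic behind Tukey’s HSD, assuming equal group sizes. The symbols are introduced here for illustration and do not appear in the original text: \(\bar{y}_i\) is the sample mean of group \(i\), \(n\) is the number of observations per group, and \(MS_{\text{within}}\) is the residual mean square from the ANOVA table.

```latex
q_{ij} = \frac{\lvert \bar{y}_i - \bar{y}_j \rvert}{\sqrt{MS_{\text{within}} / n}}
```

Groups \(i\) and \(j\) are declared significantly different when \(q_{ij}\) exceeds the critical value of the studentized range distribution for \(K\) groups and \(N - K\) residual degrees of freedom, where \(N\) is the total number of observations.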
Example of ANOVA vs Tukey’s HSD
Let’s assume that we are dealing with the following 4 groups:
- Group “a”: 100 observations from the Normal Distribution with mean 10 and standard deviation 5
- Group “b”: 100 observations from the Normal Distribution with mean 10 and standard deviation 5
- Group “c”: 100 observations from the Normal Distribution with mean 11 and standard deviation 6
- Group “d”: 100 observations from the Normal Distribution with mean 11 and standard deviation 6
We expect the ANOVA to reject the null hypothesis, but we would also like to know that Group a and Group b are not statistically different, and likewise for Group c and Group d.
Let’s work in R:
library(multcomp)
library(tidyverse)

# Create the four groups
set.seed(10)
df1 <- data.frame(Var="a", Value=rnorm(100,10,5))
df2 <- data.frame(Var="b", Value=rnorm(100,10,5))
df3 <- data.frame(Var="c", Value=rnorm(100,11,6))
df4 <- data.frame(Var="d", Value=rnorm(100,11,6))

# Merge them into one data frame
df <- rbind(df1,df2,df3,df4)

# Convert Var to a factor
df$Var <- as.factor(df$Var)

# Plot the density of each group
df %>% ggplot(aes(x=Value, fill=Var)) + geom_density(alpha=0.5)

# ANOVA
model1 <- lm(Value~Var, data=df)
anova(model1)

Analysis of Variance Table

Response: Value
           Df  Sum Sq Mean Sq F value    Pr(>F)
Var         3   565.7 188.565   6.351 0.0003257 ***
Residuals 396 11757.5  29.691
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
We reject the null hypothesis, since the p-value is 0.0003257, well below the usual 0.05 significance level.
Let’s apply the Tukey HSD test to compare all pairs of means.
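As a quick check on the ANOVA table, the F value and its p-value can be recomputed from the two mean squares. This is only a sketch: the variable names are made up for illustration, and the inputs are the rounded figures copied from the table, so the result should agree with the table only up to rounding.

```r
# Values copied from the ANOVA table above (rounded as printed)
ms_between <- 188.565  # Mean Sq for Var
ms_within  <- 29.691   # Mean Sq for Residuals
df_between <- 3        # Df for Var (K - 1 with K = 4 groups)
df_within  <- 396      # Df for Residuals (N - K with N = 400)

# F is the ratio of the between-group to the within-group mean square
f_value <- ms_between / ms_within

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

print(round(f_value, 3))
print(signif(p_value, 3))
```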
# Tukey multiple comparisons
summary(glht(model1, mcp(Var="Tukey")))

Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts

Fit: lm(formula = Value ~ Var, data = df)

Linear Hypotheses:
           Estimate Std. Error t value Pr(>|t|)
b - a == 0   0.2079     0.7706   0.270  0.99312
c - a == 0   1.8553     0.7706   2.408  0.07727 .
d - a == 0   2.8758     0.7706   3.732  0.00129 **
c - b == 0   1.6473     0.7706   2.138  0.14298
d - b == 0   2.6678     0.7706   3.462  0.00329 **
d - c == 0   1.0205     0.7706   1.324  0.54795
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)
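As we can see from the output above, the differences c vs a and c vs b are not statistically significant at the 5% level, although the groups come from different distributions. The reason is the “issue” with multiple comparisons: the p-values are adjusted to account for all six pairwise tests, so each one is larger than it would be for a single test. Let’s compare the same pairs directly with unadjusted two-sample t-tests.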
t-test a vs c
Welch Two Sample t-test

data:  df %>% filter(Var == "a") %>% pull() and df %>% filter(Var == "c") %>% pull()
t = -2.4743, df = 189.47, p-value = 0.01423
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.3343125 -0.3761991
sample estimates:
mean of x mean of y
 9.317255 11.172511
t-test b vs c
Welch Two Sample t-test

data:  df %>% filter(Var == "b") %>% pull() and df %>% filter(Var == "c") %>% pull()
t = -2.1711, df = 191.53, p-value = 0.03115
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.1439117 -0.1507362
sample estimates:
mean of x mean of y
 9.525187 11.172511
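The calls that produced the two outputs above are not shown, but the `data:` lines suggest a plain `t.test()` on the values of each pair of groups. A minimal sketch of equivalent calls follows. It uses base R vectors in place of the `filter()`/`pull()` pipeline, and `welch_test` is a hypothetical helper name. The groups are regenerated in the same order and with the same seed as in the setup code, which is assumed to reproduce the same samples.

```r
# Regenerate groups a, b and c in the same order as the setup code
set.seed(10)
group_a <- rnorm(100, 10, 5)
group_b <- rnorm(100, 10, 5)
group_c <- rnorm(100, 11, 6)

# Hypothetical helper: unadjusted two-sample t-test.
# t.test() defaults to var.equal = FALSE, i.e. the Welch test.
welch_test <- function(x, y) {
  t.test(x, y)
}

test_ac <- welch_test(group_a, group_c)
test_bc <- welch_test(group_b, group_c)

print(test_ac)
print(test_bc)
```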
As we can see from above, the difference in means between the two groups is statistically significant in both cases, if we ignore the multiple comparisons.
When we are dealing with multiple groups and want to run pairwise comparisons, Tukey’s HSD is a good option. Another approach is to apply p-value adjustments.
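To see why ignoring multiple comparisons matters, we can sketch how quickly the chance of at least one false positive grows with the number of tests. The calculation below assumes independent tests. That is a simplification, since pairwise comparisons share groups and are not independent, so treat the result as a rough illustration rather than the exact error rate for this example.

```r
alpha <- 0.05         # per-test significance level
k <- 4                # number of groups
m <- choose(k, 2)     # number of pairwise comparisons

# Probability of at least one false positive across m tests,
# under the simplifying assumption that the tests are independent
fwer <- 1 - (1 - alpha)^m

print(m)               # 6
print(round(fwer, 3))  # 0.265
```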
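As an illustration of p-value adjustment, the sketch below applies a Bonferroni correction to the two unadjusted t-test p-values reported above. The variable names are made up for illustration, and Bonferroni is one choice among several methods offered by `p.adjust()`. The number of comparisons is set to six, on the assumption that all pairs among the four groups are being tested.

```r
# Unadjusted p-values copied from the two t-test outputs above
p_raw <- c(a_vs_c = 0.01423, b_vs_c = 0.03115)

# Assumed family size: all pairwise comparisons among four groups
m <- choose(4, 2)

# Bonferroni multiplies each p-value by the number of comparisons (capped at 1)
p_bonf <- p.adjust(p_raw, method = "bonferroni", n = m)

print(p_bonf)
```

After adjustment, both p-values are above 0.05, in line with the Tukey result. With the full data frame at hand, `pairwise.t.test(df$Value, df$Var, p.adjust.method = "bonferroni")` would run and adjust all six comparisons in one call.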
You can also have a look at how you can consider the multiple comparisons in A/B/n Testing