In statistics, the two-way analysis of variance (ANOVA) is an extension of the one-way ANOVA that examines the influence of two different categorical independent variables on one continuous dependent variable. The two-way ANOVA aims to assess not only the main effect of each independent variable but also whether there is any interaction between them.
In 1925, Ronald Fisher mentioned the two-way ANOVA in his celebrated book, Statistical Methods for Research Workers (chapters 7 and 8). In 1934, Frank Yates published procedures for the unbalanced case. Since then, an extensive literature has been produced. The topic was reviewed in 1993 by Yasunori Fujikoshi. In 2005, Andrew Gelman proposed a different approach to ANOVA, viewed as a multilevel model.
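Consider a data set in which a dependent variable may be influenced by two factors, each a potential source of variation. The first factor has $I$ levels ($i \in \{1, \ldots, I\}$) and the second has $J$ levels ($j \in \{1, \ldots, J\}$). Each combination $(i, j)$ defines a treatment, for a total of $I \times J$ treatments. We denote the number of replicates for treatment $(i, j)$ by $n_{ij}$, and let $k$ be the index of the replicate within this treatment ($k \in \{1, \ldots, n_{ij}\}$).
From these data, we can build a contingency table of the counts $n_{ij}$, with row totals $n_{i+} = \sum_{j=1}^{J} n_{ij}$ and column totals $n_{+j} = \sum_{i=1}^{I} n_{ij}$; the total number of replicates is equal to $n = \sum_{i,j} n_{ij} = \sum_{i} n_{i+} = \sum_{j} n_{+j}$.
The experimental design is balanced if each treatment has the same number of replicates, $K$. In such a case, the design is also said to be orthogonal, which makes it possible to fully distinguish the effects of the two factors. We can hence write $n_{ij} = K$ for all $i, j$, and $n_{ij} = \frac{n_{i+} \cdot n_{+j}}{n}$ for all $i, j$.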
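As an illustration, the table of replicate counts, its row and column totals, and the balance condition can be sketched in a few lines of Python. The data (fertilizer level, plant variety, yield) and all function names below are hypothetical and chosen only for this example; the sketch assumes each observation is stored as a triple of first-factor level, second-factor level and measured value.

```python
from collections import Counter


def cell_counts(observations, row_levels, col_levels):
    """Return the I x J table of replicate counts, one cell per treatment."""
    counts = Counter((a, b) for a, b, _ in observations)
    return [[counts[(a, b)] for b in col_levels] for a in row_levels]


def margins(table):
    """Return the row totals, the column totals and the grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return row_totals, col_totals, sum(row_totals)


def is_balanced(table):
    """True when every treatment has the same number of replicates."""
    return len({count for row in table for count in row}) == 1


def has_proportional_counts(table):
    """True when count * total == row total * column total in every cell.

    Integer arithmetic avoids rounding issues from the division.
    """
    row_totals, col_totals, total = margins(table)
    return all(
        table[i][j] * total == row_totals[i] * col_totals[j]
        for i in range(len(table))
        for j in range(len(table[0]))
    )


# Hypothetical data: (fertilizer level, variety, yield) triples.
data = [
    ("low", "A", 4.1), ("low", "A", 3.9),
    ("low", "B", 5.0), ("low", "B", 5.2),
    ("high", "A", 6.1), ("high", "A", 5.8),
    ("high", "B", 7.4), ("high", "B", 7.0),
]

table = cell_counts(data, ["low", "high"], ["A", "B"])
row_totals, col_totals, total = margins(table)
print(table, row_totals, col_totals, total)
print(is_balanced(table), has_proportional_counts(table))
```

The proportionality check is slightly more general than balance: a table of counts such as `[[1, 2], [2, 4]]` satisfies the identity in every cell without having equal counts.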
Upon observing variation among all $n$ data points, for instance via a histogram, "probability may be used to describe such variation". Let us hence denote by $Y_{ijk}$ the random variable whose observed value $y_{ijk}$ is the $k$-th measurement for treatment $(i, j)$. The two-way ANOVA models all these variables as varying independently and normally around a mean, $\mu_{ij}$, with a constant variance, $\sigma^2$ (homoscedasticity):
$$Y_{ijk} \mid \mu_{ij}, \sigma^2 \sim \mathcal{N}(\mu_{ij}, \sigma^2), \quad \text{independently for all } i, j, k.$$
Specifically, the mean of the response variable is modeled as a linear combination of the explanatory variables:
$$\mu_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij},$$
where $\mu$ is the grand mean, $\alpha_i$ is the additive main effect of level $i$ of the first factor (the $i$-th row of the contingency table), $\beta_j$ is the additive main effect of level $j$ of the second factor (the $j$-th column of the contingency table) and $\gamma_{ij}$ is the non-additive interaction effect of treatment $(i, j)$ for samples from both factors (the cell at row $i$ and column $j$ of the contingency table).
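To make the decomposition concrete, the following Python sketch builds the treatment means from a grand mean, two sets of main effects and an interaction table, draws normally distributed replicates around them, and then splits a table of means back into its four components. It rests on assumptions that are not stated above: the effects are taken to satisfy the usual sum-to-zero constraints (main effects sum to zero, and the interaction table sums to zero along every row and every column), the design is balanced so that unweighted averages are appropriate, and all function names and numerical values are hypothetical.

```python
import random


def cell_means(mu, alpha, beta, gamma):
    """Treatment means: grand mean + row effect + column effect + interaction."""
    return [
        [mu + a + b + gamma[i][j] for j, b in enumerate(beta)]
        for i, a in enumerate(alpha)
    ]


def simulate(means, sigma, replicates, seed=0):
    """Draw independent normal replicates around each treatment mean."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(m, sigma) for _ in range(replicates)] for m in row]
        for row in means
    ]


def decompose(means):
    """Split a table of means into (grand mean, row, column, interaction).

    Assumes sum-to-zero constraints and equal weights for all cells.
    """
    n_rows, n_cols = len(means), len(means[0])
    mu = sum(sum(row) for row in means) / (n_rows * n_cols)
    alpha = [sum(row) / n_cols - mu for row in means]
    beta = [sum(col) / n_rows - mu for col in zip(*means)]
    gamma = [
        [means[i][j] - mu - alpha[i] - beta[j] for j in range(n_cols)]
        for i in range(n_rows)
    ]
    return mu, alpha, beta, gamma


# Hypothetical effects for a 2 x 3 design, chosen to satisfy sum-to-zero.
mu = 10.0
alpha = [-2.0, 2.0]
beta = [-1.0, 0.0, 1.0]
gamma = [[0.5, 0.0, -0.5], [-0.5, 0.0, 0.5]]

means = cell_means(mu, alpha, beta, gamma)
recovered = decompose(means)  # recovers the effects defined above

# Estimate the same quantities from simulated data (4 replicates per cell).
samples = simulate(means, sigma=1.0, replicates=4)
sample_means = [[sum(cell) / len(cell) for cell in row] for row in samples]
estimates = decompose(sample_means)
print(recovered)
print(estimates)
```

Applied to the exact means, `decompose` returns the effects that were used to build them; applied to sample means, it returns estimates that vary with the random draw and approach the true effects only as the number of replicates grows.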