
Analysis of Variance (ANOVA) (Statistics Notes)

That's a fantastic idea! A math demo will really help solidify the concepts we've discussed.

Let's walk through a simple ANOVA calculation step-by-step.

Scenario: Comparing Three Different Fertilizers on Plant Growth

Imagine a biologist wants to test whether three different types of fertilizer (Fertilizer A, Fertilizer B, Fertilizer C) have a significant effect on the growth (measured in cm) of a certain plant species. They grow 3 plants with each fertilizer type and measure their growth after a month.

Our Data (plant growth in cm):

| Fertilizer A | Fertilizer B | Fertilizer C |
| :----------- | :----------- | :----------- |
| 10           | 13           | 7            |
| 11           | 14           | 8            |
| 12           | 15           | 9            |

Total number of observations (N) = 9 (3 plants for each of 3 fertilizers)
Number of groups (k) = 3


Step 1: Calculate Group Means and the Grand Mean

  • Mean of Fertilizer A (x̄₁): (10 + 11 + 12) / 3 = 33 / 3 = 11
  • Mean of Fertilizer B (x̄₂): (13 + 14 + 15) / 3 = 42 / 3 = 14
  • Mean of Fertilizer C (x̄₃): (7 + 8 + 9) / 3 = 24 / 3 = 8
  • Grand Mean (GM or x̄_total): Sum of all observations / N = (10+11+12+13+14+15+7+8+9) / 9 = 99 / 9 = 11
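The Step 1 arithmetic can be checked with a few lines of Python. This is a minimal sketch; the names `data`, `group_means`, and `grand_mean` are illustrative choices, not part of the original notes.

```python
# Plant growth (cm) for each fertilizer, copied from the data table above.
data = {
    "A": [10, 11, 12],
    "B": [13, 14, 15],
    "C": [7, 8, 9],
}

# Mean of each group: sum of its observations divided by its size.
group_means = {name: sum(vals) / len(vals) for name, vals in data.items()}

# Grand mean: sum of ALL observations divided by N.
all_obs = [x for vals in data.values() for x in vals]
grand_mean = sum(all_obs) / len(all_obs)

print(group_means)  # {'A': 11.0, 'B': 14.0, 'C': 8.0}
print(grand_mean)   # 11.0
```

With equal group sizes, as here, the grand mean equals the average of the three group means, (11 + 14 + 8) / 3 = 11. With unequal group sizes the two generally differ, which is why the sketch averages all observations directly.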

Step 2: Calculate Sum of Squares Total (SST)

  • SST measures the total variability in all the data points, regardless of which group they belong to. It is the sum of the squared differences between each observation and the Grand Mean: $$SST = \sum_{i=1}^{N} (x_i - GM)^2$$
  • (10 - 11)² = (-1)² = 1
  • (11 - 11)² = (0)² = 0
  • (12 - 11)² = (1)² = 1
  • (13 - 11)² = (2)² = 4
  • (14 - 11)² = (3)² = 9
  • (15 - 11)² = (4)² = 16
  • (7 - 11)² = (-4)² = 16
  • (8 - 11)² = (-3)² = 9
  • (9 - 11)² = (-2)² = 4

  • SST = 1 + 0 + 1 + 4 + 9 + 16 + 16 + 9 + 4 = 60
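A short sketch of the SST formula, assuming the same illustrative `data` layout (one list of growth values per fertilizer):

```python
data = {"A": [10, 11, 12], "B": [13, 14, 15], "C": [7, 8, 9]}

# Pool every observation and compute the grand mean.
all_obs = [x for vals in data.values() for x in vals]
grand_mean = sum(all_obs) / len(all_obs)

# SST: squared deviation of every observation from the grand mean, summed.
sst = sum((x - grand_mean) ** 2 for x in all_obs)
print(sst)  # 60.0
```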


Step 3: Calculate Sum of Squares Between Groups (SSB or SSTreatment)

SSB measures the variability between the group means. It shows how much the group means differ from the Grand Mean, with each squared difference weighted by the size of its group. $$SSB = \sum_{j=1}^{k} n_j (\bar{x}_j - GM)^2$$ (where \(n_j\) is the number of observations in group \(j\), and \(\bar{x}_j\) is the mean of group \(j\))

  • Fertilizer A: 3 * (11 - 11)² = 3 * 0² = 3 * 0 = 0
  • Fertilizer B: 3 * (14 - 11)² = 3 * 3² = 3 * 9 = 27
  • Fertilizer C: 3 * (8 - 11)² = 3 * (-3)² = 3 * 9 = 27

SSB = 0 + 27 + 27 = 54
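The SSB formula translates directly into a loop over the groups. This is a sketch with illustrative variable names; `n_j` and `mean_j` mirror \(n_j\) and \(\bar{x}_j\) in the formula.

```python
data = {"A": [10, 11, 12], "B": [13, 14, 15], "C": [7, 8, 9]}
all_obs = [x for vals in data.values() for x in vals]
grand_mean = sum(all_obs) / len(all_obs)

# SSB: each group's size times the squared distance of its mean from the grand mean.
ssb = 0.0
for vals in data.values():
    n_j = len(vals)
    mean_j = sum(vals) / n_j
    ssb += n_j * (mean_j - grand_mean) ** 2

print(ssb)  # 54.0
```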

Step 4: Calculate Sum of Squares Within Groups (SSW or SSError)

SSW measures the variability within each group. It is the sum of the squared differences between each observation and its own group mean. This represents the "error" or unexplained variation.

$$SSW = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$$

  • For Fertilizer A: (10 - 11)² = (-1)² = 1, (11 - 11)² = (0)² = 0, (12 - 11)² = (1)² = 1. Sum for A = 1 + 0 + 1 = 2
  • For Fertilizer B: (13 - 14)² = (-1)² = 1, (14 - 14)² = (0)² = 0, (15 - 14)² = (1)² = 1. Sum for B = 1 + 0 + 1 = 2
  • For Fertilizer C: (7 - 8)² = (-1)² = 1, (8 - 8)² = (0)² = 0, (9 - 8)² = (1)² = 1. Sum for C = 1 + 0 + 1 = 2

SSW = 2 + 2 + 2 = 6
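The within-group sum can be sketched the same way. Unlike SSB, each observation is compared with its own group's mean rather than the grand mean (variable names are illustrative):

```python
data = {"A": [10, 11, 12], "B": [13, 14, 15], "C": [7, 8, 9]}

# SSW: squared deviation of each observation from ITS OWN group mean, summed over all groups.
ssw = 0.0
for vals in data.values():
    mean_j = sum(vals) / len(vals)
    ssw += sum((x - mean_j) ** 2 for x in vals)

print(ssw)  # 6.0
```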


Verification: SST = SSB + SSW

  • Does 60 = 54 + 6? Yes! This confirms that our sum-of-squares calculations are consistent.
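The identity SST = SSB + SSW is not a coincidence of this dataset; it holds for any grouped data, which is what makes it a useful arithmetic check. The sketch below assumes a hypothetical helper `sums_of_squares` and tests it on the fertilizer data and on randomly simulated groups of unequal size (simulated values, not data from this experiment).

```python
import random

def sums_of_squares(groups):
    """Return (SST, SSB, SSW) for a list of groups, each a list of numbers."""
    all_obs = [x for g in groups for x in g]
    gm = sum(all_obs) / len(all_obs)
    sst = sum((x - gm) ** 2 for x in all_obs)
    ssb = sum(len(g) * (sum(g) / len(g) - gm) ** 2 for g in groups)
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return sst, ssb, ssw

# Fertilizer data: the identity holds exactly.
print(sums_of_squares([[10, 11, 12], [13, 14, 15], [7, 8, 9]]))  # (60.0, 54.0, 6.0)

# Simulated groups of sizes 4, 7 and 5: the identity still holds,
# up to floating-point rounding.
rng = random.Random(0)
groups = [[rng.gauss(0, 1) for _ in range(n)] for n in (4, 7, 5)]
sst, ssb, ssw = sums_of_squares(groups)
print(abs(sst - (ssb + ssw)) < 1e-9)  # True
```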

Step 5: Calculate Degrees of Freedom (df)

  • df Total (df_T): N - 1 = 9 - 1 = 8
  • df Between Groups (df_B): k - 1 = 3 - 1 = 2
  • df Within Groups (df_W): N - k = 9 - 3 = 6
  • Verification: df_T = df_B + df_W? Does 8 = 2 + 6? Yes!
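The degrees of freedom depend only on N and k, so they can be derived from the data layout itself. A minimal sketch, using the same illustrative `data` dictionary:

```python
data = {"A": [10, 11, 12], "B": [13, 14, 15], "C": [7, 8, 9]}

N = sum(len(vals) for vals in data.values())  # total number of observations
k = len(data)                                  # number of groups

df_total = N - 1
df_between = k - 1
df_within = N - k

print(df_total, df_between, df_within)         # 8 2 6
print(df_total == df_between + df_within)      # True
```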

Step 6: Calculate Mean Squares (MS)

Mean Squares are calculated by dividing each Sum of Squares by its respective Degrees of Freedom.

  • Mean Square Between Groups (MSB): \(MSB = \frac{SSB}{df_B} = \frac{54}{2} = \mathbf{27}\). This is our estimate of the variance between the groups.
  • Mean Square Within Groups (MSW): \(MSW = \frac{SSW}{df_W} = \frac{6}{6} = \mathbf{1}\). This is our estimate of the variance within the groups (our error variance).

Step 7: Calculate the F-statistic

The F-statistic is the ratio of the variance between groups to the variance within groups. If the fertilizers have no real effect, both mean squares estimate the same error variance and F should typically be around 1; a value far above 1 points to real differences among the group means.

$$F = \frac{MSB}{MSW} = \frac{27}{1} = \mathbf{27}$$
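Steps 6 and 7 are two divisions each. The sketch below simply plugs in the values from Steps 3 to 5; the variable names are illustrative.

```python
ssb, ssw = 54, 6              # sums of squares from Steps 3 and 4
df_between, df_within = 2, 6  # degrees of freedom from Step 5

msb = ssb / df_between        # Mean Square Between Groups
msw = ssw / df_within         # Mean Square Within Groups (error variance)
f_stat = msb / msw            # F-statistic

print(msb, msw, f_stat)       # 27.0 1.0 27.0
```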

Step 8: Formulate Hypotheses and Make a Conclusion

  • Null Hypothesis (H₀): The mean plant growth is the same for all three fertilizers (\(μ_A = μ_B = μ_C\)).
  • Alternative Hypothesis (H₁): At least one fertilizer's mean plant growth differs from the others.
  • To make a conclusion, we compare our calculated F-statistic (F = 27) to a critical F-value from an F-distribution table. The critical F-value depends on our chosen significance level (α, commonly 0.05) and our degrees of freedom (\(df_B = 2\), \(df_W = 6\)).
  • An F-table for α = 0.05, df1 = 2, df2 = 6 gives a critical F-value of approximately 5.14.
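Instead of a printed table, the critical value can be computed. In general one would use SciPy, for example `scipy.stats.f.ppf(1 - alpha, dfn, dfd)` for the critical value and `scipy.stats.f.sf(F, dfn, dfd)` for the p-value. To stay dependency-free, this sketch relies on a closed form that holds only when the numerator degrees of freedom equal 2 (as \(df_B\) does here): \(P(F > x) = (1 + 2x/d_2)^{-d_2/2}\). The helper names `f_sf_df1_2` and `f_crit_df1_2` are hypothetical.

```python
def f_sf_df1_2(x, d2):
    """Upper-tail probability P(F > x) for an F(2, d2) distribution.

    Valid ONLY when the numerator degrees of freedom are exactly 2.
    """
    return (1 + 2 * x / d2) ** (-d2 / 2)

def f_crit_df1_2(alpha, d2):
    """Critical value c with P(F > c) = alpha for F(2, d2), by inverting f_sf_df1_2."""
    return (d2 / 2) * (alpha ** (-2 / d2) - 1)

# Our case: df_B = 2, df_W = 6, alpha = 0.05, observed F = 27.
print(round(f_crit_df1_2(0.05, 6), 2))  # 5.14
print(round(f_sf_df1_2(27, 6), 4))      # 0.001
```

The computed critical value matches the table's 5.14, and the p-value for F = 27 works out to (1 + 9)⁻³ = 0.001, well below α = 0.05.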
  • Our Decision:

Since our calculated F-statistic (27) is much greater than the critical F-value (5.14), we reject the null hypothesis (H₀).

Conclusion: There is a statistically significant difference in the mean plant growth among the three types of fertilizers. This suggests that at least one fertilizer has a different effect on plant growth compared to the others.


Summary (ANOVA Table)

ANOVA results are often presented in a table like this:

| Source of Variation   | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Squares (MS) | F-statistic | p-value (not calculated here) |
| :-------------------- | :------------------ | :---------------------- | :---------------- | :---------- | :---------------------------- |
| Between Groups        | 54                  | 2                       | 27                | 27          | < 0.05                        |
| Within Groups (Error) | 6                   | 6                       | 1                 |             |                               |
| Total                 | 60                  | 8                       |                   |             |                               |
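The whole procedure can be wrapped into one function that reproduces the table's rows. This is a sketch only: `one_way_anova` is a hypothetical helper, and it leaves out the p-value. If SciPy is available, `scipy.stats.f_oneway(*groups)` returns the F-statistic and p-value directly.

```python
def one_way_anova(groups):
    """Return ANOVA table rows (source, SS, df, MS, F) for a list of groups."""
    all_obs = [x for g in groups for x in g]
    n_total, k = len(all_obs), len(groups)
    gm = sum(all_obs) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Sums of squares, each computed directly from its definition.
    sst = sum((x - gm) ** 2 for x in all_obs)
    ssb = sum(len(g) * (m - gm) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_b, df_w = k - 1, n_total - k
    msb, msw = ssb / df_b, ssw / df_w

    return [
        ("Between Groups", ssb, df_b, msb, msb / msw),
        ("Within Groups (Error)", ssw, df_w, msw, None),
        ("Total", sst, n_total - 1, None, None),
    ]

for row in one_way_anova([[10, 11, 12], [13, 14, 15], [7, 8, 9]]):
    print(row)
# ('Between Groups', 54.0, 2, 27.0, 27.0)
# ('Within Groups (Error)', 6.0, 6, 1.0, None)
# ('Total', 60.0, 8, None, None)
```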


I hope this detailed mathematical demo helps you understand how the F-statistic is derived in ANOVA! Let me know if you'd like to explore any part of it further.