Understanding the Differences and Applications of ANOVA, ANCOVA, MANOVA, and MANCOVA in Machine Learning

ANOVA: Tests whether the mean of a single outcome differs across two or more groups.
Example: Compare the accuracy of 3 ML models.

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Toy data: two accuracy measurements for each of three models
data = pd.DataFrame({
    'model': ['A', 'A', 'B', 'B', 'C', 'C'],
    'accuracy': [0.8, 0.82, 0.85, 0.86, 0.9, 0.91]
})

# Fit accuracy on the categorical model label, then print the ANOVA table
model = ols('accuracy ~ model', data=data).fit()
print(sm.stats.anova_lm(model))
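To make the F statistic in that table less of a black box, here is a minimal sketch that recomputes it by hand using only the standard library. It assumes the same six toy accuracies; the variable names (groups, ss_between, ss_within) are illustrative and not part of statsmodels.

```python
from statistics import mean

# Same toy accuracies as in the example above, grouped by model label
groups = {
    "A": [0.80, 0.82],
    "B": [0.85, 0.86],
    "C": [0.90, 0.91],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = mean(all_values)

# Between-group sum of squares: spread of the group means around the grand mean
ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
# Within-group sum of squares: spread of each value around its own group mean
ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)

df_between = len(groups) - 1               # number of groups minus one
df_within = len(all_values) - len(groups)  # observations minus number of groups

# F is the ratio of between-group to within-group mean squares
f_stat = (ss_between / df_between) / (ss_within / df_within)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F means the group means are far apart relative to the noise within each group. With only two observations per group, the result is illustrative rather than a reliable test.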



ANCOVA: Like ANOVA, but adjusts the group comparison for one or more covariates (typically continuous variables).
Example: Compare model accuracy while controlling for training time.

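data['train_time'] = [10, 11, 9, 10, 8, 9]

# typ=2 tests each term after adjusting for the other, so the model effect
# is evaluated with train_time accounted for (the default sequential table
# would test model before train_time enters)
model = ols('accuracy ~ model + train_time', data=data).fit()
print(sm.stats.anova_lm(model, typ=2))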
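What "controlling for training time" means can be illustrated with covariate-adjusted group means. The sketch below is a hand computation, not a statsmodels feature: it assumes a single slope of accuracy on train_time shared by all groups (the usual ANCOVA assumption), and the names slope and adjusted are illustrative.

```python
from statistics import mean

# Same toy data as in the example above: (accuracy, train_time) pairs per model
groups = {
    "A": [(0.80, 10), (0.82, 11)],
    "B": [(0.85, 9), (0.86, 10)],
    "C": [(0.90, 8), (0.91, 9)],
}

grand_time = mean(t for rows in groups.values() for _, t in rows)

# Pooled within-group slope of accuracy on train_time
sxy = sxx = 0.0
for rows in groups.values():
    acc_mean = mean(a for a, _ in rows)
    time_mean = mean(t for _, t in rows)
    for a, t in rows:
        sxy += (t - time_mean) * (a - acc_mean)
        sxx += (t - time_mean) ** 2
slope = sxy / sxx

# Adjusted mean: each group's mean accuracy shifted to the overall
# average training time along the shared slope
adjusted = {}
for name, rows in groups.items():
    acc_mean = mean(a for a, _ in rows)
    time_mean = mean(t for _, t in rows)
    adjusted[name] = acc_mean - slope * (time_mean - grand_time)
    print(f"{name}: raw={acc_mean:.3f} adjusted={adjusted[name]:.3f}")
```

ANCOVA compares these adjusted means rather than the raw ones, which is why its conclusion can differ from plain ANOVA when the groups differ on the covariate.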



MANOVA: Tests differences in several outcomes at once.
Example: Compare models on accuracy and precision together.

from statsmodels.multivariate.manova import MANOVA

# Toy data: two outcomes (accuracy, precision) measured twice per model
data = pd.DataFrame({
    'model': ['A', 'A', 'B', 'B', 'C', 'C'],
    'accuracy': [0.8, 0.82, 0.85, 0.86, 0.9, 0.91],
    'precision': [0.7, 0.72, 0.75, 0.76, 0.8, 0.82]
})

# Outcomes go on the left of ~, joined by +; the grouping variable on the right
manova = MANOVA.from_formula('accuracy + precision ~ model', data=data)
print(manova.mv_test())
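One of the statistics that mv_test() reports is Wilks' lambda. As a rough illustration of what it measures, the sketch below computes it by hand for the two-outcome toy data as the ratio of within-group scatter to total scatter. The helper names (sscp, det, centroid) are hypothetical and written only for this 2x2 case; the result should correspond to the Wilks' lambda reported for the model term, though that is an expectation rather than something verified here.

```python
from statistics import mean

# Same toy data as in the example above: (accuracy, precision) pairs per model
groups = {
    "A": [(0.80, 0.70), (0.82, 0.72)],
    "B": [(0.85, 0.75), (0.86, 0.76)],
    "C": [(0.90, 0.80), (0.91, 0.82)],
}

def centroid(rows):
    """Mean accuracy and mean precision of a set of rows."""
    return mean(a for a, _ in rows), mean(p for _, p in rows)

def sscp(rows, center):
    """Entries (s11, s12, s22) of the symmetric 2x2 scatter matrix around center."""
    s11 = sum((a - center[0]) ** 2 for a, _ in rows)
    s12 = sum((a - center[0]) * (p - center[1]) for a, p in rows)
    s22 = sum((p - center[1]) ** 2 for _, p in rows)
    return s11, s12, s22

def det(m):
    """Determinant of a symmetric 2x2 matrix given as (s11, s12, s22)."""
    s11, s12, s22 = m
    return s11 * s22 - s12 ** 2

all_rows = [row for rows in groups.values() for row in rows]

# Total scatter: every point around the overall centroid
total = sscp(all_rows, centroid(all_rows))
# Within-group scatter: each point around its own group's centroid, summed over groups
within = [sum(parts) for parts in zip(*(sscp(r, centroid(r)) for r in groups.values()))]

wilks_lambda = det(within) / det(total)
print(f"Wilks' lambda = {wilks_lambda:.4f}")
```

Values near 0 mean most of the joint variation in accuracy and precision lies between the groups; values near 1 mean the groups are hard to tell apart.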



MANCOVA: MANOVA that also adjusts for covariates.
Example: Compare models on accuracy and precision while controlling for training time.

data['train_time'] = [10, 11, 9, 10, 8, 9]

# Adding the covariate to the right-hand side turns the MANOVA into a MANCOVA
manova = MANOVA.from_formula('accuracy + precision ~ model + train_time', data=data)
print(manova.mv_test())

 

| Test | Definition | When to Use | Example (from earlier code) | Explanation of Example |
|---|---|---|---|---|
| ANOVA | Compare means of one dependent variable across groups | When comparing one outcome across different groups or models | Comparing accuracy of 3 ML models (accuracy ~ model) | Tests if average accuracy differs between models A, B, and C |
| ANCOVA | Like ANOVA but controls for one or more covariates | When you want to adjust the outcome for other variables | Comparing accuracy while controlling for training time (accuracy ~ model + train_time) | Checks if model accuracy differs between models after accounting for training time differences |
| MANOVA | Compare means of multiple dependent variables across groups | When comparing several related outcomes together | Comparing accuracy and precision across models (accuracy + precision ~ model) | Tests if model groups differ on both accuracy and precision together |
| MANCOVA | MANOVA with covariates included | When adjusting multiple outcomes for other influencing factors | Comparing accuracy and precision while controlling for training time (accuracy + precision ~ model + train_time) | Assesses differences in accuracy and precision between models after adjusting for training time |