ANOVA: Not Just for Stat Geeks! How This One Test Can Help You Make Informed Decisions

Siddharth Kshirsagar
3 min read, Mar 15, 2023


Photo by MD_JERRY on Unsplash

ANOVA stands for analysis of variance. It is a statistical method used to check whether the means of two or more groups are significantly different from each other.

Code: kshirsagarsiddharth/Anova-Demo-Medium (github.com)

Let’s say you want to buy watermelons and you have three options: farm1, farm2 or farm3. You have to decide whether there is any difference between the weights of melons from these farms. Being a data geek, you have recorded the weight of each melon every time you purchased from a farm. You can use ANOVA to determine whether there is a significant difference in the weights of these melons.

For example, if the mean weight was 1.2 kg from farm1, 1.4 kg from farm2 and 0.9 kg from farm3, ANOVA tells you whether those differences are larger than what the melon-to-melon variation within each farm would explain by chance. We can define the hypotheses for this problem.

Null Hypothesis: The mean weight of the melons is the same across all three farms.
Alternative Hypothesis: The mean weight from at least one farm differs from the others.
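To make the idea concrete, here is a minimal sketch of the F-statistic that ANOVA computes, using only the Python standard library. The individual melon weights are invented for illustration (chosen so the farm means come out to 1.2, 1.4 and 0.9 kg), and the helper name `f_statistic` is hypothetical.

```python
from statistics import mean

# Hypothetical melon weights in kg, invented for illustration only.
weights = {
    "farm1": [1.1, 1.3, 1.2, 1.2],
    "farm2": [1.5, 1.3, 1.4, 1.4],
    "farm3": [0.8, 1.0, 0.9, 0.9],
}

def f_statistic(groups):
    """Return the one-way ANOVA F-statistic for a dict of samples."""
    all_values = [v for sample in groups.values() for v in sample]
    grand_mean = mean(all_values)
    k = len(groups)      # number of groups
    n = len(all_values)  # total number of observations

    # Between-group sum of squares: distance of each group mean from the grand mean.
    ss_between = sum(
        len(sample) * (mean(sample) - grand_mean) ** 2
        for sample in groups.values()
    )
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum(
        (v - mean(sample)) ** 2
        for sample in groups.values()
        for v in sample
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

f_value = f_statistic(weights)
print(f"F = {f_value:.2f}")
```

A large F means the farm means sit far apart relative to the scatter within each farm. Turning F into a p-value requires the F distribution, which a statistics library normally handles for you.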

Let’s look at it practically. I was looking to invest in some mutual funds, and after doing some research I found three potential funds, but I had enough disposable cash to invest in only one. To make a decision, I used ANOVA. My reasoning was: “If the mean return of all three funds is the same, I can invest in any of them.”

Let’s define a null and alternative hypothesis.

Null: There is no significant difference in the mean return of the three funds.
Alternative: The mean return of at least one fund differs significantly from the others.

Let’s look at the data. I did some web scraping and extracted the NAV (adjusted Net Asset Value) for the three funds.

I have named the table df_master.
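As a rough sketch of the layout the later code expects, the table is in long format: one row per fund per date, with columns Date, NAV and AMC. The fund names and NAV values below are made up for illustration and are not the scraped data.

```python
import pandas as pd

# Hypothetical per-fund tables; fund names and NAV values are invented.
dates = pd.date_range("2023-01-02", periods=3, freq="D")
fund_tables = {
    "Fund A": pd.DataFrame({"Date": dates, "NAV": [10.0, 10.1, 10.3]}),
    "Fund B": pd.DataFrame({"Date": dates, "NAV": [25.0, 24.8, 25.2]}),
    "Fund C": pd.DataFrame({"Date": dates, "NAV": [17.5, 17.6, 17.4]}),
}

# Tag each table with its fund name and stack them into one long table.
df_master = pd.concat(
    [table.assign(AMC=name) for name, table in fund_tables.items()],
    ignore_index=True,
)
print(df_master)
```

Keeping the fund name in its own column is what lets both the plot and the model formula group rows by AMC.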

Let's plot the NAV of each fund over time.

import plotly.express as px

# One line per fund (AMC), with NAV on the y-axis
fig = px.line(df_master, x="Date", y="NAV", color="AMC", template="none")
fig.show()

Now let's use the statsmodels API to get the p-value for this data.

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Formula syntax: numerical value on the left, categorical variable on the right
model = ols('NAV ~ AMC', data=df_master).fit()
# Build the ANOVA table (includes the F-statistic and the p-value)
anova = sm.stats.anova_lm(model)
print(anova)

Let’s look at the result.

Here the p-value is less than 0.05, so we reject the null hypothesis and conclude that there is a significant difference in the mean returns of the three funds.

So, in conclusion, ANOVA can help us make an informed decision when we want to compare the means of three or more groups.
