Predictive Hacks

# How to run Chi-Square Test in Python We will provide a practical example of how we can run a Chi-Square Test in Python. Assume that we want to test if there is a statistically significant difference in Genders (M, F) population between Smokers and Non-Smokers. Let’s generate some sample data to work on it.

## Sample Data

```import pandas as pd
import numpy as np
from scipy.stats import chi2_contingency

import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

```
```df = pd.DataFrame({'Gender' : ['M', 'M', 'M', 'F', 'F'] * 10,
'isSmoker' : ['Smoker', 'Smoker', 'Non-Smpoker', 'Non-Smpoker', 'Smoker'] * 10
})

```
``````	Gender	isSmoker
0	M	Smoker
1	M	Smoker
2	M	Non-Smpoker
3	F	Non-Smpoker
4	F	Smoker
``````

## Contingency Table

To run the Chi-Square Test, the easiest way is to convert the data into a contingency table with frequencies. We will use the `crosstab` command from `pandas`.

```contigency= pd.crosstab(df['Gender'], df['isSmoker'])
contigency

```

Let’s say that we want to get the percentages by Gender (row)

```contigency_pct = pd.crosstab(df['Gender'], df['isSmoker'], normalize='index')
contigency_pct

```

If we want the percentages by column, then we should write normalize=’column’ and if we want the total percentage then we should write normalize=’all’

## Heatmaps

An easy way to see visually the contingency tables are the heatmaps.

```plt.figure(figsize=(12,8))
sns.heatmap(contigency, annot=True, cmap="YlGnBu")

```

## Chi-Square Test

Now that we have built the contingency table we can pass it to `chi2_contingency` function from the `scipy` package which returns the:

• chi2: The test statistic
• p: The p-value of the test
• dof: Degrees of freedom
• expected: The expected frequencies, based on the marginal sums of the table

```# Chi-square test of independence.
c, p, dof, expected = chi2_contingency(contigency)
p

```
``0.3767591178115821``

## Inference

The p-value is 37.67% which means that we do not reject the null hypothesis at 95% level of confidence. The null hypothesis was that `Smokers` and `Gender` are independent. In this example, the contingency table was 2×2. We could have applied z-test for proportions instead of Chi-Square test. Notice that the Chi-Square test can be extended to m x n contingency tables.

### 1 thought on “How to run Chi-Square Test in Python”

1. How to extend the process for m x n contingency tables?