Predictive Hacks

How to Deal With NAs in Pandas Group By Operation

During Exploratory Data Analysis (EDA), it is common to compute summary statistics with “group by” operations. The tricky and error-prone part is that, by default, the Pandas “group by” drops every row whose grouping key is NA. Let’s see some examples.

import pandas as pd
import numpy as np

df = pd.DataFrame({'A':[1,1,2,2,3,3, np.nan, 3],
                   'B':['a','a','a','b',np.nan,'b','b',np.nan],
                   'C':[10,20,30,10,20,30,10,20]})

df

Let’s run some group by operations.

df.groupby(['A'])['C'].mean()
A
1.0    15.000000
2.0    20.000000
3.0    23.333333
Name: C, dtype: float64

As we can see, the row where A is NaN was dropped from the result. We can keep the NaN group by passing dropna=False to the group by (available since pandas 1.1.0).

df.groupby(['A'], dropna=False)['C'].mean()
A
1.0    15.000000
2.0    20.000000
3.0    23.333333
NaN    10.000000
Name: C, dtype: float64
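The same flag applies when grouping on more than one column. The sketch below rebuilds the toy df from above and groups on both A and B (B also contains NaNs). The names default_means and all_means are illustrative only.

```python
import numpy as np
import pandas as pd

# Same toy data as in the post
df = pd.DataFrame({'A': [1, 1, 2, 2, 3, 3, np.nan, 3],
                   'B': ['a', 'a', 'a', 'b', np.nan, 'b', 'b', np.nan],
                   'C': [10, 20, 30, 10, 20, 30, 10, 20]})

# Default: a row is dropped if ANY of its grouping keys is NaN
default_means = df.groupby(['A', 'B'])['C'].mean()

# dropna=False: key combinations that contain NaN are kept as groups
all_means = df.groupby(['A', 'B'], dropna=False)['C'].mean()

print(len(default_means))  # 4
print(len(all_means))      # 6
```

With two keys, three of the eight rows are left out by default, because each has a NaN in either A or B.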

With dropna=False, the NaN group appears in the output. Another approach is to fill the NAs with a placeholder value and then run the group by. A third approach is to convert the grouping variable to a string. For example:

Fill NAs Approach

# create a copy of the initial df
df1 = df.copy()

# fill the NA with the "Unknown" string
df1['A'] = df1['A'].fillna("Unknown")

df1.groupby(['A'])['C'].mean()
A
1.0        15.000000
2.0        20.000000
3.0        23.333333
Unknown    10.000000
Name: C, dtype: float64
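The fill approach also works with a numeric placeholder, which keeps the grouping column numeric instead of mixing floats with a string. A minimal sketch, assuming -1 never occurs as a real value in A (the SENTINEL constant is an illustrative choice, not part of the original example):

```python
import numpy as np
import pandas as pd

# Same toy data as in the post
df = pd.DataFrame({'A': [1, 1, 2, 2, 3, 3, np.nan, 3],
                   'B': ['a', 'a', 'a', 'b', np.nan, 'b', 'b', np.nan],
                   'C': [10, 20, 30, 10, 20, 30, 10, 20]})

SENTINEL = -1  # assumption: this value is not a legitimate key in column A

df1 = df.copy()

# Guard against silently merging real rows into the placeholder group
assert not (df1['A'] == SENTINEL).any()

# Fill the NAs with the numeric placeholder; the column stays float64
df1['A'] = df1['A'].fillna(SENTINEL)

result = df1.groupby('A')['C'].mean()
print(result.loc[SENTINEL])  # 10.0
```

The trade-off is readability: a label such as "Unknown" explains itself in a report, while a numeric sentinel has to be documented.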

Change Data Type Approach

# create a copy of the initial df
df2 = df.copy()

# cast the grouping variable to string, so NaN becomes the text 'nan' (last row of the output)
df2['A'] = df2['A'].astype('str')

df2.groupby(['A'])['C'].mean()
A
1.0    15.000000
2.0    20.000000
3.0    23.333333
nan    10.000000
Name: C, dtype: float64
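Whichever approach is used, a quick sanity check is to compare how many rows the default group by covers against the total. The sketch below is one possible way to do that on the toy df. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Same toy data as in the post
df = pd.DataFrame({'A': [1, 1, 2, 2, 3, 3, np.nan, 3],
                   'B': ['a', 'a', 'a', 'b', np.nan, 'b', 'b', np.nan],
                   'C': [10, 20, 30, 10, 20, 30, 10, 20]})

# Rows per group with the default behaviour (NaN keys dropped)
default_sizes = df.groupby('A')['C'].size()

# Rows per group when NaN keys are kept
full_sizes = df.groupby('A', dropna=False)['C'].size()

# Rows that silently disappear from the default summary
rows_dropped = full_sizes.sum() - default_sizes.sum()
print(rows_dropped)  # 1

# The same number is available directly from the grouping column
print(df['A'].isna().sum())  # 1
```

A non-zero difference is a signal that the default summary does not describe the whole dataset.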
