Predictive Hacks

Pandas GroupBy Tips

How to Save & Read a Pandas Dataframe Containing Lists and Dictionaries

This post is a short tutorial in Pandas GroupBy. As always we will work with examples. Let’s create a dummy DataFrame for demonstration purposes.

df = pd.DataFrame({'Gender':['m','m','m','f','f','f','f', 'm','f','f'],
                   'Type':['a','b','c','a','b','c','c', 'c','c','b'],
                   'ColA':[10,20,30,40,50,60,70,80,90,100],
                   'ColB':[0,5,10,15,25,30,50,10,20,30]})
df
 

Tip: How to return results without Index

In many cases, we do not want the column(s) of the group by operations to appear as indexes. For that reason, we use to add the reset_index() at the end. For example, let’s say that we want to get the average of ColA group by Gender

df.groupby('Gender')['ColA'].mean()
 

Output:

Now, if we want to remove the Gender as index we add the reset_index() command as the end:

df.groupby('Gender')['ColA'].mean().reset_index()

Output:

Tip: Instead of typing the rest_index() command, you can add the as_index=False in the groupby and you will get the same output. For example:

df.groupby('Gender', as_index=False)['ColA'].mean()

This can be extended to more columns. For example, let’s say that we group by Gender and Type

With index:

df.groupby(['Gender', 'Type'])['ColA'].mean()

Output:

Without index:

df.groupby(['Gender', 'Type'], as_index=False)['ColA'].mean()

Output:


Tip: How to get the groups

Once we group our data frame, we can show and get them. For example, let’s assume that we group our DataFrame by Type

grouped = df.groupby('Type')
 

How to iterate over groups?

We can iterate over groups as follows:

for g in grouped:
    print(g)
 

How to get a group?

We can get a specific group using the command get_group. For example, let’s say that we want to get the group of Type b“.

grouped.get_group('b')
 

Tip: How to apply multiple functions

Let’s say that we want for ColA to calculate the mean and var and for ColB to calculate the min and max, group by Gender.

df.groupby('Gender').agg({'ColA':['mean', 'var'], 
                          'ColB':['min', 'max'] })
 

Tip: How to change the names of the aggregated columns

We can change the names by passing a tuple as follows:

df.groupby('Gender').agg({'ColA':[('ColA_Mean','mean'), ('ColA_Var', 'var')], 
                          'ColB':[('ColB_Min','min'), ('ColB_Max', 'max')] })
 

Tip: How to add a custom function

Let’s say that we want to add a custom calculation which is the range, i.e max-min for ColA.

df.groupby('Gender').agg({'ColA':[('ColA_Mean','mean'), ('ColA_Var', 'var'), ('CustomFunction', lambda x: x.max() - x.min())], 
                          'ColB':[('ColB_Min','min'), ('ColB_Max', 'max')] }) 
 

Tip: Dealing with Multiple Indexes

Let’s create a grouped DataFrame with multiple indexes. You can find more details at Pandas Documentation

ex = df.groupby(['Gender', 'Type']).agg({'ColA':['mean'], 
                          'ColB':['min', 'max'] })

ex
 

Let’s say now that I want to get the row where Gender==’f’ and Type==’c’. We can just use the .loc and pass the values as follows:

ex.loc[('f','c')]
 

Let’s say that I want to run the same query but this time to get the data only for ColA:

ex.loc[('f','c'), 'ColA']
 

Let’s say now that I want to get the results of ColB for all indexes:

ex.loc[:, 'ColB']
 

Tip: Slicers with Multiple Indexes

Let’s say that I want to get all the levels from Gender and the levels ‘a’ and ‘b’ from Type. I can use the slicers as follows:

ex.loc[(slice(None), slice('a','b')), :]
 

Notice: You can use slice(None) to select all the contents of that level. You do not need to specify all the deeper levels, they will be implied as slice(None).


Tip: Reset a column’s MultiIndex levels

As we see here in our example DataFrame called ‘ex‘, we have Multiple Indexes even in columns. Let’s see how we can reset them.

ex.columns = ex.columns.droplevel(0)
ex = ex.rename_axis(None, axis=1)
ex
 

Finally, if we want to reset also the row indexes we can use the command reset_index()

ex.reset_index()
 

Notice that we could reset only one level of the indexes as follows:

ex.reset_index(level=0)

Share This Post

Share on facebook
Share on linkedin
Share on twitter
Share on email

Leave a Comment

Subscribe To Our Newsletter

Get updates and learn from the best

More To Explore

Python

Image Captioning with HuggingFace

Image captioning with AI is a fascinating application of artificial intelligence (AI) that involves generating textual descriptions for images automatically.

Python

Intro to Chatbots with HuggingFace

In this tutorial, we will show you how to use the Transformers library from HuggingFace to build chatbot pipelines. Let’s