Predictive Hacks

Pandas GroupBy Tips

How to Save & Read a Pandas Dataframe Containing Lists and Dictionaries

This post is a short tutorial in Pandas GroupBy. As always we will work with examples. Let’s create a dummy DataFrame for demonstration purposes.

df = pd.DataFrame({'Gender':['m','m','m','f','f','f','f', 'm','f','f'],
                   'Type':['a','b','c','a','b','c','c', 'c','c','b'],
                   'ColA':[10,20,30,40,50,60,70,80,90,100],
                   'ColB':[0,5,10,15,25,30,50,10,20,30]})
df
 
Pandas GroupBy Tips 1

Tip: How to return results without Index

In many cases, we do not want the column(s) of the group by operations to appear as indexes. For that reason, we use to add the reset_index() at the end. For example, let’s say that we want to get the average of ColA group by Gender

df.groupby('Gender')['ColA'].mean()
 

Output:

Pandas GroupBy Tips 2

Now, if we want to remove the Gender as index we add the reset_index() command as the end:

df.groupby('Gender')['ColA'].mean().reset_index()

Output:

Pandas GroupBy Tips 3

Tip: Instead of typing the rest_index() command, you can add the as_index=False in the groupby and you will get the same output. For example:

df.groupby('Gender', as_index=False)['ColA'].mean()
Pandas GroupBy Tips 4

This can be extended to more columns. For example, let’s say that we group by Gender and Type

With index:

df.groupby(['Gender', 'Type'])['ColA'].mean()

Output:

Pandas GroupBy Tips 5

Without index:

df.groupby(['Gender', 'Type'], as_index=False)['ColA'].mean()

Output:

Pandas GroupBy Tips 6

Tip: How to get the groups

Once we group our data frame, we can show and get them. For example, let’s assume that we group our DataFrame by Type

grouped = df.groupby('Type')
 

How to iterate over groups?

We can iterate over groups as follows:

for g in grouped:
    print(g)
 
Pandas GroupBy Tips 7

How to get a group?

We can get a specific group using the command get_group. For example, let’s say that we want to get the group of Type b“.

grouped.get_group('b')
 
Pandas GroupBy Tips 8

Tip: How to apply multiple functions

Let’s say that we want for ColA to calculate the mean and var and for ColB to calculate the min and max, group by Gender.

df.groupby('Gender').agg({'ColA':['mean', 'var'], 
                          'ColB':['min', 'max'] })
 
Pandas GroupBy Tips 9

Tip: How to change the names of the aggregated columns

We can change the names by passing a tuple as follows:

df.groupby('Gender').agg({'ColA':[('ColA_Mean','mean'), ('ColA_Var', 'var')], 
                          'ColB':[('ColB_Min','min'), ('ColB_Max', 'max')] })
 
Pandas GroupBy Tips 10

Tip: How to add a custom function

Let’s say that we want to add a custom calculation which is the range, i.e max-min for ColA.

df.groupby('Gender').agg({'ColA':[('ColA_Mean','mean'), ('ColA_Var', 'var'), ('CustomFunction', lambda x: x.max() - x.min())], 
                          'ColB':[('ColB_Min','min'), ('ColB_Max', 'max')] }) 
 
Pandas GroupBy Tips 11

Tip: Dealing with Multiple Indexes

Let’s create a grouped DataFrame with multiple indexes. You can find more details at Pandas Documentation

ex = df.groupby(['Gender', 'Type']).agg({'ColA':['mean'], 
                          'ColB':['min', 'max'] })

ex
 
Pandas GroupBy Tips 12

Let’s say now that I want to get the row where Gender==’f’ and Type==’c’. We can just use the .loc and pass the values as follows:

ex.loc[('f','c')]
 
Pandas GroupBy Tips 13

Let’s say that I want to run the same query but this time to get the data only for ColA:

ex.loc[('f','c'), 'ColA']
 
Pandas GroupBy Tips 14

Let’s say now that I want to get the results of ColB for all indexes:

ex.loc[:, 'ColB']
 
Pandas GroupBy Tips 15

Tip: Slicers with Multiple Indexes

Let’s say that I want to get all the levels from Gender and the levels ‘a’ and ‘b’ from Type. I can use the slicers as follows:

ex.loc[(slice(None), slice('a','b')), :]
 
Pandas GroupBy Tips 16

Notice: You can use slice(None) to select all the contents of that level. You do not need to specify all the deeper levels, they will be implied as slice(None).


Tip: Reset a column’s MultiIndex levels

As we see here in our example DataFrame called ‘ex‘, we have Multiple Indexes even in columns. Let’s see how we can reset them.

ex.columns = ex.columns.droplevel(0)
ex = ex.rename_axis(None, axis=1)
ex
 
Pandas GroupBy Tips 17

Finally, if we want to reset also the row indexes we can use the command reset_index()

ex.reset_index()
 
Pandas GroupBy Tips 18

Notice that we could reset only one level of the indexes as follows:

ex.reset_index(level=0)

Pandas GroupBy Tips 19

Share This Post

Share on facebook
Share on linkedin
Share on twitter
Share on email

Leave a Comment

Subscribe To Our Newsletter

Get updates and learn from the best

More To Explore

data science journey
Miscellaneous

My Journey as a Data Science Blogger

Μy Background My Studies Back in 2001, I entered university to study Statistics. During my first year, I ran my