Predictive Hacks

sum case when by group in Pandas

In SQL it is common to write statements using the following syntax:

select mygroup, 
sum(case when category='A' then Value else 0 end) as CatgoryA,
sum(case when category='B' then Value else 0 end) as CatgoryB
from mytable
group by mygroup

The above case is not so common in Pandas since we can get the same output by using pivot tables. But let’s see how we can write the above statement in Python using Pandas.

import numpy as np
import pandas as pd

np.random.seed(5)

Category = np.random.choice(['A','B'], 100, p=[0.7,0.3])
Group = np.random.choice(['G1','G2', 'G3'], 100)
Value = np.random.randint(low=50, high=100, size=100)

df = pd.DataFrame({'Group':Group, 'Category':Category, 'Value':Value})

df
 
sum case when by group in Pandas 1

We have generated a data frame and we are ready to move on.

df.groupby('Group', as_index=False).apply(lambda x: pd.Series({'CatA':x.loc[x.Category=='A']['Value'].sum(),
                                               'CatB':x.loc[x.Category=='B']['Value'].sum()}))
sum case when by group in Pandas 2

As we can see we get the same results with the pivot table.

df.pivot_table(values='Value', index='Group', columns='Category', aggfunc=np.sum)
sum case when by group in Pandas 3

Share This Post

Share on facebook
Share on linkedin
Share on twitter
Share on email

Subscribe To Our Newsletter

Get updates and learn from the best

More To Explore

Photo by NordWood Themes on Unsplash
Miscellaneous

How To Manage Multiple Screen Sessions

Linux’s Screen lets you run terminal applications to a Server in the background even if you disconnect from the ssh connection.

python exception
Python

Exceptions in Python

In this tutorial, we will provide you with an example of exception handling in Python. For simplicity, we will work