There is an easy way to retrieve the individual groups from a pandas groupby operation.
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2, 3], 'B': ['a', 'b', 'a', 'c', 'b'], 'C': ['a', 'b', 'c', 'd', 'e']})
df
   A  B  C
0  1  a  a
1  1  b  b
2  2  a  c
3  2  c  d
4  3  b  e
Let’s do a groupby and save it to a variable.
group = df.groupby(by='A')
At this point, we can inspect the groups by running the following:
group.groups
{1: [0, 1], 2: [2, 3], 3: [4]}
The keys of this dictionary are the unique values of column A, the column we grouped by. The values are the index labels of the rows that belong to each group. Selecting rows by those labels returns the corresponding group.
# Let's get group 1
df.loc[group.groups[1]]
   A  B  C
0  1  a  a
1  1  b  b
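The values stored in `group.groups` are index labels, not row positions. With the default 0, 1, 2, ... index the two coincide, but with any other index only label-based selection with `loc` is reliable. A minimal sketch, assuming a hypothetical string index (`'v'` to `'z'`) that is not part of the original example:

```python
import pandas as pd

# Same data as above, but with a hypothetical non-default index
df = pd.DataFrame(
    {'A': [1, 1, 2, 2, 3], 'B': ['a', 'b', 'a', 'c', 'b'], 'C': ['a', 'b', 'c', 'd', 'e']},
    index=['v', 'w', 'x', 'y', 'z'],
)
group = df.groupby('A')

# The dictionary values are index labels of the rows in each group
labels = list(group.groups[1])

# Select by label; position-based iloc would not accept these labels
subset = df.loc[labels]
print(labels)
print(subset)
```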
However, pandas also has its own method for retrieving a group.
group.get_group(1)
   A  B  C
0  1  a  a
1  1  b  b
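When every group is needed rather than a single one, a GroupBy object can also be iterated, which yields (key, sub-DataFrame) pairs. The sketch below rebuilds the same frame and collects the pairs into a dictionary; the name `pieces` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2, 3], 'B': ['a', 'b', 'a', 'c', 'b'], 'C': ['a', 'b', 'c', 'd', 'e']})

# Iterating over a GroupBy yields (group key, sub-DataFrame) pairs
pieces = {key: sub for key, sub in df.groupby('A')}

for key, sub in pieces.items():
    print(key)
    print(sub)

# Each sub-DataFrame should match what get_group returns for that key
same_as_get_group = pieces[1].equals(df.groupby('A').get_group(1))
```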
What if we group by more than one variable? It works the same way, but each group key is a tuple containing one value per grouping variable. Let's look at the groups to see this.
group = df.groupby(by=['A', 'B'])
group.groups
{(1, 'a'): [0], (1, 'b'): [1], (2, 'a'): [2], (2, 'c'): [3], (3, 'b'): [4]}
For example, to get the group where A is 1 and B is “a”, we run the following:
group.get_group((1, 'a'))
   A  B  C
0  1  a  a
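As a sanity check, the tuple lookup can be compared with an ordinary boolean filter on both columns. This is only a sketch for comparison, not a replacement for `get_group`; the names `by_key` and `by_mask` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2, 3], 'B': ['a', 'b', 'a', 'c', 'b'], 'C': ['a', 'b', 'c', 'd', 'e']})

# Retrieve the group through its tuple key
by_key = df.groupby(['A', 'B']).get_group((1, 'a'))

# Select the same rows with a boolean mask on both columns
by_mask = df[(df['A'] == 1) & (df['B'] == 'a')]

print(by_key.equals(by_mask))
```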