This short post is about finding an efficient way to count consecutive events. We are going to represent a straightforward practical example:
import pandas as pd df = pd.DataFrame({'Score':['win', 'loss', 'loss', 'loss', 'win', 'win', 'win', 'win', 'win', 'loss', 'win', 'loss', 'loss']}) df
Score
0 win
1 loss
2 loss
3 loss
4 win
5 win
6 win
7 win
8 win
9 loss
10 win
11 loss
12 loss
And with the following “hack” we are going to get the column of the “Streak”
df['Streak'] = df['Score'].groupby((df['Score'] != df['Score'].shift()).cumsum()).cumcount() + 1 df
Score Streak
0 win 1
1 loss 1
2 loss 2
3 loss 3
4 win 1
5 win 2
6 win 3
7 win 4
8 win 5
9 loss 1
10 win 1
11 loss 1
12 loss 2
Count the Consecutive Events within Group
Let’s say that we have the same example as above, but we want to do the same exercise within group.
df = pd.DataFrame({'Group':['A','A', 'A','A','A','A','B','B','B','B','B','B','B'], 'Score':['win', 'loss', 'loss', 'loss', 'win', 'win', 'win', 'win', 'win', 'loss', 'win', 'loss', 'loss']}) df
Group Score
0 A win
1 A loss
2 A loss
3 A loss
4 A win
5 A win
6 B win
7 B win
8 B win
9 B loss
10 B win
11 B loss
12 B loss
df['Streak'] = df['Score'].groupby((df['Score'] != df.groupby(['Group'])['Score'].shift()).cumsum()).cumcount() + 1 df
Group Score Streak
0 A win 1
1 A loss 1
2 A loss 2
3 A loss 3
4 A win 1
5 A win 2
6 B win 1
7 B win 2
8 B win 3
9 B loss 1
10 B win 1
11 B loss 1
12 B loss 2