Predictive Hacks

Count the Consecutive Events in R

When we work with financial assets, sports analytics, gambling games and so on, we often need to keep track of consecutive events, called streaks. For instance, for how many consecutive days Stock X has closed higher, or for how many games in a row Team A has scored at least one goal.


Example: Consecutive Events in a Roulette

Assume a roulette wheel that returns Red or Black, each with probability 50%. We are going to simulate N = 1,000,000 rolls and keep track of the streaks of Red and Black. The R function that makes our life easier is rle, but to track the running streak we also need the sequence function. We also add a column called EndOfStreak, which indicates whether the streak ends at that roll.

library(tidyverse)

# number of simulations
n<-1000000

# set a random seed for reproducibility
set.seed(5)

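# create the data frame:
#   Streak      = running length of the current run (rle gives the run lengths, sequence expands each one to 1..k)
#   EndOfStreak = "Yes" if the next roll differs; NA on the last roll, whose streak is still open
df<-tibble(Rolls=seq_len(n), Outcome=sample(c("Red", "Black"),n,replace = TRUE, prob = c(0.5,0.5)))%>%
  mutate(Streak=sequence(rle(Outcome)$lengths), EndOfStreak=ifelse(lead(Outcome)==Outcome, "No", "Yes"))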

df%>%print(n=20)
# A tibble: 1,000,000 x 4
   Rolls Outcome Streak EndOfStreak
   <int> <chr>    <int> <chr>      
 1     1 Black        1 Yes        
 2     2 Red          1 No         
 3     3 Red          2 Yes        
 4     4 Black        1 No         
 5     5 Black        2 Yes        
 6     6 Red          1 No         
 7     7 Red          2 No         
 8     8 Red          3 No         
 9     9 Red          4 Yes        
10    10 Black        1 No         
11    11 Black        2 No         
12    12 Black        3 No         
13    13 Black        4 Yes        
14    14 Red          1 Yes        
15    15 Black        1 No         
16    16 Black        2 No         
17    17 Black        3 Yes        
18    18 Red          1 No         
19    19 Red          2 No         
20    20 Red          3 No  
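
To see what rle and sequence each contribute to the Streak column, here is a minimal sketch on a toy vector. The vector x and the names runs and running_streak are made up for illustration and are not part of the simulation above.

```r
# Toy vector of outcomes (hypothetical data, not taken from the simulation)
x <- c("Red", "Red", "Black", "Red", "Red", "Red")

# rle() collapses the vector into runs and stores their lengths and values
runs <- rle(x)
runs$lengths   # 2 1 3
runs$values    # "Red" "Black" "Red"

# sequence() expands each run length k into 1..k, which is the running streak
running_streak <- sequence(runs$lengths)
running_streak # 1 2 1 1 2 3
```

The last element of each run carries that run's full length, which is why filtering on the end of a streak gives the completed streak lengths.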

It would be nice to see the distribution of the completed streaks. A streak has length k when the first roll is followed by k-1 rolls of the same colour and then one roll of the other colour, so its probability is 0.5^k. We therefore expect streak=1 to be around 50%, streak=2 around 25% (50% x 50%), streak=3 around 12.5% (50% x 50% x 50%) and so on.

streaks<-df%>%filter(EndOfStreak=="Yes")%>%group_by(Streak)%>%
  summarise(Times=n())%>%ungroup()%>%mutate(Probability=Times/sum(Times))

streaks
# A tibble: 19 x 3
   Streak  Times Probability
    <int>  <int>       <dbl>
 1      1 249769  0.500     
 2      2 125411  0.251     
 3      3  62552  0.125     
 4      4  31086  0.0622    
 5      5  15463  0.0309    
 6      6   7786  0.0156    
 7      7   4001  0.00800   
 8      8   1977  0.00395   
 9      9   1004  0.00201   
10     10    486  0.000972  
11     11    254  0.000508  
12     12    120  0.000240  
13     13     57  0.000114  
14     14     15  0.0000300 
15     15     18  0.0000360 
16     16      6  0.0000120 
17     17      4  0.00000800
18     18      1  0.00000200
19     19      1  0.00000200
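
As a quick cross-check, the observed proportions can be put side by side with the expected 0.5^k values. The sketch below is a base R alternative, not the code used above: it reads the streak lengths straight from rle on a smaller simulation of 100,000 rolls (an arbitrary size chosen to keep it fast), and the names outcome, run_lengths and comparison are illustrative. Unlike the EndOfStreak filter, rle also counts the final run, which is cut off by the end of the simulation; with this many rolls the effect should be negligible.

```r
# Smaller, self-contained simulation (100,000 rolls is an assumption for speed)
set.seed(5)
n_small <- 100000
outcome <- sample(c("Red", "Black"), n_small, replace = TRUE)

# Each element of rle()$lengths is the length of one streak
run_lengths <- rle(outcome)$lengths

# Observed share of streaks of each length
observed <- prop.table(table(run_lengths))
k <- as.integer(names(observed))

# Compare with the expected probability 0.5^k
comparison <- data.frame(
  Streak      = k,
  Observed    = as.numeric(observed),
  Theoretical = 0.5^k
)

head(comparison, 5)
```

The two columns should agree closely for short streaks and drift apart for long ones, where only a handful of streaks are observed.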
