When we are dealing with financial assets, sports analytics, gambling games and so on, we often need to keep track of consecutive events, called streaks. For instance, for how many consecutive days stock X has closed with a positive return, or for how many games in a row Team A has scored at least one goal.
Example: Consecutive Events in Roulette
Assume a roulette wheel that returns Red (50%) or Black (50%). We are going to simulate N = 1,000,000 rolls and keep track of the streaks of Red and Black respectively. The R function that makes our life easier is rle, but to track the running streak we also need the sequence function. We also add another column, called EndOfStreak, which indicates whether the streak has ended or not.
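library(tidyverse)

# number of simulations
n <- 1000000

# set a random seed for reproducibility
set.seed(5)

# create the data frame
df <- tibble(Rolls = seq_len(n),
             Outcome = sample(c("Red", "Black"), n, replace = TRUE, prob = c(0.5, 0.5))) %>%
  # Streak: running length of the current run of identical outcomes
  # EndOfStreak: "Yes" when the next outcome differs (NA for the last row, which has no next roll)
  mutate(Streak = sequence(rle(Outcome)$lengths),
         EndOfStreak = ifelse(lead(Outcome) == Outcome, "No", "Yes"))

df %>% print(n = 20)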
# A tibble: 1,000,000 x 4
   Rolls Outcome Streak EndOfStreak
   <int> <chr>    <int> <chr>
 1     1 Black        1 Yes
 2     2 Red          1 No
 3     3 Red          2 Yes
 4     4 Black        1 No
 5     5 Black        2 Yes
 6     6 Red          1 No
 7     7 Red          2 No
 8     8 Red          3 No
 9     9 Red          4 Yes
10    10 Black        1 No
11    11 Black        2 No
12    12 Black        3 No
13    13 Black        4 Yes
14    14 Red          1 Yes
15    15 Black        1 No
16    16 Black        2 No
17    17 Black        3 Yes
18    18 Red          1 No
19    19 Red          2 No
20    20 Red          3 No
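To see how the Streak column is built, it may help to run rle and sequence on a short vector. The vector `x` below is a made-up toy example, not taken from the simulation, and the snippet is only a minimal base-R sketch of the idea. Treating the last element as the end of a streak is an assumption made here for illustration.

```r
# Hypothetical toy vector (not taken from the simulation)
x <- c("Red", "Red", "Black", "Red", "Red", "Red", "Black", "Black")

# rle() compresses the vector into runs: their lengths and their values
runs <- rle(x)
runs$lengths  # 2 1 3 2
runs$values   # "Red" "Black" "Red" "Black"

# sequence() expands each run length k into 1, 2, ..., k,
# which gives the running streak at every position
streak <- sequence(runs$lengths)
streak        # 1 2 1 1 2 3 1 2

# A streak ends when the next outcome differs. The last element has no
# successor, so it is marked as an end of streak here (an assumption).
end_of_streak <- c(x[-1] != x[-length(x)], TRUE)
end_of_streak # FALSE TRUE TRUE FALSE FALSE TRUE FALSE TRUE
```

The longest streak in the vector is then simply `max(runs$lengths)`.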
It would be nice to see the distribution of the completed streaks. A completed streak of length k needs k-1 repeats of the same color followed by a change, each with probability 50%. So we expect streak=1 to account for around 50% of the completed streaks, streak=2 for around 25% (50% x 50%), streak=3 for around 12.5% (50% x 50% x 50%), and so on.
streaks <- df %>%
  filter(EndOfStreak == "Yes") %>%
  group_by(Streak) %>%
  summarise(Times = n()) %>%
  ungroup() %>%
  mutate(Probability = Times / sum(Times))

streaks
# A tibble: 19 x 3
   Streak  Times Probability
    <int>  <int>       <dbl>
 1      1 249769  0.500
 2      2 125411  0.251
 3      3  62552  0.125
 4      4  31086  0.0622
 5      5  15463  0.0309
 6      6   7786  0.0156
 7      7   4001  0.00800
 8      8   1977  0.00395
 9      9   1004  0.00201
10     10    486  0.000972
11     11    254  0.000508
12     12    120  0.000240
13     13     57  0.000114
14     14     15  0.0000300
15     15     18  0.0000360
16     16      6  0.0000120
17     17      4  0.00000800
18     18      1  0.00000200
19     19      1  0.00000200
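As a rough check on the table above, the observed shares can be set against the theoretical ones. Assuming independent rolls with probability 0.5 for each color, a completed streak has length k with probability 0.5^k. This is a geometric distribution, which base R exposes through dgeom (shifted by one, because dgeom counts the failures before the first success). The counts are copied from the Times column above; the names `times`, `observed`, `theoretical` and `comparison` are only illustrative.

```r
# Counts of completed streaks of length 1..19, copied from the Times column above
times <- c(249769, 125411, 62552, 31086, 15463, 7786, 4001, 1977, 1004,
           486, 254, 120, 57, 15, 18, 6, 4, 1, 1)
k <- seq_along(times)

# Observed share of each streak length among all completed streaks
observed <- times / sum(times)

# Theoretical share under the 50/50 assumption:
# P(length = k) = 0.5^(k - 1) * 0.5, i.e. dgeom(k - 1, prob = 0.5)
theoretical <- dgeom(k - 1, prob = 0.5)

comparison <- data.frame(Streak = k,
                         Observed = observed,
                         Theoretical = theoretical)

# Show the first ten streak lengths side by side
print(head(comparison, 10), digits = 3)
```

For the longer streaks the counts are very small (a handful of occurrences), so larger relative deviations from the theoretical values are to be expected there.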