Predictive Hacks

How to Predict Runners’ Place in a Race

There are many different approaches to predicting the winner of a race. The race can be any distance and the runners can be dogs, horses or humans. Apart from predicting the winner, we may also want to answer other questions, such as the probability of a runner finishing on the podium (top three positions).

Personally, for this kind of problem, I prefer Monte Carlo simulation to building Machine Learning models. Let’s describe the Monte Carlo approach.

The Data

Let’s say that we want to predict the probability of each runner winning a 100-meter race. For our model, we need the runners’ past race times over a recent period, say the last 1 to 2 years, provided that this gives a sufficient number of races. Then we calculate the mean and the standard deviation of each runner’s times. It makes sense to use an exponential moving average for the mean, and maybe for the standard deviation too, so that the most recent observations get more weight. Also, a good technique is to remove the worst time of each runner.
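To make "more weight to the most recent observations" concrete, here is a minimal sketch that reproduces the pandas exponentially weighted mean by hand. The race times are made-up values for illustration, and alpha=0.30 is simply the value used later in this post.

```python
import numpy as np
import pandas as pd

# Hypothetical race times for one runner, oldest first (illustrative values only)
times = pd.Series([13.4, 13.2, 13.3, 13.0, 12.9])
alpha = 0.30

# With pandas' default adjust=True, an observation i races back gets weight (1 - alpha)**i,
# so the latest race has weight 1 and older races decay geometrically
weights = (1 - alpha) ** np.arange(len(times))[::-1]
manual_ewm_mean = float((weights * times.to_numpy()).sum() / weights.sum())

# The same quantity from pandas: the last value of the running EWM
pandas_ewm_mean = float(times.ewm(alpha=alpha).mean().iloc[-1])

print(manual_ewm_mean, pandas_ewm_mean, times.mean())
```

Because this runner's recent times are faster, the weighted mean ends up below the plain average.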

You can easily get the exponential moving average with pandas. Let’s show how. Assume that our data frame df has the NAME of the runner, the TIME and the DATE of each race. Our logic is to sort by DATE, compute the running EWM per runner and then keep the last value for each runner.

import pandas as pd
import numpy as np

# convert DATE to datetime
df['DATE'] = pd.to_datetime(df.DATE)


# sort by date
df.sort_values('DATE', inplace=True)
df['mean_tmp']=df.groupby('NAME')['TIME'].transform(lambda x: x.ewm(alpha=0.30).mean())
df['std_tmp']=df.groupby('NAME')['TIME'].transform(lambda x: x.ewm(alpha=0.30).std())

# remove the NAN in Std
df.dropna(subset=['std_tmp'], inplace=True)

# get the most recent observation of the EWM
runners= df.groupby('NAME')[['mean_tmp', 'std_tmp']].last()
runners.reset_index(inplace=True)
runners.columns = ['NAME', 'mean', 'std']
runners
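The suggestion to remove each runner's worst time is not covered by the snippet above. A minimal sketch, assuming "worst" means the slowest (largest) TIME and using made-up rows in the same NAME / DATE / TIME layout, could look like this. It would run before the EWM step.

```python
import pandas as pd

# Hypothetical data in the NAME / DATE / TIME layout described above
df = pd.DataFrame({
    'NAME': ['A', 'A', 'A', 'B', 'B', 'B'],
    'DATE': pd.to_datetime(['2023-01-01', '2023-02-01', '2023-03-01',
                            '2023-01-01', '2023-02-01', '2023-03-01']),
    'TIME': [13.1, 14.0, 13.0, 13.3, 13.2, 13.9],
})

# Index label of each runner's slowest race (first one if there are ties)
worst_idx = df.groupby('NAME')['TIME'].idxmax()

# Drop those rows; the EWM mean and std would then be computed on df_trimmed
df_trimmed = df.drop(index=worst_idx)
print(df_trimmed)
```

Note that this removes exactly one race per runner, so it only makes sense when every runner has enough races left afterwards.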

Assume that we come up with the following mean and standard deviation for the 8 runners.

runner = pd.DataFrame({'NAME':["A","B","C","D","E","F","G","H"],
                       'mean': [13.11, 13.17, 12.99, 12.96, 13.25, 13.00, 13.40, 13.29],
                       'std': [0.15, 0.15, 0.17, 0.20, 0.14, 0.16, 0.17, 0.2]})

Make the Predictions

Find the Probability of each Runner to Win

Let’s get the probability of each runner to win by running a Monte Carlo simulation: we assume that each runner’s time follows a normal distribution with the corresponding mean and standard deviation, and we draw simulated race times from it.

# set the seed for reproducibility
np.random.seed(5)

# number of simulations
sims = 1000

runner['monte_carlo'] = runner.apply(lambda x:np.random.normal(x['mean'], x['std'], sims), axis=1)

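Once we have simulated the data, we can get the probability of each runner to win. For every simulated race we rank the runners by time, and then we count how often each runner finishes first.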

# Probability to finish in top x positions
top_x = 1
tmp_probs = pd.DataFrame((pd.DataFrame(list(runner['monte_carlo']),index=runner.NAME).rank()<=top_x).sum(axis=1)/sims)
tmp_probs.reset_index(inplace=True)
tmp_probs.columns=['NAME', 'Probability']

As we can see, runner D has a 34.8% probability to win, which makes D the favorite!

Find the Probability of each Runner to Win a Medal

Similarly, we can estimate the probability of each runner to be on the podium, i.e. in the top 3 positions.

# Probability to finish in top x positions 
top_x = 3 # in top three positions
tmp_probs = pd.DataFrame((pd.DataFrame(list(runner['monte_carlo']),index=runner.NAME).rank()<=top_x).sum(axis=1)/sims)
tmp_probs.reset_index(inplace=True)
tmp_probs.columns=['NAME', 'Probability']
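The two snippets above differ only in top_x, so the logic can be wrapped in one function. The sketch below is an assumption-laden rewrite in plain NumPy, not the code used above: the function name top_x_probabilities is hypothetical, and it uses np.random.default_rng instead of np.random.seed, so its numbers will not match the 34.8% quoted earlier exactly.

```python
import numpy as np

def top_x_probabilities(means, stds, top_x, sims=1000, seed=5):
    """Estimate each runner's probability of finishing in the top_x positions."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    # One row per runner, one column per simulated race
    times = rng.normal(means[:, None], stds[:, None], size=(len(means), sims))
    # Applying argsort twice turns times into 0-based finishing positions per race
    positions = times.argsort(axis=0).argsort(axis=0)
    # Share of simulated races in which each runner is among the first top_x
    return (positions < top_x).mean(axis=1)

names = ["A", "B", "C", "D", "E", "F", "G", "H"]
means = [13.11, 13.17, 12.99, 12.96, 13.25, 13.00, 13.40, 13.29]
stds = [0.15, 0.15, 0.17, 0.20, 0.14, 0.16, 0.17, 0.2]

win = top_x_probabilities(means, stds, top_x=1)
podium = top_x_probabilities(means, stds, top_x=3)
for name, p_win, p_podium in zip(names, win, podium):
    print(name, round(p_win, 3), round(p_podium, 3))
```

A quick sanity check is that the win probabilities sum to 1 and the podium probabilities sum to 3, since every simulated race has exactly one winner and three podium places.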

3 thoughts on “How to Predict Runners’ Place in a Race”

  1. Thank you for this great article! Made running simulations very easy for me.

    What code would you use to find out the probability of one runner finishing ahead of another runner?

    I look forward to your response. Thank you!

  2. Hi

    I am new to Python and have been struggling for over a week to install Python on Windows 7 32-bit…

    Any chance you have a video that shows each step of this tutorial? I would greatly appreciate it.

    Thanks and Merry Christmas

    Dion
    Durban
    South Africa
