Predictive Hacks

How to Simulate Data from Different Distributions

Let’s say that you want to simulate 10 observations from each of 3 normal distributions with different means and standard deviations. We can do that efficiently with the purrr package from the tidyverse family. The 3 normal distributions are the following:

  • Distribution A: mean=30 and sd=1
  • Distribution B: mean=40 and sd=2
  • Distribution C: mean=50 and sd=3

library(tidyverse)  # loads tibble and purrr, among others

df <- tibble(Distribution = c("A", "B", "C"), Mean = c(30, 40, 50), StDev = c(1, 2, 3))

Let’s simulate the data using purrr's map2 function, which applies a function to two vectors in parallel:

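# map2 walks Mean and StDev in parallel: .x is the current mean, .y the current sd
my_data <- map2(df$Mean, df$StDev, ~ data.frame(Sims = rnorm(n = 10, mean = .x, sd = .y)))

# name each list element after its distribution
my_data <- set_names(my_data, df$Distribution)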
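It can help to check what map2 actually returns: an ordinary named list holding one 10-row data frame per distribution. The sketch below repeats the steps above so it runs on its own; the set.seed(42) call is an assumption added only to make the random draws reproducible.

```r
library(purrr)
library(tibble)

set.seed(42)  # assumption: fixed seed, only for reproducible draws

df <- tibble(Distribution = c("A", "B", "C"), Mean = c(30, 40, 50), StDev = c(1, 2, 3))

my_data <- map2(df$Mean, df$StDev, ~ data.frame(Sims = rnorm(n = 10, mean = .x, sd = .y)))
my_data <- set_names(my_data, df$Distribution)

# one list element per row of df, each a data frame with a single Sims column
str(my_data, max.level = 1)
names(my_data)          # "A" "B" "C"
map_int(my_data, nrow)  # A = 10, B = 10, C = 10
```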


We can get each element of the list by index, like my_data[[1]], or by name, like my_data[["A"]]. If you need to vary more than two arguments, let’s say mean, sd and sample size, you can use the pmap function, which iterates over any number of arguments in parallel.
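A minimal pmap sketch, assuming we also want a different sample size per distribution. The table name df3 and the sizes 5, 10 and 15 are hypothetical, chosen only for illustration. Each column of the table becomes one argument of the function, so every row produces one simulated data frame.

```r
library(purrr)
library(tibble)

set.seed(42)  # assumption: fixed seed, only for reproducible draws

# hypothetical parameter table: one row per distribution, now with a size column
df3 <- tibble(
  Distribution = c("A", "B", "C"),
  mean = c(30, 40, 50),
  sd   = c(1, 2, 3),
  n    = c(5, 10, 15)  # hypothetical sample sizes
)

# pmap passes the mean, sd and n columns row by row, matched by column name
sims <- pmap(
  df3[c("mean", "sd", "n")],
  function(mean, sd, n) data.frame(Sims = rnorm(n = n, mean = mean, sd = sd))
)
sims <- set_names(sims, df3$Distribution)

map_int(sims, nrow)                  # A = 5, B = 10, C = 15
identical(sims[[1]], sims[["A"]])    # TRUE: index and name reach the same element
```

Keeping the parameters in a table whose column names match the function's arguments makes it easy to add a fourth parameter later without changing the pmap call itself.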

