Predictive Hacks

How to Simulate Data from Different Distributions

Let’s say that you want to simulate 10 observations from each of 3 normal distributions with different means and standard deviations. We can do that efficiently using the purrr package from the tidyverse family. The 3 normal distributions are the following:

  • Distribution A: mean=30 and sd=1
  • Distribution B: mean=40 and sd=2
  • Distribution C: mean=50 and sd=3

library(tidyverse)

# one row per distribution, holding its name and parameters
df <- tibble(Distribution = c("A", "B", "C"), Mean = c(30, 40, 50), StDev = c(1, 2, 3))
df

Let’s simulate the data using purrr and the map2 function, which iterates over two vectors in parallel (here the means and the standard deviations):

# .x is the mean and .y is the standard deviation of each distribution
my_data <- map2(df$Mean, df$StDev, ~ data.frame(Sims = rnorm(n = 10, mean = .x, sd = .y)))

# set the name for each list element
my_data <- set_names(my_data, df$Distribution)

my_data

We can get each element of the list by index, like my_data[[1]], or by name, like my_data[["A"]]. If you have more than two arguments, say mean, sd and sample size, you can use the pmap function, which takes a list of any number of arguments.
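
As a minimal sketch of the pmap approach, suppose each distribution also has its own sample size. The params table, its n column and the sample sizes 10, 20 and 30 are assumptions made up for illustration; they are not part of the example above. pmap matches the names of the list elements to the arguments of the function it calls:

```r
library(tidyverse)

# Hypothetical parameter table: same means and sds as above,
# plus an assumed sample size (n) per distribution
params <- tibble(
  Distribution = c("A", "B", "C"),
  mean = c(30, 40, 50),
  sd   = c(1, 2, 3),
  n    = c(10, 20, 30)
)

# pmap() walks over the three vectors in parallel and passes them
# to the function by name (mean, sd, n)
sims <- pmap(
  list(mean = params$mean, sd = params$sd, n = params$n),
  function(mean, sd, n) data.frame(Sims = rnorm(n = n, mean = mean, sd = sd))
)

# name each list element after its distribution
sims <- set_names(sims, params$Distribution)

# access an element by name, as before
nrow(sims[["B"]])
```

Because the list elements are named, their order inside list() does not matter. Each element of sims is a data frame with as many rows as the n given for that distribution.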
