For Data Science positions that require some knowledge of statistics and programming, it is common to be asked questions like the ones below.
Question 1
Suppose an urn contains 40 red, 25 green and 35 blue balls. Balls are drawn from the urn one by one, at random and without replacement. Let \(N\) denote the draw at which the first blue ball appears, and let \(S\) denote the number of green balls drawn up to the \(N\)th draw (i.e. before the first blue ball appears). Estimate \(E[N|S=2]\) by generating \(10000\) i.i.d. copies of \((S,N)\).
Solution 1
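    urn <- c(rep("red", 40), rep("green", 25), rep("blue", 35))
    v <- c()
    for (i in 1:10000) {
      s <- sample(urn, 100, replace = FALSE)          # one random ordering of the urn
      blue_ball <- min(which(s == "blue"))            # N: draw of the first blue ball
      green_balls <- sum(s[1:blue_ball] == "green")   # S: green balls drawn before it
      if (green_balls == 2) {
        v <- c(v, blue_ball)                          # keep N only when S = 2
      }
    }
    mean(v)

Note that \(S\) is a count, so it is computed with sum() rather than from the position of the first green ball. A typical run prints a value close to the exact answer, \(303/61 \approx 4.967\).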
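As a cross-check on the simulation, \(E[N|S=2]\) can also be computed exactly by enumerating the joint probabilities \(P(N=n, S=2)\). The sketch below assumes the same urn as in the question; the names `n`, `p_joint` and `exact` are illustrative choices, not part of the original solution.

```r
# Exact E[N | S = 2] for the urn with 40 red, 25 green and 35 blue balls.
# {N = n, S = 2} means: the first n - 1 draws hold n - 3 red, 2 green and
# 0 blue balls (multivariate hypergeometric), and draw n is blue.
n <- 3:43                                   # at most 40 red balls can precede the blue
p_joint <- choose(40, n - 3) * choose(25, 2) / choose(100, n - 1) *
  35 / (100 - (n - 1))                      # P(N = n, S = 2)
exact <- sum(n * p_joint) / sum(p_joint)    # E[N | S = 2]
exact
```

The result agrees with the closed form \(3 + 40 \cdot 3/61 = 303/61 \approx 4.967\): given \(S=2\), the first blue ball is the third non-red ball, and each of the 40 red balls precedes it with probability \(3/61\).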
Question 2
Suppose that claims are made to an insurance company according to a Poisson process with rate 10 per day. The amount of a claim is a random variable that has an exponential distribution with mean \(\$1000\). The insurance company receives payments continuously in time at a constant rate of \(\$11000\) per day. Starting with an initial capital of \(\$25000\), use \(10000\) simulations to estimate the probability that the firm’s capital is always positive throughout its first \(365\) days.
Solution 2
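    output <- c()
    for (i in 1:10000) {
      initial_capital <- 25000
      n_claims <- rpois(1, 10 * 365)                # number of claims in 365 days
      times <- sort(runif(n_claims, 0, 365))        # claim times, in days
      claims <- rexp(n_claims, rate = 1 / 1000)     # claim sizes, mean $1000
      # Capital just after each claim; it only falls at claim instants,
      # so these are the only moments that need checking
      capital <- initial_capital + 11000 * times - cumsum(claims)
      output <- c(output, all(capital > 0))
    }
    mean(output)

Each claim gets its own exponential amount, and the capital is checked after every claim rather than only at the end of the year. The printed value is the fraction of the 10000 simulated years in which the capital stayed positive throughout; it varies slightly between runs.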
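A simulated answer to Question 2 can be sanity-checked against a closed-form bound. For exponential claims, the classical compound Poisson risk model gives the probability of ever being ruined over an infinite horizon, and surviving forever implies surviving the first 365 days. The sketch below is only a rough check under that model; the variable names (`lambda`, `mu`, `prem`, `u`, `psi_inf`) are illustrative and not part of the original solution.

```r
lambda <- 10      # claims per day
mu     <- 1000    # mean claim size, in dollars
prem   <- 11000   # premium income per day, in dollars
u      <- 25000   # initial capital, in dollars

# Infinite-horizon ruin probability for exponential claims
# (valid because prem > lambda * mu, i.e. income exceeds expected claims)
psi_inf <- (lambda * mu / prem) * exp(-(1 / mu - lambda / prem) * u)

# Surviving forever implies surviving 365 days, so this is a lower bound
survival_lower_bound <- 1 - psi_inf
survival_lower_bound
```

The bound comes out at about 0.906, so a 365-day survival estimate noticeably below that value would point to a bug in the simulation.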