set.seed(100)
A group of 30 dice is thrown. What is the probability that at least 3 of each of the values 1, 2, 3, 4, 5, 6 appear? Approximate the probability by simulating \(10^4\) replications.
The method below counts the occurrences of each value directly.
# Despite the name, this checks for at least 3 of each value.
five_each_1 <- function() {
  s <- sample(1:6, 30, replace = TRUE)
  for (i in 1:6) {
    if (sum(s == i) < 3) {
      return(FALSE)
    }
  }
  return(TRUE)
}
mean(replicate(10000, five_each_1()))
## [1] 0.4702
Actually, there is a much easier way using tabulate, which gives us the counts directly.
s <- sample(1:6, 30, replace = TRUE)
tabulate(s, nbins = 6)  # nbins = 6 always returns one count per face
## [1] 5 4 3 5 9 4
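five_each_2 <- function() {
  # nbins = 6 keeps a zero count for face 6 when it never appears;
  # by default tabulate() stops at the largest value seen, so the check
  # could pass on only five counts.
  all(tabulate(sample(1:6, 30, replace = TRUE), nbins = 6) >= 3)
}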
mean(replicate(10000, five_each_2()))
## [1] 0.4717
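As a cross-check on the two simulations, the probability can also be computed exactly. The sketch below is not part of the original solution. It assumes the standard exponential-generating-function argument: the number of outcomes with at least 3 of each face is \(30!\) times the coefficient of \(x^{30}\) in \(\left(\sum_{k \ge 3} x^k / k!\right)^6\). The names poly_mult, face, and exact_prob are illustrative.

```r
# Multiply two polynomials (coefficient vectors, constant term first),
# truncating the product at degree `deg`.
poly_mult <- function(a, b, deg) {
  out <- numeric(deg + 1)
  for (i in 0:deg) {
    for (j in 0:(deg - i)) {
      out[i + j + 1] <- out[i + j + 1] + a[i + 1] * b[j + 1]
    }
  }
  out
}

n <- 30
k <- 0:n
# Per-face series: x^k / k! for k >= 3, zero otherwise
face <- ifelse(k >= 3, 1 / factorial(k), 0)

prod_poly <- c(1, numeric(n))  # start from the constant polynomial 1
for (f in 1:6) {
  prod_poly <- poly_mult(prod_poly, face, n)
}

# Coefficient of x^30, scaled by 30! / 6^30
exact_prob <- factorial(n) * prod_poly[n + 1] / 6^n
exact_prob
```

The result should sit close to the simulated estimates, within ordinary Monte Carlo error for \(10^4\) replications.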
Determine the following probabilities using base R functions (i.e., exactly, not through simulation).
dbinom(8, size = 12, prob = 0.71)
## [1] 0.226081
1 - pbinom(2, size = 9, prob = 0.08)
## [1] 0.02979319
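The two calls above compute \(\mathbb{P}(X = 8)\) for \(X \sim \mathrm{Bin}(12, 0.71)\) and \(\mathbb{P}(Y \ge 3)\) for \(Y \sim \mathrm{Bin}(9, 0.08)\). As a sanity check, here is a small sketch comparing them with the binomial pmf written out by hand and with two equivalent ways of getting the upper tail. The variable names are illustrative.

```r
# P(X = 8) for X ~ Binomial(12, 0.71), straight from the pmf
by_formula <- choose(12, 8) * 0.71^8 * (1 - 0.71)^4
by_dbinom  <- dbinom(8, size = 12, prob = 0.71)
all.equal(by_formula, by_dbinom)

# P(Y >= 3) for Y ~ Binomial(9, 0.08), three equivalent ways
upper_a <- 1 - pbinom(2, size = 9, prob = 0.08)
upper_b <- pbinom(2, size = 9, prob = 0.08, lower.tail = FALSE)
upper_c <- sum(dbinom(3:9, size = 9, prob = 0.08))
all.equal(upper_a, upper_b)
all.equal(upper_a, upper_c)
```

For very small tail probabilities, lower.tail = FALSE is usually preferable to subtracting from 1 because it avoids cancellation error.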
In a certain population, women’s heights are normally distributed with a mean of 63.6 inches and standard deviation of 2.5 inches.
Let \(X \sim N(63.6, 2.5^2)\), i.e. \(X\) has mean 63.6 and standard deviation 2.5.
# P(X < 60) + P(X > 65)
pnorm(60, mean = 63.6, sd = 2.5) + (1 - pnorm(65, mean = 63.6, sd = 2.5))
## [1] 0.3626734
1 - pnorm(72, mean = 63.6, sd = 2.5)
## [1] 0.0003897124
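The same tail probability can be obtained by standardizing first. This is a minimal sketch with illustrative variable names: it converts 72 inches to a z-score and evaluates the standard normal upper tail.

```r
mu <- 63.6
sigma <- 2.5

# z-score of 72 inches: (72 - 63.6) / 2.5 = 3.36
z <- (72 - mu) / sigma

tail_direct <- 1 - pnorm(72, mean = mu, sd = sigma)
tail_std    <- pnorm(z, lower.tail = FALSE)  # standard normal upper tail

all.equal(tail_direct, tail_std)
```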
s <- rnorm(500, mean = 63.6, sd = 2.5)
qqnorm(s)
qqline(s)
Suppose we have a biased coin with probability \(p\) of landing heads. Perform the following experiment: flip the coin repeatedly and stop at the first heads.
Let \(X\) be the number of tails observed after stopping the experiment. In other words, \(X\) is the number of failures before the first success.
This distribution is called the geometric distribution and we write \(X \sim \mathrm{Geom}(p)\).
dgeom, pgeom, and rgeom are the base R functions corresponding to this distribution.
Suppose \(X\sim \mathrm{Geom}(1/8)\), so that \(p = 1/8\).
What is the support of \(X\)? Is \(X\) a discrete or continuous r.v.?
\(X\) is discrete, with support \(\{0, 1, 2, \dotsc\}\).
Using the base R functions, determine \(\mathbb{P}(X = 4)\).
dgeom(4, prob = 1/8)
## [1] 0.07327271
pgeom(4, prob = 1/8) - pgeom(1, prob = 1/8)
## [1] 0.2527161
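Both results can be checked against the pmf of the number of failures before the first success, \(\mathbb{P}(X = k) = (1-p)^k p\). In this short sketch the names p_geom, pmf_formula, and cdf_diff are illustrative. Note that pgeom(4) - pgeom(1) equals \(\mathbb{P}(1 < X \le 4) = \mathbb{P}(2 \le X \le 4)\).

```r
p_geom <- 1/8

# pmf evaluated by hand at k = 4
pmf_formula <- (1 - p_geom)^4 * p_geom
all.equal(pmf_formula, dgeom(4, prob = p_geom))

# The CDF difference is the sum of the pmf over k = 2, 3, 4
cdf_diff <- pgeom(4, prob = p_geom) - pgeom(1, prob = p_geom)
all.equal(cdf_diff, sum(dgeom(2:4, prob = p_geom)))
```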
set.seed(123)
p <- 1/8
(approx <- mean(rgeom(1000, prob = p)))
## [1] 7.202
abs(approx - (1-p)/p)
## [1] 0.202
The simulated mean of 7.202 is 0.202 above the theoretical mean \((1-p)/p = 7\).
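To connect rgeom back to the coin experiment, the flips can also be simulated one at a time. This is a rough sketch, assuming the experiment is "flip until the first heads", which matches the definition of \(X\) as the number of failures before the first success. The function name flip_until_heads and the seed are arbitrary choices.

```r
set.seed(123)  # arbitrary seed, only for reproducibility

# Flip a coin with P(heads) = p until the first heads;
# return the number of tails seen along the way.
flip_until_heads <- function(p) {
  tails <- 0
  while (runif(1) >= p) {  # runif(1) < p is treated as heads
    tails <- tails + 1
  }
  tails
}

sims <- replicate(1000, flip_until_heads(1/8))
mean(sims)  # should land near (1 - p) / p = 7
```

In practice rgeom is preferable, since it draws all the values without an explicit loop.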
What is the probability that “A” and “B” are next to each other in line? Estimate using \(1,000\) replications.
set.seed(100)
sim_line <- function() {
  line <- sample(LETTERS[1:10])
  pos_a <- which(line == "A")
  pos_b <- which(line == "B")
  abs(pos_a - pos_b) == 1
}
r <- replicate(10^3, sim_line())
running <- cumsum(r) / seq_along(r)
plot(seq_along(r), running, type = "l", main = "Simulating Lining Up",
     xlab = "Replication", ylab = "Probability")
p <- 1/5  # exact probability: 2 * 9! / 10!
abline(h = p, col = "red")  # horizontal reference line at the exact value
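The reference value \(p = 1/5\) drawn in red can be justified by a short counting argument. The sketch assumes, as the code's sample(LETTERS[1:10]) suggests, that ten people labeled A through J line up in a uniformly random order. Glue A and B into a single block, arrange the resulting 9 items in \(9!\) ways, and order the pair inside the block in 2 ways:

```latex
\[
\mathbb{P}(\text{A and B adjacent})
  = \frac{2 \cdot 9!}{10!}
  = \frac{2}{10}
  = \frac{1}{5}.
\]
```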