The number \(e\approx 2.718\) can be expressed in many different ways. One way is as a limit and another is as an infinite series: \[\begin{align*} e &= \lim_{n\to\infty} \left(1+\frac{1}{n}\right)^n &\text{limit representation} \\ e &= \sum_{k=0}^\infty \frac{1}{k!} &\text{series representation} \end{align*}\]
# First initialize the vector of values
x <- 1:100
e_limit <- vector(length = length(x))
e_limit[1] <- 2 # (1 + 1/1)^1, the n = 1 term
e_series <- vector(length = length(x))
e_series[1] <- 1
# Next, use a loop to fill in the values.
for(i in seq(2, length(x))) {
  e_limit[i] <- (1 + 1/i)^i
  e_series[i] <- e_series[i-1] + 1/factorial(i-1)
}
The above loop can be written in several ways. I started the loop index at 2 instead of 1 since I had already filled in the value at index 1. One must take care when dealing with loop indices at the boundary of the range.
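To make the boundary issue concrete, here is a small sketch (not part of the original solution) of what happens to a "start at 2" loop range when the vector has length 1. The names down and safe are made up for illustration.

```r
x <- 5                       # a vector of length 1

# Both forms count DOWN from 2 to 1 instead of giving an empty range,
# so a loop over them would still run and index past the end of x.
down <- 2:length(x)
seq(2, length(x))

# seq_len() never counts down; dropping the first index leaves nothing to loop over.
safe <- seq_len(length(x))[-1]

for (i in safe) print(i)     # the body is never executed
```

With length(x) equal to 100, as in the loop above, both forms give the indices 2 through 100, so the issue only shows up for very short inputs.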
A simpler, faster, and more elegant way is to use vectorization. Think about how this works.
x <- 0:100
e_limit <- (1 + 1/x)^x
e_series <- cumsum(1/factorial(x))
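One way to think about how this works is to try it on a short vector where the values are easy to check by hand. The sketch below is an illustration rather than part of the solution; n, k, limit_vec, limit_loop, terms, and partial_sums are names chosen here.

```r
n <- 1:5

# Arithmetic on a vector is applied element by element, so this single
# expression evaluates (1 + 1/n)^n for every n at once.
limit_vec <- (1 + 1/n)^n

# The same computation written as an explicit loop, for comparison.
limit_loop <- numeric(length(n))
for (i in seq_along(n)) {
  limit_loop[i] <- (1 + 1/n[i])^n[i]
}
all.equal(limit_vec, limit_loop)

# factorial() is vectorized too, and cumsum() returns the running totals,
# i.e. the partial sums of the series.
k <- 0:5
terms <- 1/factorial(k)
partial_sums <- cumsum(terms)
partial_sums
```

Note that x starts at 0 in the vectorized code above. In R, 1/0 is Inf and Inf^0 is 1, so the first entry of e_limit is 1 rather than an error.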
Regardless of the approach, here is the code to generate the plot.
plot(x, e_limit, type="n", main="Convergence to e", ylab="y")
points(x, e_limit, type="l", col = "blue")
points(x, e_series, type="l", col = "red")
legend("bottomright", legend=c("Limit", "Series"), col=c("blue", "red"), lty=1)
For this problem, we will work with the flights data set in the nycflights13 package. Install and load the package. Also make sure you load in case you need it.
Provide a brief description of the data set and a few of the variables (use ?flights). Is flights a tibble?
flights is a tibble. This can be seen by printing flights to the console or by calling is_tibble(flights).
Extract a tibble containing American Airlines (AA) flights to LAX that departed before 1030. Return only the columns month, day, dep_time, dest, and carrier. How many flights fit these criteria?
flights |>
filter(dest == "LAX", carrier == "AA", dep_time < 1030) |>
select(month, day, dep_time, dest, carrier)
## # A tibble: 977 x 5
## month day dep_time dest carrier
## <int> <int> <int> <chr> <chr>
## 1 1 1 743 LAX AA
## 2 1 1 856 LAX AA
## 3 1 1 1026 LAX AA
## 4 1 2 732 LAX AA
## 5 1 2 855 LAX AA
## 6 1 3 730 LAX AA
## 7 1 3 855 LAX AA
## 8 1 3 1024 LAX AA
## 9 1 4 728 LAX AA
## 10 1 4 858 LAX AA
## # ... with 967 more rows
The result has 977 rows, so 977 flights fit these criteria.
flights_christmas <- flights |>
  filter(month == 12, day == 25)
sum(flights_christmas$distance)
## [1] 803747
We created a new tibble flights_christmas, then computed the sum of the distances directly. Alternatively, you could use summarize:
flights |>
  filter(month == 12, day == 25) |>
  summarize(sum(distance))
## # A tibble: 1 x 1
## `sum(distance)`
## <dbl>
## 1 803747
The air_time variable gives the duration of the flight in minutes. Create a tibble containing flights on Christmas day and only the variables month, day, origin, dest, and air_time_hour, where the last variable gives the duration of the flight in hours.
flights |>
filter(month == 12, day == 25) |>
mutate(air_time_hour = air_time / 60) |>
select(month, day, origin, dest, air_time_hour)
## # A tibble: 719 x 5
## month day origin dest air_time_hour
## <int> <int> <chr> <chr> <dbl>
## 1 12 25 EWR CLT 1.63
## 2 12 25 EWR IAH 3.38
## 3 12 25 JFK MIA 2.43
## 4 12 25 JFK BQN 3.18
## 5 12 25 LGA ORD 2.05
## 6 12 25 LGA DTW 1.47
## 7 12 25 LGA ATL 1.97
## 8 12 25 LGA FLL 2.45
## 9 12 25 EWR FLL 2.48
## 10 12 25 JFK MCO 2.28
## # ... with 709 more rows
The order here is important; we must create the variable with mutate before selecting it with select.
For this problem we will work with the mtcars data set in the datasets package.
Provide a brief description of this data set and some of the
variables. Is it a tibble?
mtcars is not a tibble, as can be seen by printing mtcars to the console or by calling is_tibble(mtcars).
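The same check can be sketched in base R without loading any package, relying on the fact that tibbles carry the class "tbl_df" in addition to "data.frame". The name is_tbl is made up for this illustration.

```r
# A plain data frame has a single class; a tibble would also list "tbl_df" and "tbl".
class(mtcars)

# inherits() asks whether "tbl_df" appears anywhere in the class vector.
is_tbl <- inherits(mtcars, "tbl_df")
is_tbl
```

If a tibble version is wanted, as_tibble(mtcars) from the tibble package converts it, though the car names stored as row names are dropped unless its rownames argument is used.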
Create the following histogram of displacement along with vertical lines indicating the mean and median displacement. Hint: I set the breaks to go from 0 to 500 in increments of 25.
hist(mtcars$disp, breaks = seq(0, 500, 25),
     main = "Hist of Displacement",
     xlab = "Displacement (cu. in.)")
abline(v = c(mean(mtcars$disp), median(mtcars$disp)),
lty=c(2,3), lwd=2, col="red")
legend("topright", legend = c("mean disp", "median disp"),
lty=c(2,3), lwd=2, col="red")
boxplot(mtcars$qsec ~ mtcars$cyl,
xlab = "Number of Cylinders", ylab="1/4 mile time (sec)")
The outlier is the car with the largest value of qsec. You can find this using which or which.max.
# Find the largest qsec
max_qsec <- max(mtcars$qsec)
# Find the index of the largest
max_qsec_idx <- which(mtcars$qsec == max_qsec)
# Get the row corresponding to this index
mtcars[max_qsec_idx, ]
## mpg cyl disp hp drat wt qsec vs am gear carb
## Merc 230 22.8 4 140.8 95 3.92 3.15 22.9 1 0 4 2
The outlier is the Merc 230. The code below uses which.max to find the same row.
mtcars[which.max(mtcars$qsec), ]
## mpg cyl disp hp drat wt qsec vs am gear carb
## Merc 230 22.8 4 140.8 95 3.92 3.15 22.9 1 0 4 2
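As a cross-check on reading the outlier off the plot, the rule that boxplot uses to flag points can be applied directly with boxplot.stats. This is a sketch, assuming the flagged point sits in the 4-cylinder group (the Merc 230 row above has cyl equal to 4); qsec_4cyl, flagged, and flagged_cars are names chosen here.

```r
# qsec values for the 4-cylinder cars only
qsec_4cyl <- mtcars$qsec[mtcars$cyl == 4]

# $out holds the values lying beyond the whiskers
# (by default, more than 1.5 times the box length away from the box).
flagged <- boxplot.stats(qsec_4cyl)$out
flagged

# Look up which car(s) those values belong to.
flagged_cars <- rownames(mtcars)[mtcars$cyl == 4 & mtcars$qsec %in% flagged]
flagged_cars
```

Unlike the which.max approach, this does not assume in advance that the outlier is the maximum; it reports whatever points the boxplot rule flags in that group.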
barplot(table(mtcars$cyl), ylab="Count", xlab="Number of cylinders")
Write a function search_insert_position(v, target) that takes a sorted numerical vector v and a numerical target. If target is present in v, return the index of target. Otherwise, return the index at which target would sit if it were inserted into v in order.
The input v is guaranteed to be sorted and to contain unique values (i.e., no duplicates).
search_insert_position <- function(v, target) {
  # target is present in v; return index of target
  if (target %in% v) {
    return(which(target == v))
  }
  # if target exceeds all values in v, it goes just past the end: length(v) + 1
  if (all(v < target)) {
    return(length(v) + 1)
  }
  # Otherwise, find the first index i where v[i] > target
  return(which(v > target)[1])
}
x <- c(1, 3, 5, 6)
search_insert_position(x, 5)
## [1] 3
search_insert_position(x, 2)
## [1] 2
search_insert_position(x, 7)
## [1] 5
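Because v is sorted and has no duplicates, the answer in all three cases equals one more than the number of elements strictly smaller than target. The following is an alternative sketch built on that observation, not the solution above; the name search_insert_position_count is hypothetical.

```r
search_insert_position_count <- function(v, target) {
  # Count the elements strictly below target; target belongs right after them.
  sum(v < target) + 1
}

x <- c(1, 3, 5, 6)
search_insert_position_count(x, 5)  # 3
search_insert_position_count(x, 2)  # 2
search_insert_position_count(x, 7)  # 5
search_insert_position_count(x, 0)  # 1
```

This version scans the whole vector, just like the solution above. For very long inputs a binary search would use the sorted order more efficiently, at the cost of more code.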