The number \(e\approx 2.718\) can be expressed in many different ways. One way is as a limit and another is as an infinite series: \[\begin{align*} e &= \lim_{n\to\infty} \left(1+\frac{1}{n}\right)^n &\text{limit representation} \\ e &= \sum_{k=0}^\infty \frac{1}{k!} &\text{series representation} \end{align*}\]
# First initialize the vectors of values; index i corresponds to n = x[i] = i - 1
x <- 0:100
e_limit <- numeric(length(x))
e_limit[1] <- 1  # matches R's (1 + 1/0)^0, since Inf^0 is 1
e_series <- numeric(length(x))
e_series[1] <- 1  # the k = 0 term, 1/0!
# Next, use a loop to fill in the remaining values.
for (i in seq(2, length(x))) {
  e_limit[i] <- (1 + 1/x[i])^x[i]
  e_series[i] <- e_series[i-1] + 1/factorial(x[i])
}
The above loop can be written in several ways. I started the loop index at 2 instead of 1 since I had already filled in the value at index 1. One must take care when dealing with loop indices at the boundary of the range.
A simpler, faster, and more elegant way is to use vectorization. Think about how this works.
x <- 0:100
e_limit <- (1 + 1/x)^x
e_series <- cumsum(1/factorial(x))
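To see how the vectorized version works, it may help to evaluate each piece on a short vector. The sketch below uses a small illustrative vector n (not part of the original solution): arithmetic operators act elementwise, and cumsum returns the running partial sums.

```r
n <- 0:4

# Elementwise division: 1/0 evaluates to Inf in R
1 / n

# Elementwise power: R defines y^0 as 1 for every y, so Inf^0 is 1
limit_terms <- (1 + 1/n)^n
limit_terms   # 1, 2, 2.25, 2.370370, 2.441406

# cumsum() turns the series terms 1/k! into partial sums
series_terms <- cumsum(1 / factorial(n))
series_terms  # 1, 2, 2.5, 2.666667, 2.708333
```

Each entry is computed without an explicit loop, which is why no special handling of the first index is needed.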
Regardless of the approach, here is the code to generate the plot.
plot(x, e_limit, type="n", main="Convergence to e", ylab="y")
points(x, e_limit, type="l", col = "blue")
points(x, e_series, type="l", col = "red")
legend("bottomright", legend=c("Limit", "Series"), col=c("blue", "red"), lty=1)
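The plot suggests the series converges much faster than the limit. As a rough check, the sketch below tabulates the absolute errors against exp(1); the data frame name errors and the choice of n up to 20 are illustrative assumptions. The limit error shrinks roughly like e/(2n), while the series remainder after the term 1/k! is below 1/(k * k!).

```r
n <- 0:20
e_limit <- (1 + 1/n)^n
e_series <- cumsum(1 / factorial(n))

# Absolute error of each approximation relative to R's exp(1)
errors <- data.frame(
  n = n,
  limit_error = abs(e_limit - exp(1)),
  series_error = abs(e_series - exp(1))
)

# Rows for n = 1, 5, 10, 20 (row i holds n = i - 1)
errors[c(2, 6, 11, 21), ]
```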
For this problem, we will work with the flights data set in the nycflights13 package. Install and load the nycflights13 package. Also make sure you load tidyverse in case you need it.
library(nycflights13)
library(tidyverse)
Provide a brief description of the data set and a few of the variables (use ?flights). Is flights a tibble?
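flights contains on-time data for all flights that departed the New York City airports (EWR, JFK, and LGA) in 2013. Variables include dep_time and arr_time (actual departure and arrival times in HHMM format), carrier (two-letter airline abbreviation), origin and dest (origin and destination airports), air_time (time in the air, in minutes), and distance (distance between airports, in miles).
flights is a tibble. This can be seen by printing flights to the console or by calling is_tibble(flights).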
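A few ways to inspect the object are sketched below. This assumes the nycflights13 and tibble packages are installed; glimpse is just one convenient option for listing variables.

```r
library(nycflights13)
library(tibble)

is_tibble(flights)   # TRUE for a tibble
class(flights)       # a tibble's class vector includes "tbl_df"
dim(flights)         # number of rows and columns
glimpse(flights)     # one line per variable, with its type and first values
```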
Extract a tibble containing American Airlines (AA) flights to LAX that departed before 1030. Return only the columns month, day, dep_time, dest, and carrier. How many flights fit these criteria?
flights |>
  filter(dest == "LAX", carrier == "AA", dep_time < 1030) |>
  select(month, day, dep_time, dest, carrier)
## # A tibble: 977 x 5
## month day dep_time dest carrier
## <int> <int> <int> <chr> <chr>
## 1 1 1 743 LAX AA
## 2 1 1 856 LAX AA
## 3 1 1 1026 LAX AA
## 4 1 2 732 LAX AA
## 5 1 2 855 LAX AA
## 6 1 3 730 LAX AA
## 7 1 3 855 LAX AA
## 8 1 3 1024 LAX AA
## 9 1 4 728 LAX AA
## 10 1 4 858 LAX AA
## # ... with 967 more rows
The tibble has 977 rows, so 977 flights fit these criteria.
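Rather than reading the count off the printed header, one could compute it directly. The sketch below pipes the filtered rows into nrow; the name n_aa_lax is an illustrative choice. Note that filter drops rows where the condition is NA, so flights with a missing dep_time are not counted.

```r
library(nycflights13)
library(dplyr)

# Count AA flights to LAX that departed before 1030
n_aa_lax <- flights |>
  filter(dest == "LAX", carrier == "AA", dep_time < 1030) |>
  nrow()
n_aa_lax
```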
flights_christmas <- flights |>
  filter(month == 12, day == 25)
sum(flights_christmas$distance)
## [1] 803747
We created a new tibble flights_christmas and then computed the sum of the distances directly. Alternatively, you could use the summarize function:
flights |>
  filter(month == 12, day == 25) |>
  summarize(sum(distance))
## # A tibble: 1 x 1
## `sum(distance)`
## <dbl>
## 1 803747
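The summary column above is named after the expression that produced it. A small variation, sketched here with the illustrative names total_distance and n_flights, is to name the summaries explicitly, which makes the result easier to reuse.

```r
library(nycflights13)
library(dplyr)

christmas_summary <- flights |>
  filter(month == 12, day == 25) |>
  summarize(
    total_distance = sum(distance),  # total miles flown that day
    n_flights = n()                  # number of flights that day
  )
christmas_summary
```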
The air_time variable gives the duration of the flight in minutes. Create a tibble containing flights on Christmas day and only the variables month, day, origin, dest, and air_time_hour, where the last variable gives the duration of the flight in hours.
flights |>
  filter(month == 12, day == 25) |>
  mutate(air_time_hour = air_time / 60) |>
  select(month, day, origin, dest, air_time_hour)
## # A tibble: 719 x 5
## month day origin dest air_time_hour
## <int> <int> <chr> <chr> <dbl>
## 1 12 25 EWR CLT 1.63
## 2 12 25 EWR IAH 3.38
## 3 12 25 JFK MIA 2.43
## 4 12 25 JFK BQN 3.18
## 5 12 25 LGA ORD 2.05
## 6 12 25 LGA DTW 1.47
## 7 12 25 LGA ATL 1.97
## 8 12 25 LGA FLL 2.45
## 9 12 25 EWR FLL 2.48
## 10 12 25 JFK MCO 2.28
## # ... with 709 more rows
The order here is important; we must create the variable with mutate before selecting it.
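The effect of the ordering can be seen on a small example. The sketch below uses a hypothetical two-row tibble named toy in place of flights, and assumes no object called air_time exists in the global environment. Selecting first drops air_time, so the later mutate fails. As an alternative, transmute (still available in dplyr, though marked as superseded in recent versions) creates the new variable and keeps only the listed columns in one step.

```r
library(dplyr)

# Hypothetical toy data standing in for flights
toy <- tibble(origin = c("EWR", "JFK"), air_time = c(98, 203))

# select() first drops air_time, so the later mutate() cannot find it
wrong_order <- tryCatch(
  toy |> select(origin) |> mutate(air_time_hour = air_time / 60),
  error = function(e) "error: air_time is no longer available"
)
wrong_order

# transmute() computes the new column and keeps only what is listed
toy_hours <- toy |> transmute(origin, air_time_hour = air_time / 60)
toy_hours
```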
For this problem we will work with the mtcars data set in the datasets package.
library(datasets)
Provide a brief description of this data set and some of the
variables. Is it a tibble?
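mtcars contains fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models), extracted from the 1974 Motor Trend US magazine. Variables include mpg (miles per US gallon), cyl (number of cylinders), disp (displacement in cubic inches), and qsec (quarter-mile time in seconds).
mtcars is not a tibble but a plain data frame, as can be seen by printing mtcars in the console or by calling is_tibble(mtcars).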
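The same check can be done in base R, as sketched below. The conversion at the end is optional and assumes the tibble package is installed; the column name model is an illustrative choice for holding the row names.

```r
class(mtcars)               # "data.frame"
inherits(mtcars, "tbl_df")  # FALSE: not a tibble
dim(mtcars)                 # 32 rows, 11 columns
head(rownames(mtcars), 3)   # car models are stored as row names

# Optional: convert to a tibble, keeping the car names as a column
if (requireNamespace("tibble", quietly = TRUE)) {
  mtcars_tbl <- tibble::as_tibble(mtcars, rownames = "model")
  print(mtcars_tbl)
}
```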
Create the following histogram of displacement along with vertical lines indicating the mean and median displacement. Hint: I set the breaks to go from 0 to 500 in increments of 25.
hist(mtcars$disp, breaks = seq(0, 500, 25),
     main = "Hist of Displacement",
     xlab = "Displacement (cu.in.)")
abline(v = c(mean(mtcars$disp), median(mtcars$disp)),
       lty = c(2, 3), lwd = 2, col = "red")
legend("topright", legend = c("mean disp", "median disp"),
       lty = c(2, 3), lwd = 2, col = "red")
boxplot(mtcars$qsec ~ mtcars$cyl,
        xlab = "Number of Cylinders", ylab = "1/4 mile time (sec)")
The outlier is the car with the largest value of qsec. You can find this using which or which.max:
# Find the largest qsec
max_qsec <- max(mtcars$qsec)
# Find the index of the largest
max_qsec_idx <- which(mtcars$qsec == max_qsec)
# Get the row corresponding to this index
mtcars[max_qsec_idx, ]
## mpg cyl disp hp drat wt qsec vs am gear carb
## Merc 230 22.8 4 140.8 95 3.92 3.15 22.9 1 0 4 2
The outlier is the Merc 230. The code below uses which.max:
mtcars[which.max(mtcars$qsec),]
## mpg cyl disp hp drat wt qsec vs am gear carb
## Merc 230 22.8 4 140.8 95 3.92 3.15 22.9 1 0 4 2
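Taking the largest qsec works here because the outlier happens to be the overall maximum. A more general sketch, assuming the default boxplot rule (points more than 1.5 times the hinge spread beyond the hinges), asks boxplot.stats for the flagged values within the 4-cylinder group. The names four_cyl and out_vals are illustrative.

```r
# Restrict to the 4-cylinder cars, the group containing the outlier
four_cyl <- mtcars[mtcars$cyl == 4, ]

# Values the boxplot rule flags as outliers within this group
out_vals <- boxplot.stats(four_cyl$qsec)$out

# Rows of the flagged cars
outlier_rows <- four_cyl[four_cyl$qsec %in% out_vals, ]
outlier_rows
```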
barplot(table(mtcars$cyl), ylab="Count", xlab="Number of cylinders")
Write a function search_insert_position(v, target) which takes a sorted numerical vector v and a numerical target target. If target is present in v, return the index of target. Otherwise, return the index at which target would be inserted to keep v sorted.
The input v is guaranteed to be sorted and to contain unique values (i.e. no duplicates).
search_insert_position <- function(v, target) {
  # target is present in v; return index of target
  if (target %in% v) {
    return(which(target == v))
  }
  # if target exceeds all values in v, it belongs just past the end: length(v) + 1
  if (all(v < target)) {
    return(length(v) + 1)
  }
  # Otherwise, find the first index i where v[i] > target
  return(which(v > target)[1])
}
x <- c(1, 3, 5, 6)
search_insert_position(x, 5)
## [1] 3
search_insert_position(x, 2)
## [1] 2
search_insert_position(x, 7)
## [1] 5
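Because v is sorted and has no duplicates, all three cases reduce to counting how many elements are strictly less than target. The two functions below are alternative sketches, not the original solution, and their names are hypothetical: the first is a one-line count, and the second is a binary search that inspects only about log2(length(v)) elements.

```r
# Count-based version: the insert position is one past the number of
# elements strictly smaller than target
search_insert_count <- function(v, target) {
  sum(v < target) + 1
}

# Binary search over the half-open range [lo, hi), where hi may point
# one past the end of v
search_insert_binary <- function(v, target) {
  lo <- 1L
  hi <- length(v) + 1L
  while (lo < hi) {
    mid <- (lo + hi) %/% 2L
    if (v[mid] < target) {
      lo <- mid + 1L   # target belongs to the right of mid
    } else {
      hi <- mid        # target is at mid or to its left
    }
  }
  lo
}

x <- c(1, 3, 5, 6)
search_insert_count(x, 5)   # 3
search_insert_binary(x, 2)  # 2
search_insert_binary(x, 7)  # 5
```

The count-based version scans the whole vector, which is fine for short inputs; the binary search is the usual choice when v is long.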