Problem 1: Basic vector manipulation

gasbill <- c(46, 33, 39, 37, 46, 30, 48, 32, 49, 35, 30, 48)
gasbill[12] <- 49
gasbill
##  [1] 46 33 39 37 46 30 48 32 49 35 30 49
c(-50:-54, -53:-50)
## [1] -50 -51 -52 -53 -54 -53 -52 -51 -50
x <- seq(1, 10, by = 0.05)
length(x)
## [1] 181
x <- seq(1, 10, length = 100)
length(x)
## [1] 100
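  1. If you specify both by and length in the same call, you should get an error that says “too many arguments”. With from and to already given, you can specify at most one of by and length, which makes sense: the two endpoints plus either one of them fully determine the sequence.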
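The error can be reproduced without halting a script by wrapping the call in tryCatch. This is only a minimal sketch; the names msg and y are illustrative and not part of the worksheet.

```r
# Supplying from, to, by and length.out together over-determines the sequence.
msg <- tryCatch(
  seq(1, 10, by = 0.05, length.out = 100),
  error = function(e) conditionMessage(e)
)
msg

# Three of the four arguments are enough, for example from, by and length.out.
y <- seq(1, by = 0.05, length.out = 100)
length(y)
```

Here msg holds the text of the error instead of the script stopping, and the second call succeeds because no endpoint is given.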

Problem 2

ws2_df <- read.csv("ws2.csv")
summary(ws2_df)
##        x               y         
##  Min.   : 2.00   Min.   :  1.00  
##  1st Qu.:25.75   1st Qu.: 26.00  
##  Median :49.50   Median : 53.50  
##  Mean   :49.11   Mean   : 52.93  
##  3rd Qu.:70.00   3rd Qu.: 78.00  
##  Max.   :99.00   Max.   :100.00
  1. Determine the lengths of x and y.
c(length(ws2_df$x), length(ws2_df$y))
## [1] 100 100

The lengths of variables in a data frame must be the same.
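A small sketch of this constraint, using a made-up data frame (toy_df is a hypothetical name, not part of the worksheet data): columns whose lengths cannot be matched are rejected when the data frame is built.

```r
# Columns of lengths 3 and 2 cannot be combined into one data frame.
res <- tryCatch(
  data.frame(x = 1:3, y = 1:2),
  error = function(e) "error"
)
res

# With equal-length columns, nrow() gives the common length.
toy_df <- data.frame(x = 1:3, y = c(10, 20, 30))
c(length(toy_df$x), length(toy_df$y), nrow(toy_df))
```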

  1. What is the 40th element of x and the 80th element of y?
c(ws2_df$x[40], ws2_df$y[80])
## [1] 30 42
  1. What is the average of all the values in the data frame, including both x and y?
mean(c(ws2_df$x, ws2_df$y))
## [1] 51.02
  1. How many elements of x are greater than 70?
sum(ws2_df$x > 70)
## [1] 24
  1. How many elements of x are greater than or equal to the corresponding element in y?
sum(ws2_df$x >= ws2_df$y)
## [1] 46
  1. What is the proportion of elements of x that are greater than or equal to the corresponding element in y?
mean(ws2_df$x >= ws2_df$y)
## [1] 0.46

Why does the above give the proportion? Because a logical vector is treated as 0s and 1s in arithmetic, so its mean (the sum of its components divided by its length) is the proportion of 1s. We will use this trick later when we discuss simulation.
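To see the trick on a small scale, here is a sketch with a made-up logical vector (flags is a hypothetical stand-in for the result of a comparison such as x >= y):

```r
flags <- c(TRUE, FALSE, TRUE, TRUE, FALSE)  # hypothetical comparison result

as.numeric(flags)           # TRUE becomes 1, FALSE becomes 0: 1 0 1 1 0
sum(flags)                  # number of TRUEs: 3
sum(flags) / length(flags)  # proportion of TRUEs: 0.6
mean(flags)                 # same value in one step: 0.6
```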

  1. How many values in x differ from their corresponding value in y by more than 10 in absolute value?
sum(abs(ws2_df$x - ws2_df$y) > 10)
## [1] 82

You should work out why the above gives the right answer. Vectorization is very handy for these kinds of operations.
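One way to work it out is to run each step separately on a tiny example. The vectors a and b below are made-up stand-ins for the two columns, chosen so the result is easy to check by hand.

```r
a <- c(5, 20, 33)   # hypothetical stand-in for ws2_df$x
b <- c(18, 22, 20)  # hypothetical stand-in for ws2_df$y

d <- a - b          # elementwise differences: -13 -2 13
abs(d)              # distances: 13 2 13
far <- abs(d) > 10  # elementwise comparison: TRUE FALSE TRUE
sum(far)            # TRUE counts as 1, so this counts the pairs: 2
```

Each operation acts on whole vectors at once, so no explicit loop over the elements is needed.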

Problem 3

Create a vector of integers from 1 to 12 inclusive.

x <- 1:12
  1. Use the vector to create a 3x4 matrix. Did recycling occur?
mx <- matrix(x, nrow = 3, ncol = 4)
mx
##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
## [3,]    3    6    9   12

Recycling did not occur; all elements of x were used exactly once.

  1. Use the vector to create a 4x4 matrix. Did recycling occur?
mx <- matrix(x, nrow = 4, ncol = 4)
## Warning in matrix(x, nrow = 4, ncol = 4): data length differs from size of
## matrix: [12 != 4 x 4]
mx
##      [,1] [,2] [,3] [,4]
## [1,]    1    5    9    1
## [2,]    2    6   10    2
## [3,]    3    7   11    3
## [4,]    4    8   12    4

Recycling occurred; the last column was filled with the first four elements of x. We got a warning (not an error, since the matrix was still created) because the size of the matrix, 16, is not a multiple of the length of x, 12.
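The recycling can also be written out by hand. The sketch below is an illustration only (the names n_cells and idx are not from the worksheet): it computes which position of x lands in each of the 16 cells and checks the result against matrix().

```r
x <- 1:12
n_cells <- 4 * 4

# Positions in x used to fill the 16 cells, column by column:
# 1 to 12, then starting over at 1 to 4.
idx <- (seq_len(n_cells) - 1) %% length(x) + 1
idx

# The manual fill matches what matrix() produces (warning suppressed here).
mx <- suppressWarnings(matrix(x, nrow = 4, ncol = 4))
identical(as.vector(mx), x[idx])

# A nonzero remainder means x was cut off partway through its second pass.
n_cells %% length(x)
```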

Problem 4

heights_df <- read.csv("heights.csv")
  1. Write a vectorized function cm_to_inch that takes a numeric value in centimeters and converts it to inches. Apply the function to the height vector.

1 cm is approximately 0.39 inches.

cm_to_inch <- function(cm) {
  cm * 0.39 # Remember the final expression in a function is automatically returned
}
head(cm_to_inch(heights_df$height), 10)
##  [1] 62.40 67.08 65.52 71.37 68.25 73.71 60.84 65.13 76.05 64.35
  1. Write a vectorized function cm_to_ft_inch that converts numerical values given in cm to a feet inch format, rounding to the nearest inch.
cm_to_ft_inch <- function(cm) {
  # Round the total first so the inch part can never come out as 12
  inch <- round(cm_to_inch(cm))
  ft <- inch %/% 12
  inch <- inch %% 12
  paste(ft, inch)
}
head(cm_to_ft_inch(heights_df$height), 10)
##  [1] "5 2"  "5 7"  "5 6"  "5 11" "5 8"  "6 2"  "5 1"  "5 5"  "6 4"  "5 4"
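The order of rounding and splitting matters for heights just below a whole number of feet. The sketch below compares the two orders for 71.565 inches (183.5 cm at 0.39 inches per cm); the helper names split_then_round and round_then_split are hypothetical and exist only for this comparison.

```r
# Hypothetical helpers, taking a height already converted to inches.
split_then_round <- function(inch) {
  # Split into feet and leftover inches, then round the leftover
  paste(inch %/% 12, round(inch %% 12))
}
round_then_split <- function(inch) {
  # Round to whole inches, then split into feet and inches
  total <- round(inch)
  paste(total %/% 12, total %% 12)
}

inch <- 71.565  # 183.5 cm * 0.39
split_then_round(inch)  # "5 12"
round_then_split(inch)  # "6 0"
```

Rounding the total first keeps the inch part between 0 and 11, so a height never prints as 12 inches.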