Write the function contains_duplicate(v) that takes a
numeric vector v and returns TRUE if any value appears
at least twice in the vector and FALSE otherwise.
There are several ways to do this. The easiest
way is to use the duplicated function.
contains_duplicate <- function(v) {
  any(duplicated(v))
}
contains_duplicate(c(1, 2, 3, 1))
## [1] TRUE
contains_duplicate(c(1, 2, 3, 4))
## [1] FALSE
contains_duplicate(c(1, 1, 1, 3, 3, 4, 3, 2, 4, 2))
## [1] TRUE
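To see why this works, it can help to look at what duplicated returns on its own. The short sketch below also shows two other base R checks that should give the same answer; they are offered as alternatives for comparison, not as part of the worksheet solution.

```r
v <- c(1, 2, 3, 1)

# duplicated() marks each element that repeats an earlier one.
duplicated(v)
## [1] FALSE FALSE FALSE  TRUE

# anyDuplicated() returns the index of the first repeat, or 0 if there is none.
anyDuplicated(v) > 0
## [1] TRUE

# Comparing lengths before and after unique() is another equivalent check.
length(unique(v)) < length(v)
## [1] TRUE
```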
Another way involves using a loop:
contains_duplicate <- function(v) {
  seen <- rep(NA, length(v)) # Initialize with NA values
  for (i in seq_along(v)) {
    if (v[i] %in% seen) {
      return(TRUE)
    }
    seen[i] <- v[i]
  }
  return(FALSE)
}
We haven't talked about NA yet, but we need it above for
a technical reason: initializing with
vector(length = length(v)) would create a logical vector filled
with FALSE, and R treats FALSE as equal to 0 when comparing it
with numbers. This leads to wrong output if v contains a zero.
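The pitfall can be checked directly in the console. The sketch below first shows the placeholder problem, then a variant of the loop that starts from an empty vector and so needs no placeholder values. The name contains_duplicate_grow is hypothetical and the variant is only one possible workaround, not the worksheet's solution.

```r
# vector(length = n) defaults to mode "logical", so it is filled with FALSE.
placeholder <- vector(length = 3)
placeholder
## [1] FALSE FALSE FALSE

# %in% coerces FALSE to 0 before comparing, so a zero looks "already seen".
0 %in% placeholder
## [1] TRUE

# Hypothetical variant: start from an empty vector and grow it, so there are
# no placeholder values that could be mistaken for real data.
contains_duplicate_grow <- function(v) {
  seen <- numeric(0)
  for (x in v) {
    if (x %in% seen) {
      return(TRUE)
    }
    seen <- c(seen, x) # grows by one element per iteration
  }
  FALSE
}
contains_duplicate_grow(c(0, 1, 2))
## [1] FALSE
contains_duplicate_grow(c(0, 1, 0))
## [1] TRUE
```

Growing a vector inside a loop is slow for long inputs, so this is meant to illustrate the idea rather than to replace the duplicated version.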
Convert the iris data frame to a tibble and call it iris_tbl.
iris_tbl <- as_tibble(iris)
Compute the median of Petal.Width and then create a tibble
that only contains petal widths greater than the median.
median(iris_tbl$Petal.Width)
## [1] 1.3
iris_tbl |>
  filter(Petal.Width > median(iris_tbl$Petal.Width))
## # A tibble: 72 x 5
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## <dbl> <dbl> <dbl> <dbl> <fct>
## 1 7 3.2 4.7 1.4 versicolor
## 2 6.4 3.2 4.5 1.5 versicolor
## 3 6.9 3.1 4.9 1.5 versicolor
## 4 6.5 2.8 4.6 1.5 versicolor
## 5 6.3 3.3 4.7 1.6 versicolor
## 6 5.2 2.7 3.9 1.4 versicolor
## 7 5.9 3 4.2 1.5 versicolor
## 8 6.1 2.9 4.7 1.4 versicolor
## 9 6.7 3.1 4.4 1.4 versicolor
## 10 5.6 3 4.5 1.5 versicolor
## # ... with 62 more rows
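For comparison, the same row filter can be sketched in base R with logical indexing. This is only an illustration of what filter is doing. It uses the built-in iris data frame rather than iris_tbl, and the names med and above_median are made up for the example.

```r
# iris ships with base R (datasets package), so no extra packages are needed.
med <- median(iris$Petal.Width) # computed once and reused
med
## [1] 1.3

# Keep only the rows whose petal width exceeds the median.
above_median <- iris[iris$Petal.Width > med, ]
nrow(above_median)
## [1] 72
```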
Create a tibble with the variables Sepal.Length,
Sepal.Width, Species, and
Petal.Area, containing only the rows where the petal width is
greater than the median.
iris_tbl |>
  filter(Petal.Width > median(Petal.Width)) |>
  mutate(Petal.Area = Petal.Width * Petal.Length) |>
  select(-Petal.Length, -Petal.Width)
## # A tibble: 72 x 4
## Sepal.Length Sepal.Width Species Petal.Area
## <dbl> <dbl> <fct> <dbl>
## 1 7 3.2 versicolor 6.58
## 2 6.4 3.2 versicolor 6.75
## 3 6.9 3.1 versicolor 7.35
## 4 6.5 2.8 versicolor 6.9
## 5 6.3 3.3 versicolor 7.52
## 6 5.2 2.7 versicolor 5.46
## 7 5.9 3 versicolor 6.3
## 8 6.1 2.9 versicolor 6.58
## 9 6.7 3.1 versicolor 6.16
## 10 5.6 3 versicolor 6.75
## # ... with 62 more rows
Load the heights_df data frame from worksheet 1.
heights_df <- read.csv("heights.csv")
Recall the height variable is given in centimeters (cm).
In worksheet 2, we created cm_to_ft_inch that converts from
cm to a string representation of feet and inches.
Using dplyr functionality, create a tibble with a
variable height_ft_in in place of height.
heights_df |>
  as_tibble() |>
  mutate(height_ft_in = cm_to_ft_inch(height)) |>
  select(-height)
## # A tibble: 506 x 4
## id_. gender age height_ft_in
## <int> <chr> <int> <chr>
## 1 1 Female 19 5 2
## 2 2 Female 19 5 7
## 3 3 Female 22 5 6
## 4 4 Male 19 5 11
## 5 5 Female 21 5 8
## 6 6 Male 19 6 2
## 7 7 Female 21 5 1
## 8 8 Female 21 5 5
## 9 9 Male 18 6 4
## 10 10 Female 18 5 4
## # ... with 496 more rows
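The pipeline above relies on cm_to_ft_inch from worksheet 2, which is not reproduced here. The sketch below is a hypothetical stand-in so the example can be run on its own. The rounding to whole inches and the "feet inches" output format are assumptions based on the printed output, and the worksheet 2 version may differ.

```r
# Hypothetical stand-in for the worksheet 2 helper. The rounding rule and the
# output format are assumptions, not the original definition.
cm_to_ft_inch <- function(cm) {
  total_inches <- round(cm / 2.54) # 2.54 cm per inch, rounded to whole inches
  feet <- total_inches %/% 12      # whole feet
  inches <- total_inches %% 12     # leftover inches
  paste(feet, inches)              # vectorized, so it works on a whole column
}
cm_to_ft_inch(160)
## [1] "5 3"
cm_to_ft_inch(180)
## [1] "5 11"
```

Because every step is vectorized, the function can be applied to an entire column at once, which is what mutate needs.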