Write the function contains_duplicate(v) that takes a numeric vector v and returns TRUE if any value appears at least twice in the vector and FALSE otherwise.
There are several ways to do this. The easiest way is to use the duplicated function.
contains_duplicate <- function(v) {
  any(duplicated(v))
}
contains_duplicate(c(1, 2, 3, 1))
## [1] TRUE
contains_duplicate(c(1, 2, 3, 4))
## [1] FALSE
contains_duplicate(c(1, 1, 1, 3, 3, 4, 3, 2, 4, 2))
## [1] TRUE
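Two other base R one-liners give the same answer. This is only a small sketch: the names contains_duplicate_any and contains_duplicate_unique are made up here so that the function above is not overwritten. anyDuplicated(v) returns the index of the first repeated element, or 0 if there is none, and unique(v) drops repeated values.

```r
# Hypothetical names, chosen only to keep the original function intact.

# anyDuplicated() returns 0 when no element is repeated.
contains_duplicate_any <- function(v) {
  anyDuplicated(v) > 0
}

# If dropping repeats shortens the vector, some value appeared at least twice.
contains_duplicate_unique <- function(v) {
  length(unique(v)) < length(v)
}

contains_duplicate_any(c(1, 2, 3, 1))
## [1] TRUE
contains_duplicate_unique(c(1, 2, 3, 4))
## [1] FALSE
```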
Another way involves using a loop:
contains_duplicate <- function(v) {
  seen <- rep(NA, length(v)) # Initialize with NA values
  for (i in seq_along(v)) {
    if (v[i] %in% seen) {
      return(TRUE)
    }
    seen[i] <- v[i]
  }
  return(FALSE)
}
We haven't talked about NA yet, but we need it above for a technical reason: initializing with vector(length = length(v)) would create a vector of FALSE values, which R treats as equal to zero in the %in% comparison. This leads to wrong output if v contains a zero.
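The pitfall can be checked directly at the console. The sketch below also shows a loop variant that avoids placeholders altogether by starting from an empty vector. The name contains_duplicate_loop is hypothetical and not part of the worksheet.

```r
# vector(length = n) defaults to mode "logical", so it is filled with FALSE.
placeholder <- vector(length = 3)
placeholder
## [1] FALSE FALSE FALSE

# FALSE is coerced to 0 in the comparison, so a zero looks "already seen".
0 %in% placeholder
## [1] TRUE

# NA placeholders do not match any non-missing number.
0 %in% rep(NA, 3)
## [1] FALSE

# Hypothetical variant: grow `seen` from an empty vector, no placeholder needed.
contains_duplicate_loop <- function(v) {
  seen <- numeric(0)
  for (x in v) {
    if (x %in% seen) {
      return(TRUE)
    }
    seen <- c(seen, x)
  }
  FALSE
}

contains_duplicate_loop(c(0, 1, 2))
## [1] FALSE
contains_duplicate_loop(c(0, 1, 0))
## [1] TRUE
```

Growing a vector with c() copies it at every step, so this variant is for illustration only; for long vectors duplicated() remains the better choice. Note also that the NA-initialized loop reports a duplicate as soon as v contains a single NA, because NA %in% seen is TRUE while unfilled placeholders remain.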
Convert the iris data frame to a tibble and call it iris_tbl.
iris_tbl <- as_tibble(iris)
Find the median of Petal.Width and then create a tibble that only contains petal widths greater than the median.
median(iris_tbl$Petal.Width)
## [1] 1.3
iris_tbl |>
  filter(Petal.Width > median(iris_tbl$Petal.Width))
## # A tibble: 72 x 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>
##  1          7           3.2          4.7         1.4 versicolor
##  2          6.4         3.2          4.5         1.5 versicolor
##  3          6.9         3.1          4.9         1.5 versicolor
##  4          6.5         2.8          4.6         1.5 versicolor
##  5          6.3         3.3          4.7         1.6 versicolor
##  6          5.2         2.7          3.9         1.4 versicolor
##  7          5.9         3            4.2         1.5 versicolor
##  8          6.1         2.9          4.7         1.4 versicolor
##  9          6.7         3.1          4.4         1.4 versicolor
## 10          5.6         3            4.5         1.5 versicolor
## # ... with 62 more rows
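The answer above refers to the column as iris_tbl$Petal.Width inside filter(), but dplyr also accepts the bare column name. The sketch below, which assumes dplyr is attached as in the rest of the worksheet, checks that both spellings select the same rows. The names by_dollar and by_name are hypothetical.

```r
library(dplyr)

iris_tbl <- as_tibble(iris)

# Spelling 1: look the column up in the tibble explicitly.
by_dollar <- iris_tbl |>
  filter(Petal.Width > median(iris_tbl$Petal.Width))

# Spelling 2: use the bare column name, which filter() resolves in the data.
by_name <- iris_tbl |>
  filter(Petal.Width > median(Petal.Width))

# Compare the two results and count the rows that were kept.
all.equal(by_dollar, by_name)
nrow(by_name)
```

On an ungrouped tibble the two spellings are interchangeable. On a grouped tibble they differ: the bare name is evaluated within each group, whereas iris_tbl$Petal.Width always refers to the whole column.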
Create a tibble that contains only the variables Sepal.Length, Sepal.Width, Species, and Petal.Area (the product of Petal.Width and Petal.Length) and only the rows where the petal width is greater than the median.
iris_tbl |>
  filter(Petal.Width > median(Petal.Width)) |>
  mutate(Petal.Area = Petal.Width * Petal.Length) |>
  select(-Petal.Length, -Petal.Width)
## # A tibble: 72 x 4
##    Sepal.Length Sepal.Width Species    Petal.Area
##           <dbl>       <dbl> <fct>           <dbl>
##  1          7           3.2 versicolor       6.58
##  2          6.4         3.2 versicolor       6.75
##  3          6.9         3.1 versicolor       7.35
##  4          6.5         2.8 versicolor       6.9
##  5          6.3         3.3 versicolor       7.52
##  6          5.2         2.7 versicolor       5.46
##  7          5.9         3   versicolor       6.3
##  8          6.1         2.9 versicolor       6.58
##  9          6.7         3.1 versicolor       6.16
## 10          5.6         3   versicolor       6.75
## # ... with 62 more rows
Load the heights_df data frame from worksheet 1.
heights_df <- read.csv("heights.csv")
Recall that the height variable is given in centimeters (cm). In worksheet 2, we created cm_to_ft_inch, which converts from cm to a string representation of feet and inches. Using dplyr functionality, create a tibble with a variable height_ft_in in place of height.
heights_df |>
  as_tibble() |>
  mutate(height_ft_in = cm_to_ft_inch(height)) |>
  select(-height)
## # A tibble: 506 x 4
##     id_. gender   age height_ft_in
##    <int> <chr>  <int> <chr>
##  1     1 Female    19 5 2
##  2     2 Female    19 5 7
##  3     3 Female    22 5 6
##  4     4 Male      19 5 11
##  5     5 Female    21 5 8
##  6     6 Male      19 6 2
##  7     7 Female    21 5 1
##  8     8 Female    21 5 5
##  9     9 Male      18 6 4
## 10    10 Female    18 5 4
## # ... with 496 more rows
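The function cm_to_ft_inch comes from worksheet 2 and is not reproduced here. To run the pipeline above without that worksheet, a stand-in along the following lines could be used. This is only a guess at its behavior, based on the "feet inches" strings in the output. The name cm_to_ft_inch_sketch and the rounding to the nearest inch are assumptions, and the worksheet 2 version may differ.

```r
# Hypothetical stand-in for the worksheet 2 function (assumed behavior).
# Takes a numeric vector of heights in cm and returns strings like "5 11".
cm_to_ft_inch_sketch <- function(cm) {
  total_inches <- round(cm / 2.54)  # 1 inch = 2.54 cm; nearest inch (assumption)
  feet <- total_inches %/% 12       # whole feet
  inches <- total_inches %% 12      # leftover inches
  paste(feet, inches)
}

cm_to_ft_inch_sketch(c(157.48, 180.34))
## [1] "5 2"  "5 11"
```

The function is vectorized, so it can be used inside mutate() on a whole column at once.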