Why use seq?

The cumsum function returns the cumulative sum of a numeric vector.

cumsum(c(1, 1, 2, 1))

## [1] 1 2 4 5

cumsum(c(3, 1, 1, 17))

## [1]  3  4  5 22

The first entry is the first element. The second entry is the sum of the first two elements; the third entry is the sum of the first three elements; and so forth. In general, entry i is the sum of the first i elements.

Let’s write our own cumsum function.

my_cumsum <- function(x) {
  result <- numeric(length = length(x))
  for (i in 1:length(x)) { # focus on this line
    result[i] <- sum(x[1:i])
  }
  result
}
my_cumsum(c(1, 1, 2, 1))

## [1] 1 2 4 5

my_cumsum(c(3, 1, 1, 17))

## [1]  3  4  5 22

We get matching output on both inputs. Now let’s pass an empty numeric vector to both cumsum and my_cumsum:

cumsum(numeric(0))

## numeric(0)

my_cumsum(numeric(0))

## [1] NA

This time my_cumsum returns a vector of length one containing NA, where cumsum returns an empty vector. NA is what R returns when a vector is indexed past its bounds:

numeric(0)[1] # Trying to access the first element of an empty vector.

## [1] NA

Why did this occur? The following expression is the culprit:

for (i in 1:length(x)) {

If x is empty, length(x) is zero, so our loop becomes

for (i in 1:0) {

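The colon operator counts down when its left operand is larger than its right one, so 1:0 is the vector c(1, 0) rather than an empty vector. We therefore enter the loop with these two values: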

for (i in 1:0) {
  print(i)
}

## [1] 1
## [1] 0

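In my_cumsum, the pass with i equal to 1 stores sum(x[1]) in result[1], and that sum is NA because x has no first element. Instead of 1:length(x), we need an expression that is empty when x is empty. seq_along accomplishes this: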

seq_along(numeric(0))

## integer(0)

for (i in seq_along(numeric(0))) {
  print(i)
}

Nothing is printed because the loop is not entered.

So a correct implementation follows:

my_cumsum <- function(x) {
  result <- numeric(length = length(x))
  for (i in seq_along(x)) {
    result[i] <- sum(x[1:i])
  }
  result
}
my_cumsum(c(1, 1, 2, 1))

## [1] 1 2 4 5

my_cumsum(numeric(0))

## numeric(0)
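
As a rough sanity check, the corrected my_cumsum can be compared against cumsum on several inputs at once. The sketch below repeats the function so that it runs on its own; the list of test inputs is an illustrative assumption, not an exhaustive test.

```r
# Corrected implementation, repeated so this block is self-contained.
my_cumsum <- function(x) {
  result <- numeric(length = length(x))
  for (i in seq_along(x)) {
    result[i] <- sum(x[1:i])
  }
  result
}

# Illustrative inputs (an assumption): the two vectors used earlier,
# a length-one vector, and the empty vector.
inputs <- list(c(1, 1, 2, 1), c(3, 1, 1, 17), 5, numeric(0))

# all.equal() compares with a small numerical tolerance; isTRUE() turns
# its result into a plain TRUE/FALSE even when the values differ.
matches <- vapply(
  inputs,
  function(x) isTRUE(all.equal(my_cumsum(x), cumsum(x))),
  logical(1)
)
all(matches)  # TRUE
```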

Use seq_along(x), or equivalently seq_len(length(x)), in place of 1:length(x) to guard against this behavior whenever the set of loop indices may be empty.
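
seq_len is not demonstrated above, so here is a minimal sketch of how it behaves next to the colon operator. The helper count_iterations is hypothetical, introduced only to count how many times a loop body runs.

```r
# Hypothetical helper: counts how many times a for loop over `indices` runs.
count_iterations <- function(indices) {
  n <- 0
  for (i in indices) {
    n <- n + 1
  }
  n
}

# 1:0 counts down, so the loop body runs twice.
count_iterations(1:0)         # 2

# seq_len(0) is empty, so the loop body never runs.
count_iterations(seq_len(0))  # 0

# For any vector x, seq_along(x) gives the same indices as seq_len(length(x)).
x <- c(3, 1, 1, 17)
identical(seq_along(x), seq_len(length(x)))  # TRUE
```

seq_len is the natural choice when only a count is available, for example a number of rows, while seq_along is more direct when the vector itself is at hand.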

2022-06-24