Shifting

We wish to shift the values of a vector up or down by a given offset. Shifting gives us access to a vector’s previous or following values, which lets us compute changes between elements.

Previous

We wish to shift the values of a vector down by one position, i.e. to obtain, for each element, the value that precedes it.

In this example, we wish to compute the difference and rate of change between the value of each element of the column attr_1 and the value of its previous element. The data frame is sorted by the value of the column date.

library(dplyr)

df_2 = df %>% 
  arrange(date) %>%
  mutate(
    previous = lag(attr_1),
    delta = attr_1 - previous,
    change = delta / previous)

Here is how this works:

  • We use the function lag() from the dplyr package to shift the values of the column attr_1 down and store the resulting vector in a new column previous.
  • By default, lag() shifts down by one position (n = 1), so the first element of previous has no predecessor and is NA.
  • In delta = attr_1 - previous, we compute the difference between the current value of attr_1 and its previous value stored in previous.
  • In change = delta / previous, we compute the rate of change relative to the previous value.
  • To be confident of the order, we start by sorting the data frame by the values of the date column. See Sorting.
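
The steps above can be traced on a small data frame. This is a minimal sketch; df_toy and its values are hypothetical and were chosen so that the rows start out of date order.

```r
library(dplyr)

# Hypothetical toy data; the dates are deliberately out of order.
df_toy <- data.frame(
  date = as.Date(c("2024-01-03", "2024-01-01", "2024-01-02")),
  attr_1 = c(30, 10, 20)
)

df_toy_2 <- df_toy %>%
  arrange(date) %>%              # order rows by date first
  mutate(
    previous = lag(attr_1),      # value of the preceding row
    delta = attr_1 - previous,   # absolute change
    change = delta / previous)   # change relative to the previous value
```

After sorting, attr_1 is 10, 20, 30, so previous is NA, 10, 20 and change is NA, 1, 0.5. The first row has no predecessor, which is why all three new columns start with NA.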

Extension: Specify Offset Value

We wish to shift the value of a vector down by an arbitrary offset.

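In this example, we wish to compute the difference and rate of change between the value of each element of the column attr_1 and its value one week prior (seven days earlier). The data frame is sorted by the value of the column date. Note that lag() counts rows, not days, so an offset of 7 corresponds to one week only if the data frame has exactly one row per day with no missing dates.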

library(dplyr)

df_2 = df %>% 
  arrange(date) %>%
  mutate(
    last_week = lag(attr_1, 7),
    delta = attr_1 - last_week,
    change = delta / last_week)

Here is how this works:

This code is similar to the code above with one exception: We pass the desired offset to the second argument of lag(), named n, which here is 7. Writing lag(attr_1, n = 7) is equivalent.
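
Because the offset counts rows rather than days, it can be worth checking what happens when a date is missing. The following is a minimal sketch on hypothetical data (df_gap, with one day removed). The date-based lookup with base R's match() is an illustrative alternative, not part of the example above.

```r
library(dplyr)

# Hypothetical daily data with one day (2024-01-05) missing.
dates <- seq(as.Date("2024-01-01"), as.Date("2024-01-10"), by = "day")
df_gap <- data.frame(date = dates, attr_1 = seq_along(dates) * 10) %>%
  filter(date != as.Date("2024-01-05"))

df_gap_2 <- df_gap %>%
  arrange(date) %>%
  mutate(
    # Row-based: the value 7 rows earlier, whatever its date is.
    last_week_rows = lag(attr_1, 7),
    # Date-based: the value whose date is exactly 7 days earlier, else NA.
    last_week_days = attr_1[match(date - 7, date)])
```

With the gap, last_week_rows for 2024-01-09 holds the value from 2024-01-01, which is eight days earlier, while last_week_days holds the value from 2024-01-02. When the data has one row per day, the two columns agree.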

Extension: Fill NA

When we shift a vector, the positions vacated by the shift have no value to take. By default, those positions are filled with NA. We can specify an alternative value.

In this example, we wish to fill the missing values resulting from the shift with 0.

library(dplyr)

df_2 = df %>% 
  arrange(date) %>%
  mutate(
    previous = lag(attr_1, default = 0),
    delta = attr_1 - previous)

Here is how this works:

  • This code is similar to the code above with one exception: We pass the value that should fill the positions vacated by the shift to the argument default of lag(), which here is default = 0.
  • Note that this approach is preferable to filling NAs after shifting, because the data may contain other NAs that we do not wish to replace.
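
The difference between the two approaches shows up when the column already contains an NA. This is a minimal sketch on hypothetical data (df_na). It uses coalesce() from dplyr as one possible way of filling NAs after the shift.

```r
library(dplyr)

# Hypothetical data in which attr_1 already contains a genuine NA.
df_na <- data.frame(
  date = as.Date("2024-01-01") + 0:3,
  attr_1 = c(5, NA, 7, 8)
)

df_na_2 <- df_na %>%
  arrange(date) %>%
  mutate(
    # Only the position vacated by the shift is filled with 0.
    filled_in_lag = lag(attr_1, default = 0),
    # Every NA is filled with 0, including the one that came from the data.
    filled_after = coalesce(lag(attr_1), 0))
```

Here filled_in_lag is 0, 5, NA, 7, so the NA that was present in attr_1 survives, whereas filled_after is 0, 5, 0, 7 and the original NA can no longer be told apart from a real zero.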

Extension: Shift Per Group

We wish to shift the values of a column down by one position within each group, i.e. to obtain the previous value according to an ordering column, where the groups are specified by the column col_1.

In this example, we wish to compute the difference between the value of each element of the column col_2 and the value of its previous element based on the column date for each group in col_1.

library(dplyr)

df_2 = df %>%
  arrange(date) %>%
  group_by(col_1) %>%
  mutate(
    previous = lag(col_2),
    delta = col_2 - previous)

Here is how this works:

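This code is similar to the code above with one exception: We group the data frame by col_1 before calling lag(), so the shift is applied within each group and the first row of each group gets NA. We sort the data frame by date first so that, within each group, the previous value is the one from the preceding date.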
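
A small worked example makes the per-group behaviour visible. This is a minimal sketch on hypothetical data (df_grp) with two interleaved groups. The trailing ungroup() is an addition here so that later steps are not grouped by accident.

```r
library(dplyr)

# Hypothetical data: two groups, "a" and "b", interleaved by date.
df_grp <- data.frame(
  col_1 = c("a", "b", "a", "b", "a"),
  date = as.Date("2024-01-01") + c(0, 0, 1, 1, 2),
  col_2 = c(1, 100, 2, 200, 4)
)

df_grp_2 <- df_grp %>%
  arrange(date) %>%
  group_by(col_1) %>%
  mutate(
    previous = lag(col_2),      # previous value within the same group
    delta = col_2 - previous) %>%
  ungroup()
```

The column previous is NA, NA, 1, 100, 2: the first row of each group has no predecessor, and values never leak from group "b" into group "a" or the other way round.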

Next

We wish to shift the values of a vector up by one position, i.e. to obtain, for each element, the value that follows it.

In this example, we wish to compute the difference and rate of change between the value of each element of the column attr_1 and the value of its next element. The data frame is sorted by the value of the column date.

library(dplyr)

df_2 = df %>% 
  arrange(date) %>%
  mutate(
    next_val = lead(attr_1),
    delta = next_val - attr_1,
    change = delta / attr_1)

Here is how this works:

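  • This code is similar to the code under Previous above, except that we use lead() instead of lag() because we wish to shift up, i.e. obtain the next value. Accordingly, delta is the next value minus the current one, and change is measured relative to the current value.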
  • The same extensions covered under Previous above can be applied here.
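
As an illustration of how the extensions carry over, lead() accepts the same n and default arguments as lag(). This is a minimal sketch on hypothetical data (df_next). The column next_2 is an illustrative name.

```r
library(dplyr)

# Hypothetical toy data, already in date order.
df_next <- data.frame(
  date = as.Date("2024-01-01") + 0:2,
  attr_1 = c(10, 20, 30)
)

df_next_2 <- df_next %>%
  arrange(date) %>%
  mutate(
    next_val = lead(attr_1),                   # value of the following row
    delta = next_val - attr_1,                 # change up to the next value
    change = delta / attr_1,                   # relative to the current value
    next_2 = lead(attr_1, n = 2, default = 0)) # two rows ahead, gaps filled with 0
```

Here next_val is 20, 30, NA, because the last row has no successor, and next_2 is 30, 0, 0.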