Non Vectorized Filtering

In some situations, the filtering logic we wish to carry out cannot be applied in a vectorized manner to entire columns; instead, it needs to be applied to each row individually.

In this example, we wish to keep only the rows where the mean of the values in the columns col_1 and col_2 is greater than 0.

library(dplyr)

df_2 <- df %>%
  rowwise() %>%                       # evaluate what follows one row at a time
  filter(mean(c(col_1, col_2)) > 0)   # keep rows whose row mean exceeds 0

Here is how this works:

  • rowwise() switches the execution of the operations that follow from column-wise to row-wise, which allows us to apply a non-vectorized function one row at a time.
  • Because of rowwise(), the expression inside filter() is evaluated one row at a time (instead of the usual evaluation on entire columns).
  • In mean(c(col_1, col_2)) > 0, the mean of the values of col_1 and col_2 for the current row is computed and then compared with 0. If the result is TRUE, the row is retained; otherwise, it is excluded from the output.
  • Note that any operations carried out after filter() will also be carried out one row at a time. To switch back to regular vectorized operation, add ungroup() to the chain.
  • See Non Vectorized Transformation for deeper coverage of non-vectorized operations. All the scenarios covered there also apply to filtering.
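
To make the points above concrete, here is a minimal sketch on a small made-up data frame. The values, and the names df_2 and df_3, are assumptions chosen for illustration only. It contrasts the row-wise filter with the same expression run without rowwise().

```r
library(dplyr)

# Hypothetical data: the values are made up for illustration
df <- tibble(
  col_1 = c(1, -3, 2, -1),
  col_2 = c(2, 1, -5, 1)
)

# Row-wise: the mean is computed separately for each row
df_2 <- df %>%
  rowwise() %>%                          # evaluate what follows one row at a time
  filter(mean(c(col_1, col_2)) > 0) %>%  # row means are 1.5, -1, -1.5 and 0
  ungroup()                              # switch back to vectorized operation

# Without rowwise(): mean(c(col_1, col_2)) is a single number computed
# from all values of both columns, so filter() keeps every row or none
df_3 <- df %>%
  filter(mean(c(col_1, col_2)) > 0)
```

With this data, df_2 contains only the first row, since it is the only row whose mean is above 0. df_3 is empty because the overall mean of all eight values is -0.25, which shows why the vectorized form does not express the intended per-row logic.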