Most data transformation involves operating on columns with vectorized functions, i.e. functions that accept a vector, perform an operation on each element of that vector, and return a vector of the same size as the input, eliminating the need for a loop. There are times, though, when we need to operate on rows in a non-vectorized manner, e.g. when we wish to obtain the mean value of some columns for each row.
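
To make the distinction concrete, here is a minimal sketch on a small made-up data frame (the values and the names `toy`, `sum_cols`, and `mean_all` are illustrative assumptions, not part of the example that follows). The operator `+` is vectorized and works element by element, whereas `mean()` collapses everything it is given into a single number, so calling it on two columns inside a plain `mutate()` does not yield a per-row mean.

```r
library(dplyr)

# Hypothetical toy data, for illustration only
toy = tibble(col_1 = c(1, 2, 3), col_2 = c(10, 20, 30))

result = toy %>%
  mutate(
    # `+` is vectorized: one result per row
    sum_cols = col_1 + col_2,
    # mean() is not: it averages all six values at once, and the
    # single result is recycled to every row
    mean_all = mean(c(col_1, col_2))
  )
```

Here `sum_cols` differs from row to row, while `mean_all` holds the same overall mean in every row, which is why a row-wise approach is needed.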

In this example, we have a data frame `df` with two numerical columns `col_1` and `col_2`, and we wish to create a new column `col_3` where each value is the mean of the values of the columns `col_1` and `col_2` for the same row. We also wish to create a column `col_4` where each value is the minimum of the values of the columns `col_1` and `col_2` for the same row.

```
library(dplyr)

df_2 = df %>%
  rowwise() %>%  # evaluate what follows one row at a time
  mutate(
    col_3 = mean(c(col_1, col_2), na.rm = TRUE),
    col_4 = min(col_1, col_2, na.rm = TRUE)
  )
```

Here is how this works:

- `rowwise()` switches the mode of execution of the operations that follow from column-wise to row-wise, which allows us to apply a non-vectorized function one row at a time.
- Because of `rowwise()`, the expression inside `mutate()` is applied one row at a time (instead of the usual execution on entire columns).
- In `mean(c(col_1, col_2))`, the mean of the values of the columns `col_1` and `col_2` is computed for each row.
- In `min(col_1, col_2)`, the minimum of the values of the columns `col_1` and `col_2` is selected for each row.
- Depending on the signature of the function we wish to use, we may need to wrap the inputs in a vector with `c()` or in a `list()`. For instance, here we wrap the column names in `c()` for `mean()` but pass the columns directly to `min()` because of their signatures:
  - The signature of `mean()` (default method) is `mean(x, trim = 0, na.rm = FALSE, ...)`, i.e. it expects a single vector-like object holding the numerical values to be averaged.
  - The signature of `min()` is `min(..., na.rm = FALSE)`, where the `...` accepts any number of individual values (or vectors).

- The argument `na.rm = TRUE` instructs both functions, `mean()` and `min()`, to ignore any missing values (`NA`).
- Note that any operations carried out after `rowwise()` will also be carried out one row at a time. To switch back to regular vectorized operation, add `ungroup()` to the chain.
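
The points above can be sketched on a small made-up data frame. The data values and the column `above_avg` are assumptions introduced for illustration only. The third row contains an `NA` to show the effect of `na.rm = TRUE`, and the final `mutate()` is placed after `ungroup()` so that `mean(col_4)` is computed over the whole column again.

```r
library(dplyr)

# Hypothetical toy data with one missing value
df = tibble(col_1 = c(1, 5, NA), col_2 = c(3, 2, 4))

df_2 = df %>%
  rowwise() %>%
  mutate(
    col_3 = mean(c(col_1, col_2), na.rm = TRUE),  # per-row mean
    col_4 = min(col_1, col_2, na.rm = TRUE)       # per-row minimum
  ) %>%
  ungroup() %>%
  # Back to column-wise execution: each col_4 value is compared
  # with the mean of the entire col_4 column
  mutate(above_avg = col_4 > mean(col_4))
```

For this data, `col_3` is 2, 3.5, 4 and `col_4` is 1, 2, 4. Without the `ungroup()` step, `mean(col_4)` would be evaluated on one row at a time, so `above_avg` would be `FALSE` everywhere.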

*Alternatively:*

We can use any of the map family of functions from the `purrr` package inside `mutate()` to apply any non-vectorized function to each element of one or more columns.

```
library(dplyr)
library(purrr)

df_2 = df %>%
  mutate(
    # mean() needs one vector, so each pair is wrapped in c()
    col_3 = map2_dbl(col_1, col_2, ~mean(c(.x, .y), na.rm = TRUE)),
    col_4 = map2_dbl(col_1, col_2, min, na.rm = TRUE)
  )
```

Here is how this works:

- The first step in using the `purrr` map family of functions is to identify the right mapping function for the inputs and output of the situation at hand. In this case, since we have two input columns and the output is a double-precision number, we opted for the `map2_dbl()` mapping function. `map2_dbl()` expects:
  - two columns (more precisely, two vectors or lists of the same length) and
  - a function or an anonymous function (one-sided formula) that accepts two inputs and returns a single numerical value.

  `map2_dbl()` iterates over the two columns one row at a time, passes the corresponding values of the two columns to the function, and finally returns the output as a vector of the same size as each of the inputs.
- As described above, the way we structure the call to the function depends on the signature of the function. In this case, while `min()` can accept any number of inputs, `mean()` expects a single vector-like input. Therefore:
  - For `mean()`: We used an anonymous function `~mean(c(.x, .y))` to wrap the values of the two columns into a vector, which is then passed to `mean()`.
  - For `min()`: We let `map2_dbl()` pass the column values directly to `min()` as its first two arguments.
- We cover mapping over a list and the `purrr` map family of functions in detail in List Operations.
- Perhaps an advantage of using the map functions over `rowwise()` is that we can include both vectorized and non-vectorized operations inside the same call to `mutate()`.
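
The last point can be illustrated with a short sketch. The data values and the column `col_5` are hypothetical, added only to show a vectorized expression sitting next to a mapped one in a single `mutate()` call.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data with one missing value
df = tibble(col_1 = c(1, 5, NA), col_2 = c(3, 2, 4))

df_2 = df %>%
  mutate(
    # Non-vectorized: map2_dbl() calls min() once per row
    col_4 = map2_dbl(col_1, col_2, min, na.rm = TRUE),
    # Vectorized: evaluated once on the whole column, which would
    # not work as intended after rowwise()
    col_5 = col_4 / sum(col_4)
  )
```

Because no `rowwise()` is in effect, `sum(col_4)` sees the entire column, so `col_5` holds each row's share of the column total.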
