In the implicit data transformation scenarios we covered in Function Specification, the output columns either had the same names as the input columns (overwriting them) or were new columns with standardized names. In this section, we cover how to override this default behavior and specify output column names. We often need output column names that are more appropriate for the domain context.
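This section is organized into three scenarios: applying one function, applying multiple functions, and applying anonymous functions to a set of columns. For each scenario we cover two approaches:
Suffix Specification: We specify a string to be appended as a suffix to the name of each input column, e.g. naming the output of scaling col_1 as col_1_scaled.
Naming Template: We specify a naming template, e.g. "{.col}_{.fn}", that is a function of the name of the input column {.col} and the function applied {.fn}.

We wish to specify the names of the columns resulting from implicitly applying one function to a set of columns, instead of the default behavior of overwriting the existing columns.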
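As a baseline, here is the default behavior we want to override. This is a minimal sketch assuming dplyr is installed; the data frame df and its values are made up for illustration. Without list names or a .names template, across() writes the results back into col_1 and col_2.

```r
library(dplyr)

# Illustrative data (assumed, not from the original text)
df = data.frame(col_1 = c(1.26, 2.71, 3.14),
                col_2 = c(-1.7, 4.49, 10.51))

# No list names and no .names template: the results overwrite the inputs
df_default = df %>%
  mutate(across(c(col_1, col_2), round))

names(df_default)  # "col_1" "col_2"
df_default$col_1   # 1 3 3
```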
Suffix Specification
We wish to name the output columns by specifying a string to be added as a post-fix to the name of the input columns.
In this example, we wish to apply the function round() to the columns col_1 and col_2 and to name the output columns col_1_rounded and col_2_rounded.
df_2 = df %>%
  mutate(across(c(col_1, col_2),
                list(rounded = round)))
Here is how this works:
across() allows us to pass a list of functions (named functions or anonymous functions), where the names in the list are used as a suffix to the input column name, with an underscore _ in between.
We pass to across() a list containing one named function, list(rounded = round).
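A self-contained run of the suffix approach, assuming dplyr is installed and using made-up values for df. Unlike the default, the input columns are kept and the rounded results are added as new columns.

```r
library(dplyr)

# Illustrative data (assumed)
df = data.frame(col_1 = c(1.26, 2.71, 3.14),
                col_2 = c(-1.7, 4.49, 10.51))

# The list name "rounded" becomes the suffix of each output column
df_2 = df %>%
  mutate(across(c(col_1, col_2),
                list(rounded = round)))

names(df_2)
# "col_1" "col_2" "col_1_rounded" "col_2_rounded"
```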
Naming Template
We wish to name the output columns by applying a template that is a function of the names of the input columns.
In this example, we wish to apply the function round() to the columns col_1 and col_2 and to name the output columns v2_rnd_col_1 and v2_rnd_col_2.
df_2 = df %>%
  mutate(across(c(col_1, col_2),
                round,
                .names = "v2_rnd_{.col}"))
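The snippet above can be checked with a small sketch, assuming dplyr is installed and a made-up df. It also shows that the {.col} placeholder may appear anywhere in the template; the second template, "{.col}_rnd_v2", is only an illustration.

```r
library(dplyr)

# Illustrative data (assumed)
df = data.frame(col_1 = c(1.26, 2.71, 3.14),
                col_2 = c(-1.7, 4.49, 10.51))

# {.col} is replaced by each input column name
df_2 = df %>%
  mutate(across(c(col_1, col_2),
                round,
                .names = "v2_rnd_{.col}"))

# Hypothetical alternative template with the placeholder at the start
df_3 = df %>%
  mutate(across(c(col_1, col_2),
                round,
                .names = "{.col}_rnd_v2"))

names(df_2)  # "col_1" "col_2" "v2_rnd_col_1" "v2_rnd_col_2"
names(df_3)  # "col_1" "col_2" "col_1_rnd_v2" "col_2_rnd_v2"
```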
Here is how this works:
across() accepts a .names argument, which takes a template of the form "{.col}_{.fn}" that we can use to specify how the output columns are named as a function of the names of the input columns {.col} and the functions applied {.fn}.
We pass the template "v2_rnd_{.col}" for generating output column names. To generate the name of each output column, {.col} is replaced by the name of the input column.

We wish to specify the names of the columns resulting from implicitly applying multiple functions to a set of columns, instead of the default behavior (for an unnamed list of functions) of naming the new columns with the name of the original column concatenated with _i, where i is the index of the function in the input list of functions.
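To make that default concrete, here is a minimal sketch assuming dplyr is installed and a made-up df: with an unnamed list, each function is identified by its position in the list.

```r
library(dplyr)

# Illustrative data (assumed)
df = data.frame(col_1 = c(1.26, 2.71, 3.14),
                col_2 = c(-1.7, 4.49, 10.51))

# Unnamed list: abs() is function 1 and round() is function 2
df_2 = df %>%
  mutate(across(c(col_1, col_2),
                list(abs, round)))

names(df_2)
# "col_1" "col_2" "col_1_1" "col_1_2" "col_2_1" "col_2_2"
```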
Suffix Specification
We wish to name the output columns by specifying a string to be added as a post-fix to the name of the input columns.
In this example, we wish to apply the functions abs() to yield the absolute value and round() to yield the rounded value of the columns col_1 and col_2 and to name the output columns col_1_mag, col_1_rnd, col_2_mag, and col_2_rnd.
df_2 = df %>%
  mutate(across(c(col_1, col_2),
                list(mag = abs, rnd = round)))
Here is how this works:
across() accepts a list of named functions as described in “Named Function” above.
We pass to across() the list of named functions list(mag = abs, rnd = round).
Had we passed an unnamed list, list(abs, round), the output columns would have been named col_1_1, col_1_2, col_2_1, and col_2_2.
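A sketch of the suffix approach with two functions, assuming dplyr is installed and a made-up df. It also shows that naming the list is equivalent to spelling out "{.col}_{.fn}", which is the documented default .names template when .fns is a list.

```r
library(dplyr)

# Illustrative data (assumed)
df = data.frame(col_1 = c(1.26, 2.71, 3.14),
                col_2 = c(-1.7, 4.49, 10.51))

# Named list: the names "mag" and "rnd" become the suffixes
df_2 = df %>%
  mutate(across(c(col_1, col_2),
                list(mag = abs, rnd = round)))

# Writing out the default template for a list of functions
df_3 = df %>%
  mutate(across(c(col_1, col_2),
                list(mag = abs, rnd = round),
                .names = "{.col}_{.fn}"))

names(df_2)
# "col_1" "col_2" "col_1_mag" "col_1_rnd" "col_2_mag" "col_2_rnd"
identical(df_2, df_3)  # TRUE
```

Note the column order: across() loops over the input columns and, within each column, over the functions.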
Naming Template
We wish to name the output columns by applying a template that is a function of the names of the input columns and the functions applied.
In this example, we wish to apply the functions abs() to yield the absolute value (magnitude) and round() to yield the rounded value of the columns col_1 and col_2 and to name the output columns v2_mag_col_1, v2_rnd_col_1, v2_mag_col_2, and v2_rnd_col_2.
df_2 = df %>%
  mutate(across(c(col_1, col_2),
                list(mag = abs, rnd = round),
                .names = "v2_{.fn}_{.col}"))
Here is how this works:
across() accepts a naming template via its .names argument, as described in “Named Function” above.
We pass the template "v2_{.fn}_{.col}" for generating output column names. To generate the name of each output column, {.col} is replaced by the name of the input column and {.fn} is replaced by the name of the function.
Had we passed list(abs, round) instead of a list of named functions, the output column names would be v2_1_col_1, v2_2_col_1, v2_1_col_2, and v2_2_col_2.
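The named and unnamed cases can be compared side by side in a minimal sketch, assuming dplyr is installed and a made-up df. The same template is used in both calls; only the list names differ.

```r
library(dplyr)

# Illustrative data (assumed)
df = data.frame(col_1 = c(1.26, 2.71, 3.14),
                col_2 = c(-1.7, 4.49, 10.51))

# Named list: {.fn} becomes "mag" or "rnd"
named = df %>%
  mutate(across(c(col_1, col_2),
                list(mag = abs, rnd = round),
                .names = "v2_{.fn}_{.col}"))

# Unnamed list: {.fn} becomes the function's position, 1 or 2
unnamed = df %>%
  mutate(across(c(col_1, col_2),
                list(abs, round),
                .names = "v2_{.fn}_{.col}"))

names(named)
# "col_1" "col_2" "v2_mag_col_1" "v2_rnd_col_1" "v2_mag_col_2" "v2_rnd_col_2"
names(unnamed)
# "col_1" "col_2" "v2_1_col_1" "v2_2_col_1" "v2_1_col_2" "v2_2_col_2"
```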
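We wish to specify the names of the columns resulting from implicitly applying anonymous functions to a set of columns, instead of the default behavior of overwriting the existing columns (for one anonymous function) or appending the function's index (for an unnamed list of anonymous functions).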
Suffix Specification
We wish to name the output columns by specifying a string to be added as a post-fix to the name of the input columns.
In this example, we wish to apply two anonymous functions to the columns col_1 and col_2 and to name the output columns col_1_rnd, col_1_dlt, col_2_rnd, and col_2_dlt, where rnd represents the first anonymous function and dlt represents the second.
df_2 = df %>%
  mutate(across(c(col_1, col_2),
                list(rnd = ~round(., 2),
                     dlt = ~(. - lag(.)))))
Here is how this works:
across() accepts a list of named functions as described in “Named Function” above.
rnd = ~round(., 2) rounds the values of a numeric column to two decimal places, and the output columns it creates have the suffix “_rnd”.
dlt = ~(. - lag(.)) returns the difference between the current value of a column and its value in the previous row (see General Operations; the first row is NA since it has no previous row), and the output columns it creates have the suffix “_dlt”.
Had we passed an unnamed list of the two anonymous functions, the output columns would have been named col_1_1, col_1_2, col_2_1, and col_2_2.
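A sketch of the anonymous-function case, assuming dplyr is installed and a made-up df. The purrr-style formulas, where . stands for the current column, are compared with the same functions written as function(x); both forms should give identical results and names.

```r
library(dplyr)

# Illustrative data (assumed)
df = data.frame(col_1 = c(1.26, 2.71, 3.14),
                col_2 = c(-1.7, 4.49, 10.51))

# Formula lambdas in a named list
df_2 = df %>%
  mutate(across(c(col_1, col_2),
                list(rnd = ~round(., 2),
                     dlt = ~(. - lag(.)))))

# The same anonymous functions written with function(x)
df_3 = df %>%
  mutate(across(c(col_1, col_2),
                list(rnd = function(x) round(x, 2),
                     dlt = function(x) x - lag(x))))

names(df_2)
# "col_1" "col_2" "col_1_rnd" "col_1_dlt" "col_2_rnd" "col_2_dlt"
df_2$col_1_dlt  # NA for the first row, then the row-to-row differences
```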
Naming Template
We wish to name the output columns by applying a template that is a function of the names of the input columns and the functions applied.
In this example, we wish to apply two anonymous functions to the columns col_1 and col_2 and to name the output columns v2_rnd_col_1, v2_dlt_col_1, v2_rnd_col_2, and v2_dlt_col_2, where rnd represents the first anonymous function and dlt represents the second.
df_2 = df %>%
  mutate(across(c(col_1, col_2),
                list(rnd = ~round(., 2),
                     dlt = ~(. - lag(.))),
                .names = "v2_{.fn}_{.col}"))
Here is how this works:
across() accepts a naming template via its .names argument, as described in “Named Function” above.
We pass the template "v2_{.fn}_{.col}" for generating output column names. To generate the name of each output column, {.col} is replaced by the name of the input column and {.fn} is replaced by the name of the function.
Had we passed an unnamed list of the two anonymous functions, {.fn} would be replaced by the function's index, and the output columns would be named v2_1_col_1, v2_2_col_1, v2_1_col_2, and v2_2_col_2.
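A closing sketch of the template approach with anonymous functions, assuming dplyr is installed and a made-up df. One practical benefit of a shared prefix such as v2_ is that the generated columns can be picked out afterwards with the tidyselect helper starts_with().

```r
library(dplyr)

# Illustrative data (assumed)
df = data.frame(col_1 = c(1.26, 2.71, 3.14),
                col_2 = c(-1.7, 4.49, 10.51))

df_2 = df %>%
  mutate(across(c(col_1, col_2),
                list(rnd = ~round(., 2),
                     dlt = ~(. - lag(.))),
                .names = "v2_{.fn}_{.col}"))

# Select only the generated columns by their common prefix
new_cols = df_2 %>%
  select(starts_with("v2_"))

names(new_cols)
# "v2_rnd_col_1" "v2_dlt_col_1" "v2_rnd_col_2" "v2_dlt_col_2"
```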