Output Naming

In the implicit data transformation scenarios we covered in Function Specification, the output columns either had the same names as the input columns (overwriting them) or were created as multiple new columns with standardized names. In this section, we cover how to override the default behavior and specify output column names, which is often needed to give the output columns names that are more appropriate for the domain context.

This section is organized as follows:

  • Named Function where we cover how to specify the names of the columns resulting from implicitly applying one function to a set of columns.
  • Multiple Functions where we cover how to specify the names of the columns resulting from implicitly applying multiple functions to a set of columns.
  • Anonymous Function where we cover how to specify the names of the columns resulting from implicitly applying an anonymous function to a set of columns.

For each scenario, we cover two approaches:

  • Suffix Specification where we specify the string to append to the original column name e.g. naming the output column of scaling the column col_1 as col_1_scaled.
  • Naming Template where we have more flexibility in naming the output columns via a naming template of the form "{.col}_{.fn}" that is a function of the name of the input column {.col} and the function applied {.fn}.

Named Function

We wish to specify the names of the columns resulting from implicitly applying one function to a set of columns instead of the default behavior of overwriting the existing columns.

Suffix Specification

We wish to name the output columns by specifying a string to be added as a post-fix to the name of the input columns.

In this example, we wish to apply the function round() to the columns col_1 and col_2 and to name the output columns col_1_rounded and col_2_rounded.

df_2 = df %>%
  mutate(across(c(col_1, col_2), 
                list(rounded = round)))

Here is how this works:

  • across() allows us to pass a list of named functions or anonymous functions where the names would be used as a post-fix to the input column name (with an underscore _ between).
  • In this case we pass to across() a list containing one named function list(rounded = round).
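The behavior described above can be checked on a small data frame. This is a minimal sketch, assuming dplyr is installed; the sample values in df are hypothetical and only the column names come from the example.

```r
library(dplyr)

# Hypothetical sample data; only the column names follow the example above.
df = tibble(col_1 = c(1.234, -5.678), col_2 = c(9.87, -6.54))

df_2 = df %>%
  mutate(across(c(col_1, col_2),
                list(rounded = round)))

# The original columns are kept and the rounded copies are appended.
names(df_2)
# "col_1" "col_2" "col_1_rounded" "col_2_rounded"
```

Because the function is passed inside a named list, the input columns are left untouched rather than overwritten.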

Naming Template

We wish to name the output columns by applying a template that is a function of the names of the input columns.

In this example, we wish to apply the function round() to the columns col_1 and col_2 and to name the output columns v2_rnd_col_1 and v2_rnd_col_2.

df_2 = df %>%
  mutate(across(c(col_1, col_2), 
                round, 
                .names = "v2_rnd_{.col}"))

Here is how this works:

  • across() accepts a .names argument which accepts a template of the form "{.col}_{.fn}" that we can use to specify how the output columns are named as a function of the names of the input columns {.col} and functions applied {.fn}.
  • We specify the template "v2_rnd_{.col}" for generating output column names. To generate the name of the output column, {.col} will be replaced by the name of the input column.
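To see the template at work, the sketch below runs the example on hypothetical sample data (the values in df are assumptions) and contrasts it with the template "{.col}", which the dplyr documentation describes as the default when a single function is passed.

```r
library(dplyr)

# Hypothetical sample data; only the column names follow the example above.
df = tibble(col_1 = c(1.234, -5.678), col_2 = c(9.87, -6.54))

# Explicit template: new columns are added next to the originals.
df_2 = df %>%
  mutate(across(c(col_1, col_2),
                round,
                .names = "v2_rnd_{.col}"))
names(df_2)
# "col_1" "col_2" "v2_rnd_col_1" "v2_rnd_col_2"

# The template "{.col}" mirrors the single-function default:
# the output name equals the input name, so the columns are overwritten.
df_3 = df %>%
  mutate(across(c(col_1, col_2),
                round,
                .names = "{.col}"))
names(df_3)
# "col_1" "col_2"
```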

Multiple Functions

We wish to specify the names of the columns resulting from implicitly applying multiple functions to a set of columns instead of the default behavior of naming the new columns with the name of the original column concatenated with _i where i is the index of the function in the input list of functions.

Suffix Specification

We wish to name the output columns by specifying a string to be added as a post-fix to the name of the input columns.

In this example, we wish to apply the functions abs() to yield the absolute value and round() to yield the rounded value of the columns col_1 and col_2 and to name the output columns col_1_mag, col_1_rnd, col_2_mag, and col_2_rnd.

df_2 = df %>%
  mutate(across(c(col_1, col_2), 
                list(mag = abs, rnd = round)))

Here is how this works:

  • across() accepts a list of named functions as described in “Named Function” above.
  • In this case we pass to across() the list of named functions list(mag = abs, rnd = round).
  • Had we not passed a list of named functions, the output column names would have been col_1_1, col_1_2, col_2_1, and col_2_2.
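The difference between a named and an unnamed list of functions can be verified side by side. A minimal sketch, assuming dplyr is installed; the sample values and the names df_named and df_unnamed are hypothetical.

```r
library(dplyr)

# Hypothetical sample data; only the column names follow the example above.
df = tibble(col_1 = c(1.234, -5.678), col_2 = c(9.87, -6.54))

# Named list: the list names become the suffixes.
df_named = df %>%
  mutate(across(c(col_1, col_2),
                list(mag = abs, rnd = round)))
names(df_named)
# "col_1" "col_2" "col_1_mag" "col_1_rnd" "col_2_mag" "col_2_rnd"

# Unnamed list: the position of each function becomes the suffix.
df_unnamed = df %>%
  mutate(across(c(col_1, col_2),
                list(abs, round)))
names(df_unnamed)
# "col_1" "col_2" "col_1_1" "col_1_2" "col_2_1" "col_2_2"
```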

Naming Template

We wish to name the output columns by applying a template that is a function of the names of the input columns and the functions applied.

In this example, we wish to apply the functions abs() to yield the absolute value or magnitude and round() to yield the rounded value of the columns col_1 and col_2 and to name the output columns v2_mag_col_1, v2_rnd_col_1, v2_mag_col_2, and v2_rnd_col_2.

df_2 = df %>%
  mutate(across(c(col_1, col_2), 
                list(mag = abs, rnd = round),
                .names = "v2_{.fn}_{.col}"))

Here is how this works:

  • across() accepts a naming template to its .names argument as described in “Named Function” above.
  • We specify the template "v2_{.fn}_{.col}" for generating output column names. To generate the name of the output column, {.col} will be replaced by the name of the input column and {.fn} will be replaced by the name of the function.
  • If we had passed a list of functions list(abs, round) instead of a list of named functions, the output column names would be v2_1_col_1, v2_2_col_1, v2_1_col_2, and v2_2_col_2.
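The sketch below checks how {.fn} is filled in when the list of functions is unnamed, and shows that the template "{.col}_{.fn}" (documented by dplyr as the default when a list is passed) gives the same names as omitting .names. The sample values and the names df_unnamed, df_default and df_explicit are hypothetical.

```r
library(dplyr)

# Hypothetical sample data; only the column names follow the example above.
df = tibble(col_1 = c(1.234, -5.678), col_2 = c(9.87, -6.54))

# Unnamed list with a template: {.fn} is replaced by the function's position.
df_unnamed = df %>%
  mutate(across(c(col_1, col_2),
                list(abs, round),
                .names = "v2_{.fn}_{.col}"))
names(df_unnamed)
# "col_1" "col_2" "v2_1_col_1" "v2_2_col_1" "v2_1_col_2" "v2_2_col_2"

# Omitting .names and spelling out "{.col}_{.fn}" yield the same names.
df_default = df %>%
  mutate(across(c(col_1, col_2), list(mag = abs, rnd = round)))
df_explicit = df %>%
  mutate(across(c(col_1, col_2), list(mag = abs, rnd = round),
                .names = "{.col}_{.fn}"))
identical(names(df_default), names(df_explicit))
# TRUE
```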

Anonymous Function

We wish to specify the names of the columns resulting from implicitly applying an anonymous function to a set of columns instead of the default behavior of overwriting the existing columns.

Suffix Specification

We wish to name the output columns by specifying a string to be added as a post-fix to the name of the input columns.

In this example, we wish to apply two anonymous functions to the columns col_1 and col_2 and to name the output columns col_1_rnd, col_1_dlt, col_2_rnd, and col_2_dlt where rnd represents the first anonymous function and dlt represents the second.

df_2 = df %>%
  mutate(across(c(col_1, col_2), 
                list(rnd = ~round(., 2), 
                     dlt = ~(. - lag(.)))))

Here is how this works:

  • across() accepts a list of named functions as described in “Named Function” above.
  • In this case the anonymous functions and their names are as follows:
    • The first rnd = ~round(., 2) rounds a double precision column to two decimal places and the output columns created would have the post-fix “_rnd”.
    • The second dlt = ~(. - lag(.)) returns the difference between the current value of a column and its value for the previous row (See General Operations) and the output columns created would have the post-fix “_dlt”.
  • Had we not passed a list of named functions, the output column names would have been col_1_1, col_1_2, col_2_1, and col_2_2.
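To make the two anonymous functions concrete, here is a minimal sketch on hypothetical sample data (the values in df are assumptions). It assumes dplyr is attached so that lag() resolves to dplyr::lag().

```r
library(dplyr)

# Hypothetical sample data; only the column names follow the example above.
df = tibble(col_1 = c(1.234, 3.456, 10), col_2 = c(2.5, 2.25, 4))

df_2 = df %>%
  mutate(across(c(col_1, col_2),
                list(rnd = ~round(., 2),
                     dlt = ~(. - lag(.)))))

# The list names rnd and dlt are appended to each input column name.
names(df_2)
# "col_1" "col_2" "col_1_rnd" "col_1_dlt" "col_2_rnd" "col_2_dlt"

# The first row has no previous row, so its difference is NA.
df_2$col_2_dlt
# NA -0.25 1.75
```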

Naming Template

We wish to name the output columns by applying a template that is a function of the names of the input columns and the functions applied.

In this example, we wish to apply two anonymous functions to the columns col_1 and col_2 and to name the output columns v2_rnd_col_1, v2_dlt_col_1, v2_rnd_col_2, and v2_dlt_col_2 where rnd represents the first anonymous function and dlt represents the second.

df_2 = df %>%
  mutate(across(c(col_1, col_2), 
                list(rnd = ~round(., 2), 
                     dlt = ~(. - lag(.))),
                .names = "v2_{.fn}_{.col}"))

Here is how this works:

  • across() accepts a naming template to its .names argument as described in “Named Function” above.
  • The anonymous functions are as described in the “Suffix Specification” scenario under “Anonymous Function” above.
  • We specify the template "v2_{.fn}_{.col}" for generating output column names. To generate the name of the output column, {.col} will be replaced by the name of the input column and {.fn} will be replaced by the name of the function.
  • Had we not passed a list of named functions, {.fn} would have been replaced by the index of each function in the list and the output column names would have been v2_1_col_1, v2_2_col_1, v2_1_col_2, and v2_2_col_2.
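As a variation, the same anonymous functions can be written with base R's \(x) lambda syntax instead of the formula shorthand. This is a sketch that assumes R 4.1 or later and dplyr; the sample values and the name df_3 are hypothetical.

```r
library(dplyr)

# Hypothetical sample data; only the column names follow the example above.
df = tibble(col_1 = c(1.234, 3.456, 10), col_2 = c(2.5, 2.25, 4))

# Formula shorthand, as in the example above.
df_2 = df %>%
  mutate(across(c(col_1, col_2),
                list(rnd = ~round(., 2),
                     dlt = ~(. - lag(.))),
                .names = "v2_{.fn}_{.col}"))

# Base R lambdas (R >= 4.1); the naming template works the same way.
df_3 = df %>%
  mutate(across(c(col_1, col_2),
                list(rnd = \(x) round(x, 2),
                     dlt = \(x) x - lag(x)),
                .names = "v2_{.fn}_{.col}"))

names(df_3)
# "col_1" "col_2" "v2_rnd_col_1" "v2_dlt_col_1" "v2_rnd_col_2" "v2_dlt_col_2"
```

Which form to use is largely a matter of style; the names in the list, not the way the functions are written, determine what {.fn} expands to.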