We wish to change the column names of a data frame to have them conform to a standard naming convention. In particular, we cover converting column names to snake_case and to camelCase.
For general column name cleaning, see Implicit Renaming and String Operations.
We wish to change the names of columns so that they are in snake case, i.e. “words” are separated by underscores ‘_’ and the entire name is in lowercase.
library(janitor)
df_2 = df %>% clean_names()
Here is how this works:
We use the clean_names() function from the janitor package.
The clean_names() function has an argument case that allows us to specify the naming style that we wish to have the column names conform to. By default case="snake" and, therefore, we left case unspecified.
The clean_names() function essentially does two things: (1) it cleans the names, i.e. it replaces separators with ‘_’, drops characters that are not letters or numbers, lowers case, and makes names unique; and (2) it converts the names to the style specified by the case argument, which in this case is case="snake".
Alternative: Arbitrary Separator to Snake Case via Regex
library(dplyr)
library(stringr)
df_2 = df %>%
  rename_with(
    ~ str_replace_all(.x, '[\\W_]+', '_') %>%
      str_to_lower())
Here is how this works:
We use rename_with() to rename columns by applying a function to existing column names and using the output as the new column names. See Implicit Renaming.
We use str_replace_all() to replace any separators with an underscore. See String Replacing.
The regular expression we use is '[\\W_]+' and works as follows:
\\W captures any character that is not a letter, digit, or underscore, i.e. it captures punctuation marks, symbols, and whitespace.
_ captures underscores. We need to capture underscores because \\W doesn't include underscore characters.
[] specifies an or relationship between the characters within.
+ captures one or more of the characters specified previously.
The replacement is '_', i.e. we replace what is captured by the pattern with an underscore.
We then use str_to_lower() to convert the result to lowercase.
Alternative: Camel Case to Snake Case via Regex
df_2 = df %>%
  rename_with(
    ~ str_replace_all(.x, '(?<!^)(?=[A-Z])', '_') %>%
      str_to_lower())
Here is how this works:
We use rename_with() to rename columns by applying a function to existing column names and using the output as the new column names. See Implicit Renaming.
We use str_replace_all() to insert an underscore before each uppercase letter other than the first character of the name. See String Replacing.
The regular expression we use is (?<!^)(?=[A-Z]), a combination of two lookaround assertions that match a position rather than any characters:
(?<!^) is a negative look-behind. A negative look-behind (?<!...) asserts that what precedes the current position does not match the pattern inside it. Here the pattern is ^, which matches the start of the string, so this asserts that the current position is not the start of the string.
(?=[A-Z]) is a positive look-ahead. A positive look-ahead (?=...) asserts that what follows the current position matches the pattern inside it. Here the pattern is [A-Z], so this asserts that the next character is an uppercase letter.
The replacement is '_', i.e. we insert an underscore in each position matched by the regular expression above.
We then use str_to_lower() to convert the result to lowercase.
We wish to change the names of columns so that they are in camel case, i.e. “words” are not separated by any delimiters and the first letter of each word, except the first, is in upper case.
library(janitor)
df_2 = df %>%
  clean_names(case = "lower_camel")
Here is how this works:
We use the clean_names() function from the janitor package.
The clean_names() function has an argument case that allows us to specify the naming style that we wish to have the column names conform to, which in this case is case="lower_camel".
Extension: Pascal Case
library(janitor)
df_2 = df %>%
  clean_names(case = "upper_camel")
Here is how this works:
This works similarly to the primary solution except that we set case="upper_camel" so that clean_names() changes column names to Pascal Case.
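To make the difference between the case values concrete, the following is a small sketch that applies clean_names() to the same data frame with the default case and with the two camel case options. The data frame and its column names are made up for illustration and do not come from the examples above.

```r
library(janitor)

# Hypothetical data frame with inconsistently named columns.
# check.names = FALSE keeps the names exactly as written.
df = data.frame(
  `First Name` = "Ada",
  `last.name` = "Lovelace",
  check.names = FALSE
)

# Default: snake case.
snake_names = names(clean_names(df))

# Camel case: first word in lower case.
camel_names = names(clean_names(df, case = "lower_camel"))

# Pascal case: every word capitalized.
pascal_names = names(clean_names(df, case = "upper_camel"))

print(snake_names)   # "first_name" "last_name"
print(camel_names)   # "firstName"  "lastName"
print(pascal_names)  # "FirstName"  "LastName"
```

Only the case argument changes between the three calls; the cleaning of separators happens in all of them.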
Extension: Capitalize Abbreviations
We wish to have abbreviations capitalized in the generated camel case column names.
library(janitor)
df_2 = df %>%
  clean_names(case = "lower_camel", abbreviations = c("ID", "CVR"))
Here is how this works:
We use the clean_names() function with the case argument set to case="lower_camel".
We pass the abbreviations that we wish to keep capitalized to the abbreviations argument of the clean_names() function.
Alternative: via Regular Expression
df_2 = df %>%
  rename_with(
    ~str_replace_all(
      .x,
      '[\\W_]+\\w',
      ~str_to_upper(str_sub(.x, -1))))
Here is how this works:
We use rename_with() to rename columns by applying a function to existing column names and using the output as the new column names. See Implicit Renaming.
We use str_replace_all() to replace each run of separators, together with the character that follows it, with that character in upper case. See String Replacing.
The regular expression we use is '[\\W_]+\\w' and works as follows:
\\W captures any character that is not a letter, digit, or underscore, i.e. it captures punctuation marks, symbols, and whitespace.
_ captures underscores. We need to capture underscores because \\W doesn't include underscore characters.
[] specifies an or relationship between the characters within.
+ captures one or more of the characters specified previously.
\\w captures any word character, i.e. a letter, digit, or underscore.
The replacement is ~str_to_upper(str_sub(.x, -1)), which is an anonymous function that extracts the last character of the captured pattern and converts it to upper case. See Extracting by Location.
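The effect of the pattern and the replacement function can be checked on a plain character vector before applying them to a data frame. The sketch below uses made-up column names and passes an ordinary function rather than the formula shorthand as the replacement; this is an assumption made for portability, since the formula form may not be accepted by older versions of stringr.

```r
library(stringr)

# Hypothetical column names using a mix of separators.
old_names = c("first_name", "last name", "zip--code", "First Name")

new_names = str_replace_all(
  old_names,
  "[\\W_]+\\w",
  # Each match is a run of separators plus one word character;
  # keep only that last character, converted to upper case.
  function(m) str_to_upper(str_sub(m, -1))
)

print(new_names)
# "firstName" "lastName"  "zipCode"   "FirstName"
```

Note that the pattern only touches characters that follow a separator, so a name that already starts with a capital letter, such as "First Name", keeps its leading capital.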