Trimming

We wish to remove leading and trailing white space characters. We will also cover how to eliminate duplicate spaces within the string.

Surrounding

We wish to remove leading and trailing white space characters.

df_2 = df %>% 
  mutate(col_2 = str_trim(col_1))

Here is how this works:

  • We use the function str_trim() (from the stringr package) to eliminate leading and trailing white spaces.
  • The output data frame df_2 will be a copy of the input data frame df with an added column col_2 where each value is the corresponding value of col_1 with any leading or trailing white space characters removed.
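As a minimal sketch of the behavior described above, the following builds a small data frame and trims it. The sample values in col_1 are made up for illustration; only str_trim() and mutate() come from the text.

```r
library(dplyr)
library(stringr)

# Hypothetical sample data: spaces, a tab, and a newline around the values
df = tibble(col_1 = c("  apple ", "banana  ", "\tcherry\n"))

# Add col_2 holding col_1 with leading and trailing white space removed
df_2 = df %>%
  mutate(col_2 = str_trim(col_1))

trimmed = df_2$col_2
print(trimmed)
```

Tabs and newlines count as white space here, so all three values come back without any surrounding characters, while col_1 is left unchanged.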

Extension: Trim One Side Only

df_2 = df %>% 
  mutate(
    col_2 = str_trim(col_1, side='left'),
    col_3 = str_trim(col_1, side='right'))

Here is how this works:

  • By default, str_trim() will remove white space characters on both sides of the string.
  • We can choose a side of the string to “trim”. We do so by setting the side argument to either side='left' or side='right'.
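To make the effect of the side argument concrete, here is a small sketch on a single string. The variable names and the sample string are illustrative assumptions.

```r
library(stringr)

# Hypothetical input with two spaces on each side
x = "  apple  "

left_trimmed  = str_trim(x, side = "left")   # removes leading white space only
right_trimmed = str_trim(x, side = "right")  # removes trailing white space only
both_trimmed  = str_trim(x)                  # default: side = "both"

print(c(left_trimmed, right_trimmed, both_trimmed))
```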

Intermittent

We wish to replace duplicate white space characters within the string with a single white space.

df_2 = df %>%
  mutate(col_2 = str_squish(col_1))

Here is how this works:

  • We use the function str_squish() (from the stringr package) to replace duplicate white space occurring within the string with a single white space. str_squish() also removes any leading or trailing white spaces.
  • The output data frame df_2 will be a copy of the input data frame df with an added column col_2 where each value is the corresponding value of col_1 with any leading and trailing white spaces removed and duplicate white space characters within the string replaced with a single white space.
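A quick sketch of str_squish() on a single string may help. The sample string below is invented for illustration and mixes spaces, a tab, and newlines.

```r
library(stringr)

# Hypothetical input: padding on both ends and mixed white space inside
x = "  a   quick \t brown\n\nfox  "

# Collapse every run of white space to one space and trim both ends
squished = str_squish(x)
print(squished)
```

Runs made of tabs or newlines are collapsed to a single space as well, not only runs of space characters.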

Alternative: Replace Duplicate White Spaces

df_2 = df %>% 
  mutate(col_2 = str_replace_all(col_1, "\\s{2,}", " "))

Here is how this works:

  • We can use str_replace_all() to match all occurrences of intermittent duplicate white spaces and replace each with a single white space.
  • We match occurrences of intermittent duplicate white spaces via the regular expression "\\s{2,}" where:
    • \\s denotes a white space character and
    • {2,} specifies that we wish to match a run of two or more occurrences.
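The two approaches are not fully interchangeable, and the sketch below compares them on made-up strings. The inputs and variable names are illustrative assumptions.

```r
library(stringr)

# Hypothetical inputs: one padded string, one with a single tab inside
padded = "  a   b  "
tabbed = "a\tb"

# The regex approach collapses runs of two or more white space characters,
# including runs at the start and end, but it does not trim them away
replaced_padded = str_replace_all(padded, "\\s{2,}", " ")
squished_padded = str_squish(padded)

# A single tab is not a run of two or more, so the regex leaves it alone
replaced_tabbed = str_replace_all(tabbed, "\\s{2,}", " ")
squished_tabbed = str_squish(tabbed)

print(c(replaced_padded, squished_padded))
print(c(replaced_tabbed, squished_tabbed))
```

With the regular expression, one space remains at each end of the padded string and the lone tab is kept. str_squish() trims both ends and turns the tab into a space. If trimming is also needed, wrapping the str_replace_all() call in str_trim() is one option.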