Trimming

We wish to remove leading and trailing white space characters from string values. We will also cover how to collapse repeated white space characters within a string.

Surrounding

We wish to remove leading and trailing white space characters.

df_2 = df.assign(
    col_2 = df['col_1'].str.strip()
)

Here is how this works:

  • We use the method str.strip() to eliminate leading and trailing white space characters, which include spaces, tabs, and newlines.
  • The output data frame df_2 will be a copy of the input data frame df with an added column col_2 where each value is the corresponding value of col_1 with any leading or trailing white space characters removed.
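To make the behavior concrete, here is a minimal self-contained sketch. The sample values are hypothetical and chosen only for illustration; the column names follow the snippet above.

```python
import pandas as pd

# Hypothetical sample data with leading/trailing spaces, a tab, and a newline.
df = pd.DataFrame({'col_1': ['  apple', 'banana  ', '\tcherry pie\n']})

df_2 = df.assign(
    col_2 = df['col_1'].str.strip()
)

print(df_2['col_2'].tolist())
# ['apple', 'banana', 'cherry pie']
```

Note that the single space inside 'cherry pie' is kept, since str.strip() only touches the two ends of each string, and that df itself is left unchanged.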

Extension: Strip One Side Only

df_2 = df.assign(
    col_2 = df['col_1'].str.lstrip(),
    col_3 = df['col_1'].str.rstrip()
)

Here is how this works:

  • The str.strip() method removes white space characters from both sides of the string.
  • To strip one side only, we use str.lstrip() for the left (leading) side or str.rstrip() for the right (trailing) side instead of str.strip().
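A small sketch of the one-sided variants, again on hypothetical sample values, shows which side each method leaves alone:

```python
import pandas as pd

# Hypothetical sample data padded on both sides.
df = pd.DataFrame({'col_1': ['  apple  ', '\tbanana\n']})

df_2 = df.assign(
    col_2 = df['col_1'].str.lstrip(),  # strips the leading side only
    col_3 = df['col_1'].str.rstrip()   # strips the trailing side only
)

print(df_2['col_2'].tolist())
# ['apple  ', 'banana\n']
print(df_2['col_3'].tolist())
# ['  apple', '\tbanana']
```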

Extension: Specify Characters to Strip

df_2 = df.assign(
    col_2 = df['col_1'].str.strip('_')
)

Here is how this works:

  • The str.strip() method also accepts a string of characters to remove from the start and end of each value in place of white space.
  • In this example, we remove any leading or trailing underscore '_' characters.
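Two details are easy to miss here, so the following sketch (with hypothetical sample values) illustrates them: the argument is treated as a set of characters rather than as a substring, and once characters are given, white space is no longer stripped unless it is part of that set.

```python
import pandas as pd

# Hypothetical sample data mixing underscores, hyphens, and spaces.
df = pd.DataFrame({'col_1': ['__id__', '_a_b_', ' _x_ ', '-_y_-']})

df_2 = df.assign(
    # Strip underscores only; inner underscores are kept.
    col_2 = df['col_1'].str.strip('_'),
    # Strip any mix of underscores and hyphens from both ends.
    col_3 = df['col_1'].str.strip('_-')
)

print(df_2['col_2'].tolist())
# ['id', 'a_b', ' _x_ ', '-_y_-']
print(df_2['col_3'].tolist())
# ['id', 'a_b', ' _x_ ', 'y']
```

The value ' _x_ ' is unchanged in both columns because its outermost characters are spaces, which are not in the set of characters to strip.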

Intermittent

We wish to replace each run of two or more consecutive white space characters within the string with a single space.

df_2 = df.assign(
    col_2 = df['col_1'].str.replace(r'\s{2,}', ' ', regex=True)
)

Here is how this works:

  • We can use str.replace() with regex=True to match every run of repeated white space characters and replace it with a single space.
  • We match these runs via the regular expression \s{2,} where:
    • \s denotes a white space character and
    • {2,} specifies that we wish to match two or more consecutive occurrences.
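The sketch below applies this pattern to a few hypothetical values. It writes the pattern as a raw string (r'...') so that Python does not try to interpret the backslash in \s as a string escape.

```python
import pandas as pd

# Hypothetical sample data: repeated spaces, mixed space/tab runs,
# an already clean value, padded ends, and a single tab.
df = pd.DataFrame({'col_1': ['a  b', 'a \t b   c', 'a b', '  a  ', 'a\tb']})

df_2 = df.assign(
    col_2 = df['col_1'].str.replace(r'\s{2,}', ' ', regex=True)
)

print(df_2['col_2'].tolist())
# ['a b', 'a b c', 'a b', ' a ', 'a\tb']
```

Two things follow from the pattern. Runs at the ends of the string are collapsed to one space rather than removed, so str.strip() is still needed to trim them. A lone tab or newline is left as-is because it is a run of length one; if every white space run should become a single space, the pattern \s+ is one possible alternative.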