Properties

In this section, we cover getting and setting the properties or attributes of a string column or a column that we wish to subsequently treat as a string.

We will look at four classes of properties that we commonly need to deal with when working with string data; those are:

  • Data Type, where we cover getting and setting the data type of a column that we wish to treat as a string.
  • Length, where we cover how to obtain the number of characters in each string of a string column.
  • Content Type, where we cover how to check what kind of content a string holds, e.g. whether it holds an integer value.
  • Locale, where we cover how to get or set the locale used for locale-sensitive operations.

Data Type

We wish to set the data type of a single value or of a column to string.

In this example, we wish to concatenate the string ‘ mins ago.’ to each numerical value in the numeric column col_1.

df_2 = df.assign(
    col_2=df['col_1'].astype('string') + ' mins ago.'
)

Here is how this works:

  • If we apply a str accessor method to a non-string column, say a numeric column, we get an error like AttributeError: Can only use .str accessor with string values!. Likewise, adding a string to a numeric column with ‘+’ raises a TypeError.
  • Neither Python nor Pandas casts to string automatically, so we cast the column explicitly via astype('string') before carrying out string operations. See Data Type Setting.
  • We use the core Python operator ‘+’ to concatenate a fixed string to each value of the column col_1. See Combining.
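
The points above can be sketched end to end as follows. This is a minimal illustration, assuming a small made-up data frame whose column col_1 is numeric; the sample values and the variable names accessor_failed and result are ours.

```python
import pandas as pd

# Hypothetical sample data; col_1 is a numeric (integer) column.
df = pd.DataFrame({'col_1': [5, 12, 30]})

# The str accessor is rejected on a numeric column.
try:
    df['col_1'].str.len()
    accessor_failed = False
except AttributeError:
    accessor_failed = True

# After an explicit cast to string, string operations work.
df_2 = df.assign(
    col_2=df['col_1'].astype('string') + ' mins ago.'
)
result = df_2['col_2'].tolist()
```

Here accessor_failed records that the str accessor raised before the cast, while result holds the concatenated strings produced after the cast.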

Extension: Act on Literal

df_2 = df.assign(
    col_2=df['col_1'].apply(
        lambda x: str(x) + ' mins ago.'
    )
)

Here is how this works:

  • Had we tried to execute x + ' mins ago.', we would have received a TypeError because Python doesn't automatically cast to string. Rather we have to cast explicitly via str(x).
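
The same behaviour can be seen on a single value outside of any data frame. This is a small sketch; the value 5 and the names raised and label are illustrative.

```python
value = 5  # hypothetical numeric value

# Adding a string to a number raises a TypeError.
try:
    value + ' mins ago.'
    raised = False
except TypeError:
    raised = True

# Casting explicitly with str() makes the concatenation valid.
label = str(value) + ' mins ago.'
```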

Length

We wish to obtain the number of characters in a string literal or of each element in a vector of strings.

df_2 = df.assign(
    col_2=df['col_1'].str.len()
)

Here is how this works:

We use the function str.len() to compute the number of characters in each element in the column col_1.
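
As a rough illustration, the sketch below applies str.len() to a made-up column that also contains a missing value and an empty string. The sample data is an assumption of ours; the point is that a missing value stays missing rather than being counted as zero characters.

```python
import pandas as pd

# Hypothetical sample data with a missing value and an empty string.
df = pd.DataFrame({'col_1': ['apple', 'kiwi', None, '']})

df_2 = df.assign(
    col_2=df['col_1'].str.len()
)

# One length per row; the missing row yields a missing length.
lengths = df_2['col_2'].tolist()
```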

Extension: Act on Literal

import pandas as pd

df = pd.DataFrame(
    {'col_1': ['10', '20', '30', '40']}
)

df_2 = df.assign(
    col_2=df['col_1'].apply(lambda x: len(x))
)

Here is how this works:

  • In this case we act on each string value individually. An individual Python string does not have the str accessor, so str.len() and the other str accessor methods are not available. Instead, we use the core Python counterpart, which is len(). Note that len() only works on values that are already strings; calling it on a number raises a TypeError.
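
One thing to watch for when acting on individual values is that len() fails on non-string values such as missing entries. A possible way to guard against that is sketched below; safe_len is a hypothetical helper of ours, not a Pandas function, and the sample data is made up.

```python
import pandas as pd

# Hypothetical sample data with one missing value.
df = pd.DataFrame({'col_1': ['apple', None, 'kiwi']})

def safe_len(x):
    # Return the length for strings; treat anything else as missing.
    return len(x) if isinstance(x, str) else None

df_2 = df.assign(
    col_2=df['col_1'].apply(safe_len)
)
lengths = df_2['col_2'].tolist()
```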

Content Type

We wish to check the type of characters held in a string.

In this example, we wish to check whether each element in the string column col_1 holds only numeric characters, only alphanumeric characters, or only whitespace.

df_2 = df.assign(
    is_int=df['col_1'].str.isnumeric(),
    is_aln=df['col_1'].str.isalnum(),
    is_spc=df['col_1'].str.isspace()
)

Here is how this works:

  • Pandas provides a set of functions to check whether all characters in a string are of a given type.
  • The content type checking functions we use in this example are:
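    • isnumeric() to check whether all characters of a string are numeric. This covers the digits 0 to 9 but also other Unicode numeric characters such as ‘½’; isdecimal() is a stricter check that accepts decimal digits only. Neither accepts a sign or a decimal point, so ‘-1’ and ‘1.5’ fail both checks.
    • isalnum() to check whether all characters of a string are alphanumeric.
    • isspace() to check whether all characters of a string are whitespace characters.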
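
To make the differences between these checks concrete, here is a small sketch on made-up values. The sample strings and the extra is_dec column, which uses str.isdecimal(), are our additions for illustration.

```python
import pandas as pd

# Hypothetical sample data covering a few typical cases.
df = pd.DataFrame({'col_1': ['123', '1.5', 'abc123', '   ', '½']})

df_2 = df.assign(
    is_num=df['col_1'].str.isnumeric(),   # numeric characters only
    is_dec=df['col_1'].str.isdecimal(),   # decimal digits only (stricter)
    is_aln=df['col_1'].str.isalnum(),     # letters and numeric characters only
    is_spc=df['col_1'].str.isspace()      # whitespace only
)
```

In this sketch ‘123’ passes both numeric checks, ‘1.5’ passes none of the checks because of the decimal point, and ‘½’ passes isnumeric() but not isdecimal().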

Locale

Get Global Locale

We wish to obtain some basic information about the current locale.

import locale

locale.getlocale()

Here is how this works:

The function getlocale() from the module locale returns the current locale setting as a tuple of the form (language code, encoding). Either element may be None if it cannot be determined.
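
A minimal sketch of reading the result follows. The variable names are ours, and the actual values depend on the system the code runs on, so none are shown here.

```python
import locale

# getlocale() returns a 2-tuple: (language code, encoding).
current = locale.getlocale()
language_code, encoding = current

# A specific category can be queried as well, e.g. number formatting.
numeric_locale = locale.getlocale(locale.LC_NUMERIC)
```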

List Locales

We wish to get a list of supported locales.

import locale

print(locale.windows_locale)

print(locale.locale_alias)

Here is how this works:

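The locale module has two dictionaries of known locale names. windows_locale maps Windows locale identifiers to locale names, and locale_alias maps locale name aliases to the canonical locale names used on other operating systems. Both are lookup tables built into Python, so a name listed there is not necessarily installed on the current system.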
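
Since both dictionaries are large, printing them in full is rarely convenient. The sketch below filters locale_alias down to the aliases for one language; the ‘ar_’ prefix and the variable name arabic are illustrative choices of ours.

```python
import locale

# locale_alias maps lowercase alias names to canonical locale names.
arabic = {
    alias: name
    for alias, name in locale.locale_alias.items()
    if alias.startswith('ar_')
}

# Print the first few matches as (alias, canonical name) pairs.
for alias, name in sorted(arabic.items())[:5]:
    print(alias, '->', name)
```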

Set Global Locale

We wish to change the global locale for the current environment.

In this example, we wish to set the global locale to UAE Arabic.

import locale

locale.setlocale(locale.LC_ALL, 'ar_AE')

Here is how this works:

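  • We use the function setlocale() from the module locale to set (override) the current locale. The first argument is the category to change; locale.LC_ALL changes all categories at once.
  • We can look up the name used to refer to a particular locale in the dictionary locale.locale_alias. See “List Locales” above.
  • If the requested locale is not available on the system, setlocale() raises locale.Error.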
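
Because the requested locale may not be installed, it can help to guard the call and to restore the previous setting afterwards. The following is a sketch under that assumption; try_set_locale is a hypothetical helper of ours, not part of the locale module.

```python
import locale

def try_set_locale(name):
    # Attempt to switch all categories; report whether it worked.
    try:
        locale.setlocale(locale.LC_ALL, name)
        return True
    except locale.Error:
        # The requested locale is not available on this system.
        return False

# Calling setlocale() without a locale name only queries the current setting.
previous = locale.setlocale(locale.LC_ALL)

changed = try_set_locale('ar_AE')

# ... locale-sensitive work would go here ...

# Restore the setting that was active before.
locale.setlocale(locale.LC_ALL, previous)
```

Restoring the previous setting matters because the locale is global to the process, so a change made here would otherwise affect any other code that runs afterwards.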