In this section, we cover getting and setting the properties or attributes of a string column or a column that we wish to subsequently treat as a string.
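We will look at four classes of properties that we commonly need to deal with when working with string data:
Type: setting the data type of a value or a column to string.
Length: counting the number of characters.
Character Type: checking what kind of characters a string holds.
Locale: getting, listing, and setting the locale.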
We wish to set the data-type of a literal or a vector to string (object).
In this example, we wish to concatenate the string ‘ mins ago.’ to each numerical value in the numeric column col_1.
df_2 = df.assign(
col_2 = df['col_1'].astype('string') + ' mins ago.'
)
Here is how this works:
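String operations can't be applied directly to a numeric column such as col_1. For example, calling a method of the .str accessor on it raises:
AttributeError: Can only use .str accessor with string values!
Therefore, we first convert the data type of col_1 to string via .astype('string'). See Data Type Setting.
We then use ‘+’ to concatenate a fixed string to each value of the column col_1. See Combining.
Extension: Act on Literal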
df_2 = df.assign(
col_2=df['col_1'].apply(
lambda x: str(x) + ' mins ago.'
)
)
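A small self-contained comparison may help here. It assumes a toy integer column col_1 (the values are illustrative, not the original dataset) and contrasts the vectorized route, which needs .astype('string'), with the literal route inside apply(), which needs str(). Both uncast versions raise a TypeError.

```python
import pandas as pd

# Toy data, assumed for illustration: a numeric column.
df = pd.DataFrame({'col_1': [10, 20, 30, 40]})

# Vectorized: adding a str to an int column fails until the column is cast.
try:
    df['col_1'] + ' mins ago.'
except TypeError as err:
    print('TypeError:', err)

vectorized = df['col_1'].astype('string') + ' mins ago.'

# Element-wise: inside apply(), x is a single number (not a Series),
# and Python won't add a number to a str without an explicit str() cast.
try:
    df['col_1'].apply(lambda x: x + ' mins ago.')
except TypeError as err:
    print('TypeError:', err)

literal = df['col_1'].apply(lambda x: str(x) + ' mins ago.')

print(vectorized.tolist())
print(literal.tolist())
```

Both routes produce the same text; the vectorized one also yields a column with the dedicated 'string' dtype, while apply() returns an object column of Python strings.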
Here is how this works:
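We use apply() to act on each value of col_1 individually; inside the lambda function, x is a single number (a literal) rather than a Series.
Had we written x + ' mins ago.', we would have received a TypeError because Python doesn't automatically cast numbers to strings. Rather, we have to cast explicitly via str(x).
We wish to obtain the number of characters in a string literal or of each element in a vector of strings.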
df_2 = df.assign(
col_2 = df['col_1'].str.len()
)
Here is how this works:
We use the method str.len() to compute the number of characters in each element of the column col_1.
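As a quick sketch (the toy values, including the missing entry, are assumptions for illustration), str.len() counts characters element by element, so an empty string gives 0, and a missing value stays missing instead of raising an error:

```python
import pandas as pd

# Toy data, assumed for illustration; None marks a missing value.
df = pd.DataFrame({'col_1': ['cat', 'horse', '', None]})

# str.len() returns the character count per element; missing stays missing.
df_2 = df.assign(col_2=df['col_1'].str.len())
print(df_2)
```

Because of the missing entry, col_2 typically comes back as a float column holding a missing marker in that row rather than as integers.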
Extension: Act on Literal
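import pandas as pd
# col_1 holds strings; len() raises a TypeError on an integer such as 10.
df = pd.DataFrame(
{'col_1': ['cat', 'horse', 'tiger', 'ox']}
)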
df_2 = df.assign(
col_2=df['col_1'].apply(lambda x: len(x))
)
Here is how this works:
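We use apply() to act on each element of col_1 as a literal, i.e. a plain Python string rather than a Series. We can't call the str.len() method (or any of the str accessor methods) on a literal. In that case, we need to use the core Python counterpart, which is len().
We wish to check the type of characters held in a string.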
In this example, we wish to check whether each element in the string column col_1 holds only numeric characters, only alphanumeric characters, or only whitespace.
df_2 = df.assign(
is_int=df['col_1'].str.isnumeric(),
is_aln=df['col_1'].str.isalnum(),
is_spc=df['col_1'].str.isspace()
)
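To see what the three checks report on a few representative inputs, here is a sketch on a toy column (the sample strings are assumed for illustration). Note that '12.5' is not numeric because '.' is not a numeric character, and that an empty string is False for every check.

```python
import pandas as pd

# Toy inputs, assumed for illustration.
df = pd.DataFrame({'col_1': ['123', '12.5', 'abc123', '   ', '']})

df_2 = df.assign(
    is_int=df['col_1'].str.isnumeric(),  # all characters numeric?
    is_aln=df['col_1'].str.isalnum(),    # all characters letters or digits?
    is_spc=df['col_1'].str.isspace()     # all characters whitespace?
)
print(df_2)
```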
Here is how this works:
We use isnumeric() to check whether all characters of a string are numeric characters (e.g. digits).
We use isalnum() to check whether all characters of a string are alphanumeric.
We use isspace() to check whether all characters of a string are whitespace characters.
Each check returns False for an empty string.
Get Global Locale
We wish to obtain some basic information about the current locale.
import locale
locale.getlocale()
Here is how this works:
The function getlocale() from the module locale returns a description of the current locale as a tuple holding the language code and the encoding.
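A short sketch of unpacking that tuple. The printed values depend on the machine's settings, so none are shown here. When no category is passed, getlocale() queries LC_CTYPE, and either item of the tuple may be None.

```python
import locale

# getlocale() returns a (language code, encoding) pair; either may be None.
lang, enc = locale.getlocale()
print('language code:', lang)
print('encoding:', enc)

# Passing the category explicitly; LC_CTYPE is also the default.
print(locale.getlocale(locale.LC_CTYPE))
```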
List Locales
We wish to get a list of supported locales.
import locale
print(locale.windows_locale)
print(locale.locale_alias)
Here is how this works:
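The locale module has two dictionaries that map names or codes to locale names. windows_locale maps Windows language identifiers (integers such as 0x0409) to locale names, and locale_alias maps lowercase alias names, as used on other operating systems, to normalized locale names. An entry in either dictionary doesn't guarantee that the locale is installed on the current system.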
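A sketch of how the two dictionaries can be inspected, using the UAE Arabic locale from the next subsection as the running example. The filter on the 'ar_' prefix is just an illustration; neither dictionary says whether a locale is actually available.

```python
import locale

# locale_alias: lowercase alias -> normalized locale name.
arabic = {alias: name for alias, name in locale.locale_alias.items()
          if alias.startswith('ar_')}
print(sorted(arabic.items())[:5])

# windows_locale: Windows language identifier (int) -> locale name.
print(locale.windows_locale[0x3801])  # identifier for Arabic (UAE)

# normalize() looks aliases up in locale_alias to build a full locale name.
print(locale.normalize('ar_ae'))
```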
Set Global Locale
We wish to change the global locale for the current environment.
In this example, we wish to set the global locale to UAE Arabic.
import locale
locale.setlocale(locale.LC_ALL, 'ar_AE')
Here is how this works:
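We use the function setlocale() from the module locale to set (override) the current locale.
The first argument, locale.LC_ALL, applies the new setting to all locale categories (e.g. character classification, collation, and number, time, and currency formatting).
The second argument is the name of the target locale; candidate names can be found in locale.locale_alias. See “List Locales” above.
If the requested locale isn't installed on the system, setlocale() raises locale.Error.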
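Because setlocale() raises locale.Error for a locale the system doesn't have, and the exact spelling varies by platform (e.g. 'ar_AE.UTF-8' on many Linux systems), a defensive variant may help. The helper try_setlocale() below is hypothetical, and the candidate names are assumptions; it tries each name in turn and then restores the original setting.

```python
import locale


def try_setlocale(candidates, category=locale.LC_ALL):
    """Set the first locale name in `candidates` that the system accepts.

    Hypothetical helper, not part of the locale module.
    Returns the new locale setting, or None if no candidate is available.
    """
    for name in candidates:
        try:
            return locale.setlocale(category, name)
        except locale.Error:
            continue  # not installed under this spelling; try the next one
    return None


# Calling setlocale() without a second argument only queries the setting.
saved = locale.setlocale(locale.LC_ALL)

chosen = try_setlocale(['ar_AE.UTF-8', 'ar_AE'])
print('switched to:', chosen)

# Restore whatever was active before.
locale.setlocale(locale.LC_ALL, saved)
```

Saving and restoring matters because the locale is process-wide: a change made for one task also affects formatting everywhere else in the program.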