String Operations

In this section, we cover the most common data manipulation operations for string data. Our focus is on manipulating data frame columns that have a string data type.

This section is organized as follows:

  1. Properties: Covers how to obtain properties of a string object such as its length, encoding, and data type.
  2. Formatting: Covers how to format a string, including how to convert it to upper or lower case.
  3. Combining: Covers data manipulation operations involving combining strings such as concatenating, collapsing, interpolating, and repeating.
  4. Matching: Covers string matching operations such as detecting an occurrence of a substring or a regex pattern, locating a substring or a regex pattern, and counting the occurrences of a substring or a regex pattern.
  5. Extracting: Covers extracting a substring via its location or via a regex pattern.
  6. Modifying: Covers the string data manipulation operations of finding and replacing substrings or regex patterns. It also covers the related operations of removing and trimming.
  7. Splitting: Covers splitting a string around particular delimiters as well as selecting from and processing the resulting parts.
  8. Sorting: Covers how to control special aspects of sorting strings.
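
As a rough orientation, the eight categories above might look like the following in pandas. This is a minimal sketch, not the approach prescribed by the sections that follow: it assumes pandas is the data frame library in use, and the data frame `df`, its `name` column, and all values are hypothetical.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"name": ["ada lovelace", "Grace Hopper", "alan turing"]})
name = df["name"]

# 1. Properties: number of characters in each string.
df["length"] = name.str.len()

# 2. Formatting: convert to upper case.
df["upper"] = name.str.upper()

# 3. Combining: concatenate a literal prefix with each string.
df["greeting"] = "Hello, " + name

# 4. Matching: detect a literal substring (regex disabled).
df["has_hopper"] = name.str.contains("Hopper", regex=False)

# 5. Extracting: take a substring by position (first three characters).
df["first3"] = name.str.slice(0, 3)

# 6. Modifying: replace a literal substring.
df["dashed"] = name.str.replace(" ", "-", regex=False)

# 7. Splitting: split on a delimiter and keep the first part.
df["first_word"] = name.str.split(" ").str[0]

# 8. Sorting: sort case-insensitively instead of by raw code point order.
sorted_df = df.sort_values("name", key=lambda col: col.str.lower())
```

Most of these calls go through the `.str` accessor, which applies a string method to every element of a column. The `key` argument in the last step is what keeps "Grace Hopper" from sorting ahead of the lower-case names.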

In this section, we show each string data manipulation operation in its most common application scenario. For example, we show string matching in a data filtering context and pattern extraction in a data transformation context. We suggest you use this section in conjunction with the data manipulation patterns chapters (most importantly, Selecting, Filtering, Sorting, Transforming, and Aggregating) to compose the expressions that meet your data manipulation needs.
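
The two scenarios mentioned above can be sketched as follows. This is an illustrative example under stated assumptions, not a definitive recipe: it assumes pandas, and the `product` column, its values, and the trailing-digits pattern are hypothetical.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"product": ["Widget A-100", "Gadget B-250", "Widget C-75"]})

# Matching in a filtering context: keep rows whose product
# contains the literal substring "Widget".
widgets = df[df["product"].str.contains("Widget", regex=False)]

# Extraction in a transformation context: add a column holding
# the digits at the end of each product string.
with_code = df.assign(code=df["product"].str.extract(r"(\d+)$", expand=False))
```

In both cases the string operation produces a column-length result (a boolean mask for filtering, a new string column for transforming) that plugs into the general data manipulation pattern.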
