Extracting

One of the most common operations carried out on string data is extracting parts of a string. This is especially prevalent during the early data-cleaning phase of a data project.

We cover two common string extraction scenarios here:

  • Extract by Regular Expression: Extract a substring that matches a given regular expression. For instance, extract the first occurrence of a sequence of digits from a string. Extraction by regular expression is quite powerful and is by far the most common method of string extraction in practice.
  • Extract by Location: Extract a substring given its start and end location indices. For instance, extract the second through the tenth characters of a string.
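
As a rough illustration of the two scenarios above, here is a minimal Python sketch. The sample string and variable names are made up for this example, and it assumes character positions are counted from 1 with an inclusive end, as in the description above; Python itself counts from 0 and excludes the end of a slice, so the indices are shifted accordingly.

```python
import re

# Hypothetical sample string, used only for illustration.
text = "Order 1234 shipped on 2021-05-06"

# Extract by regular expression: the first run of one or more digits.
# re.search returns None when nothing matches, so guard against that.
match = re.search(r"\d+", text)
first_digits = match.group(0) if match else None

# Extract by location: the second through the tenth characters
# (1-based, inclusive). Python slices are 0-based with an exclusive
# end, so positions 2..10 correspond to text[1:10].
second_to_tenth = text[1:10]

print(first_digits)     # 1234
print(second_to_tenth)  # rder 1234
```

The regular expression approach finds the digits wherever they occur in the string, while the location approach only works when the position of the substring is known in advance.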