We wish to count the number of occurrences of a given pattern in a target string.

We will cover the following common pattern occurrence counting scenarios:

**Substring**: how to obtain the number of occurrences of a given substring in a given string.**Regular Expression**: how to obtain the number of occurrences of match of a given regular expression in a given string.**Word**: the special case of how to count the number of words in a given string.

The scenarios above can be extended in multiple ways, the most common of which are:

**Ignore Case:**Ignoring case (of both pattern and string) while matching.**Pattern Column:**Matching a column of patterns against a column of strings of the same size.**Multiple Patterns:**Checking for multiple patterns at a time.

We wish to count the number of occurrences of a given substring in a given string.

In this example, we count the number of occurrences of the string `‘XY’`

in each value of the
column `col_1`

.

```
import re
df_2 = df.assign(col_2=df['col_1'].str.count(re.escape('XY')))
```

Here is how this works:

- We use the
`str.count()`

method from the`str`

accessor set of string manipulation methods of Pandas`Series`

to count the number of occurrences of the substring`‘XY’`

in each value of the column`col_1`

. - The
`str.count()`

function expects a regular expression by default. We wrap the substring in`re.escape()`

to escape any special characters. - The output data frame
`df_2`

will be a copy of the input data frame`df`

with an added column`col_2`

holding the number of occurrences of the string`‘XY’`

in the corresponding value of the column`col_1`

.

We wish to count the number of occurrences of a given regular expression in a given string.

In this example, we count the number of vowels (while ignoring case) in each value of the
column `col_1`

.

```
df_2 = df.assign(col_2=df['col_1'].str.count('[aeiou]'))
```

Here is how this works:

- This works similarly to the substring scenario described above except that we pass a regular
expression to
`str.count()`

instead of a substring. - The regular expression
`'[aeiou]'`

matches any of the characters in the square brackets (which are the five vowels in the English language). It can be thought of as an`or`

operation. - The output data frame
`df_2`

will be a copy of the input data frame`df`

with an added column`col_2`

holding the number of vowels in the corresponding value of the column`col_1`

.

We wish to count the number of words in a string.

In this example, we wish to count the number of words in each value of the column `col_1`

and to
return that as a new integer column `col_2`

.

```
df_2 = df.assign(
col_2=df['col_1'].str.count('\w+'))
```

Here is how this works:

- This works similarly to the substring scenario described above in Regular Expression.
- We use the regular expression
`‘\w+’`

where:`\w`

represents any ‘word’ character; i.e. letters, digits or underscore.`+`

specifies that we are looking for one or more ‘word’ characters.

- The output data frame
`df_2`

is a copy of the input data frame`df`

with an additional column`col_2`

where each row holds the number of words in the corresponding value of the column`col_1`

.

We wish to count the number of occurrences of a given substring in a given string while ignoring case.

In this example, we count the number of occurrences of the string `‘xy’`

in each value of the
column `col_1`

while ignoring case.

```
import re
df_2 = df.assign(col_2=df['col_1'].str.lower().str.count(re.escape('xy')))
```

Here is how this works:

- To ignore the case while matching, we use
`str.lower()`

to convert values in`col_1`

to lower case, and we pass a lower case expression. See Ignore Case under Detecting for more details. - We can extend the solutions presented in Regular Expression in the same way.

*Alternative: Via re.IGNORECASE*

```
import re
df_2 = df.assign(col_2=df['col_1'].str.count(re.escape('xy'), flags=re.IGNORECASE))
```

Here is how this works:

- This code is similar to the code under Regular Expression above except that
we pass
`flags=re.IGNORECASE`

to the second argument of`str.count()`

to perform case-insensitive matching. - We can extend the solutions presented in Substring in the same way.

We wish to count the number of occurrences of a value of one column in the value of another column for each row.

In this example, we have a data frame `df`

with two column `col_1`

and `col_2`

, we wish to count the
number of occurrences of the value of `col_2`

in `col_1`

.

```
df_2 = df.assign(
col_3=df.apply(lambda x: x['col_1'].count(x['col_2']), axis=1))
```

Here is how this works:

`str.count()`

is not vectorized over the pattern, so we use python's`count()`

function which works on string.- We need to use
`apply()`

with`axis=1`

to apply a`lambda`

in a row wise manner to each row of the data frame`df`

. See Non-Vectorized Transformation. - For each row, we will return the number of occurrences of the value of
`col_2`

in`col_1`

as an integer. - The output data frame
`df_2`

will be a copy of the input data frame`df_1`

with an added column`col_3`

holding the number of occurrences of the value of`col_2`

in`col_1`

for the corresponding row. - We can extend the solutions presented in Regular Expression in the same way.

**Count All**

We wish to return the total number of occurrences of all patterns as a single integer value. In other words, we wish to return the sum of occurrences of a given set of patterns in a given string.

```
df_2 = df.assign(col_2=df['col_1'].str.count('XY|YX'))
```

Here is how this works:

- We use the or operator
`|`

of regular expressions to build a regular expression that captures all the patterns we wish to look for or’ed together. In this case that regular expression is`'XY|YX'`

. - For each value of
`col_1`

,`str.count()`

will return the total number of occurrences of all patterns. - The output data frame
`df_2`

will be a copy of the input data frame`df`

with an added column`col_2`

holding the sum of occurrences of the specified patterns in the corresponding value of the column`col_1`

. - See Regular Expression above for a more detailed description.

**Count Each**

We wish to return the number of occurrences of each pattern in a set of patterns as a vector of integer values (one value for each pattern).

In this example, for each value of the column `col_1`

, we wish to compute the difference between the
number of occurrences of the string `‘)’`

and the string `‘(’`

.

```
def diff(p_list):
return p_list[0] - p_list[1]
df_2 = df.assign(
col_2=df.apply(lambda x: diff([x['col_1'].count(pattern) for pattern in ['(', ')']]), axis=1))
```

Here is how this works:

- We use list comprehension to count the occurrence of each string and append it to a list.
- For each value of
`col_1`

and for each pattern in`['(', ')']`

,`str.count()`

will return holding the number of occurrences of the respective pattern. - We use the function
`diff()`

to subtract the two values returned the list comprehension. - We need to use
`apply()`

with`axis=1`

to apply a`lambda`

in a row wise manner to each row of the data frame`df`

. See Non-Vectorized Transformation. - The output data frame
`df_2`

will be a copy of the input data frame`df`

with an added column`col_2`

where each cell holds the difference between the number of occurrences of the two patterns in the corresponding value of the column`col_1`

.

PYTHON