Filtering by Multiple Conditions

We wish to filter rows that satisfy a logical combination of multiple conditions.

We will cover the two most common scenarios:

  • Taking the AND (conjunction) of two (or more) logical expressions.
  • Taking the OR (disjunction) of two (or more) logical expressions.

AND

We wish to filter rows that meet two (or more) conditions.

In this example, we wish to filter rows of the data frame df where the numeric column col_1 is greater than 5 and where the string column col_2 has a value that contains the substring ‘token’.

df_2 = df.loc[(df['col_1'] > 5) & (df['col_2'].str.contains('token'))]

Here is how this works:

  • We use bitwise & to compute the AND (conjunction) of the two conditions.
  • In general, we use &, |, and ~ instead of and, or, and not when combining multiple filters inside loc[].
    • Whereas and, or, and not apply to logical literals, they do not apply to Series or DataFrame objects.
    • When and, or, and not are evaluated, Python attempts to find the truthiness of the objects and it does not make sense for a Series to be True or False. We cover the reasoning in detail in Logical Operations.
  • The individual conditions need to be wrapped in parentheses () when combined logically inside loc[]
    • The parenthesis around the logical expressions () are essential because the logical operators (&, |, and ~) have higher precedence than the comparison operators and we wish to carry out the comparison first and then the logical operations.
    • Without the parentheses the expression doesn’t evaluate to a logical Series and we get an error.
  • If the logic gets too complicated, it’s advisable to break it down its creation into multiple steps where we create columns representing the intermediate logical steps. This makes it easier to follow and debug.

OR

We wish to filter rows that meet any one of two (or more) conditions.

In this example, we wish to filter rows of the data frame df where the numeric column col_1 is greater than 5 or where the string column col_2 has a value that contains the substring ‘token’.

df.loc[(df['col_1'] > 5) | (df['col_2'].str.contains('token'))]

Here is how this works:

This works similarly to the “AND” scenario described above except that we use bitwise or | instead of bitwise and &.

PYTHON
I/O