Dynamic Condition Specification

We wish to specify the entire logical condition that we wish to use to filter the rows of a data frame dynamically (as opposed to specifying the columns and functions separately). This may be practical in cases such as when we wish to apply a condition to many data frames of different names but same column names

In this section, we cover the following dynamic condition specification scenarios:

  1. Lambda Function where we cover how to specify the logical condition(s) to use for filtering the rows of a data frame as a lambda function e.g. lambda x: x[’col_1’] > 0.
  2. String Expression where we cover how to specify the logical condition(s) to use for filtering the rows of a data frame as a string variable e.g. ‘col_1 > 0’.

Lambda Function

We wish to specify the logical condition(s) to use for filtering the rows of a data frame as a lambda function .

One Condition

row_filter = lambda x: x['col_1'] > 0

df_2 = df.loc[row_filter]

Here is how this works:

  • We specify the logical condition that we wish to apply dynamically as a lambda function.
  • The lambda function lambda x: x['col_1'] > 0, expects a data frame, referred to by x, that has a column named 'col_1' and returns a Series of logical values that is True for rows where the value of col_1' is greater than 0.
  • loc[] accepts a lambda function (a callable) to which the data frame, df in this case, is passed.

Multiple Conditions

row_filters = [lambda x: x['col_1'] > 0,
               lambda x: x['col_1'] == x['col_8']]

df_2 = df\
    .loc[lambda x: row_filters[0](x) &
                   row_filters[1](x)]

Here is how this works:

  • loc[] accepts a lambda function (a callable) to which the data frame, df in this case, is passed.
  • We use that “wrapper” lambda function to call each of the lambda functions in row_filters and apply boolean logic to combine the results (See Multiple Filters).
  • The rest of the code works similarly to the “One Condition” scenario above.

Alternatively:

row_filters = [lambda x: x['col_1'] > 5,
               lambda x: x['col_3'] == x['col_6']]

lst = [row_filter(df) for row_filter in row_filters]

df_2 = df\
    .loc[pd.DataFrame(lst).all()]

Here is how this works:

  • We use a list comprehension to execute each of the lambda functions in row_filters on the data frame df. The output is a list of logical Series each the output of applying one lambda function to the data frame df.
  • In pd.DataFrame(lst), we convert the list of Series to a data frame.
  • We then use the data frame method all() to reduce the data frame of logical values to a Series with one logical value per row that is True for a row if all column values are True.

String Expression

We wish to specify the logical condition(s) to use for filtering the rows of a data frame as a string variable

One Condition

row_filter = "col_1 > 0"

df_2 = df.query(filter)

Here is how this works:

  • The query() function accepts a condition as a string and returns rows for which the condition is True.
  • In the string condition we pass to query(), we can refer to column names directly without indexing them through the data frame and without extra quotes e.g. "col_1 > 0".

Multiple Conditions

vals = ['a', 'c']
row_filters = ("col_1 in @vals "
               "and col_2 == 'yes' "
               "and col_3 <= 10 "
               "and col_4")

df_2 = df.query(row_filters)

Here is how this works:

  • The query() function accepts a complex condition (a logical combination of multiple conditions) as a string and returns rows for which the condition is True.
  • In the string condition we pass to query(), we can refer to column names directly without indexing them through the data frame and without extra quotes e.g. "col_1 > 0".
  • query() is quite flexible. In this example, we included a few examples of the expressions we can use as follows:
    • Numerical comparison operations such as “col_3 <= 10".
    • String comparison operations such as “col_2 == 'yes'" . We wrap the string we are comparing with in additional quotes.
    • Logical columns (i.e. columns with True or False values) e.g. col_4
    • We can reference environment varaibles using the @ symbol e.g. "col_1 in @vals".
PYTHON
I/O