Dynamic Condition Specification

We wish to specify the entire logical condition that we wish to use to filter the rows of a data frame dynamically (as opposed to specifying the columns and functions separately). This may be practical in cases such as when we wish to apply a condition to many data frames of different names but same column names

In this section, we cover the following dynamic condition specification scenarios:

Lambda Function where we cover how to specify the logical condition(s) to use for filtering the rows of a data frame as a lambda function e.g. lambda x: x[’col_1’] > 0.
String Expression where we cover how to specify the logical condition(s) to use for filtering the rows of a data frame as a string variable e.g. ‘col_1 > 0’.

Lambda Function

We wish to specify the logical condition(s) to use for filtering the rows of a data frame as a lambda function .

One Condition

row_filter = lambda x: x['col_1'] > 0

df_2 = df.loc[row_filter]

Here is how this works:

We specify the logical condition that we wish to apply dynamically as a lambda function.
The lambda function lambda x: x['col_1'] > 0, expects a data frame, referred to by x, that has a column named 'col_1' and returns a Series of logical values that is True for rows where the value of col_1' is greater than 0.
loc[] accepts a lambda function (a callable) to which the data frame, df in this case, is passed.

Multiple Conditions

row_filters = [lambda x: x['col_1'] > 0,
               lambda x: x['col_1'] == x['col_8']]

df_2 = df\
    .loc[lambda x: row_filters[0](x) &
                   row_filters[1](x)]

Here is how this works:

loc[] accepts a lambda function (a callable) to which the data frame, df in this case, is passed.
We use that “wrapper” lambda function to call each of the lambda functions in row_filters and apply boolean logic to combine the results (See Multiple Filters).
The rest of the code works similarly to the “One Condition” scenario above.

Alternatively:

row_filters = [lambda x: x['col_1'] > 5,
               lambda x: x['col_3'] == x['col_6']]

lst = [row_filter(df) for row_filter in row_filters]

df_2 = df\
    .loc[pd.DataFrame(lst).all()]

Here is how this works:

We use a list comprehension to execute each of the lambda functions in row_filters on the data frame df. The output is a list of logical Series each the output of applying one lambda function to the data frame df.
In pd.DataFrame(lst), we convert the list of Series to a data frame.
We then use the data frame method all() to reduce the data frame of logical values to a Series with one logical value per row that is True for a row if all column values are True.

String Expression

We wish to specify the logical condition(s) to use for filtering the rows of a data frame as a string variable

One Condition

row_filter = "col_1 > 0"

df_2 = df.query(filter)

Here is how this works:

The query() function accepts a condition as a string and returns rows for which the condition is True.
In the string condition we pass to query(), we can refer to column names directly without indexing them through the data frame and without extra quotes e.g. "col_1 > 0".

Multiple Conditions

vals = ['a', 'c']
row_filters = ("col_1 in @vals "
               "and col_2 == 'yes' "
               "and col_3 <= 10 "
               "and col_4")

df_2 = df.query(row_filters)

Here is how this works:

The query() function accepts a complex condition (a logical combination of multiple conditions) as a string and returns rows for which the condition is True.
In the string condition we pass to query(), we can refer to column names directly without indexing them through the data frame and without extra quotes e.g. "col_1 > 0".
query() is quite flexible. In this example, we included a few examples of the expressions we can use as follows:
- Numerical comparison operations such as “col_3 <= 10".
- String comparison operations such as “col_2 == 'yes'" . We wrap the string we are comparing with in additional quotes.
- Logical columns (i.e. columns with True or False values) e.g. col_4
- We can reference environment varaibles using the @ symbol e.g. "col_1 in @vals".

Optima.io Reference beta

Dynamic Condition Specification

Lambda Function

String Expression