We wish to specify the entire logical condition that we wish to use to filter the rows of a data frame dynamically (as opposed to specifying the columns and functions separately). This may be practical in cases such as when we wish to apply a condition to many data frames of different names but same column names
In this section, we cover the following dynamic condition specification scenarios:
lambda x: x[’col_1’] > 0
.‘col_1 > 0’
.We wish to specify the logical condition(s) to use for filtering the rows of a data frame as a lambda function .
One Condition
row_filter = lambda x: x['col_1'] > 0
df_2 = df.loc[row_filter]
Here is how this works:
lambda
function lambda x: x['col_1'] > 0
, expects a data frame, referred to by x
, that has a column named 'col_1'
and returns a Series
of logical values that is True
for rows where the value of col_1'
is greater than 0.loc[]
accepts a lambda
function (a callable) to which the data frame, df
in this case, is passed.Multiple Conditions
row_filters = [lambda x: x['col_1'] > 0,
lambda x: x['col_1'] == x['col_8']]
df_2 = df\
.loc[lambda x: row_filters[0](x) &
row_filters[1](x)]
Here is how this works:
loc[]
accepts a lambda
function (a callable) to which the data frame, df
in this case, is passed.lambda
function to call each of the lambda functions in row_filters
and apply boolean logic to combine the results (See Multiple Filters).Alternatively:
row_filters = [lambda x: x['col_1'] > 5,
lambda x: x['col_3'] == x['col_6']]
lst = [row_filter(df) for row_filter in row_filters]
df_2 = df\
.loc[pd.DataFrame(lst).all()]
Here is how this works:
lambda
functions in row_filters
on the data frame df
. The output is a list of logical Series
each the output of applying one lambda
function to the data frame df
.pd.DataFrame(lst)
, we convert the list of Series
to a data frame.all()
to reduce the data frame of logical values to a Series
with one logical value per row that is True
for a row if all column values are True
.We wish to specify the logical condition(s) to use for filtering the rows of a data frame as a string variable
One Condition
row_filter = "col_1 > 0"
df_2 = df.query(filter)
Here is how this works:
query()
function accepts a condition as a string and returns rows for which the condition is True
.query()
, we can refer to column names directly without indexing them through the data frame and without extra quotes e.g. "col_1 > 0"
.Multiple Conditions
vals = ['a', 'c']
row_filters = ("col_1 in @vals "
"and col_2 == 'yes' "
"and col_3 <= 10 "
"and col_4")
df_2 = df.query(row_filters)
Here is how this works:
query()
function accepts a complex condition (a logical combination of multiple conditions) as a string and returns rows for which the condition is True
.query()
, we can refer to column names directly without indexing them through the data frame and without extra quotes e.g. "col_1 > 0"
.query()
is quite flexible. In this example, we included a few examples of the expressions we can use as follows:“col_3 <= 10"
.“col_2 == 'yes'"
. We wrap the string we are comparing with in additional quotes.True
or False
values) e.g. col_4
@
symbol e.g. "col_1 in @vals"
.