We wish to identify the columns on each of which we will apply filtering logic.
We will cover the following scenarios
This section is complemented by
We wish to apply a logical expression to every column and to return any row for which any column satisfies that logical expression.
In this example, we wish to return any row in the data frame df
for which any column has a missing value NA
.
df.loc[df.apply(pd.isna).any(axis=1)]
Here is how this works:
df.apply(pd.isna)
we go over each value of the data frame df
and check if it is missing. The output is a data frame of the same size as df
and where a value is True
if the corresponding value in the data frame df
is missing (see Missing Values).any()
with axis=1
to logically combine the values for each row of the data frame of logical True
or False
values produced by apply(pd.isna)
. The output is True
for a row if the filtering condition evaluates to True
for any column.any()
is all()
which requires that the logical expression evaluates to TRUE
for all columns. See Relationship Selection..reset_index(drop=True)
. See Patchy Index under Common Scenarios.lambda
function. See New Columns under Common Scenarios.Alternatively:
df.loc[df.isna().any(axis=1)]
Here is how this works:
isna()
in this example, we can apply it directly without the need for apply()
. We do so here via df.isna()
.any()
with axis=1
to identify rows where the condition evaluates to True
for any of the columns.We wish to apply a logical expression to a set of explicitly specified column and to return any row for which any of those columns satisfies the logical expression.
In this example, we wish to return any row in the data frame df
for which any of the columns ‘col_1'
, ‘col_3’
, or ‘col_5'
has a missing value NA
.
df_2 = df.loc[(df[['col_1', 'col_3', 'col_5']]
.apply(pd.isna)
.any(axis=1))]
Here is how this works:
df[['col_1', 'col_3', 'col_5']]
, we select the columns of interest by name. See Basic Selection for a detailed coverage of explicit column selection scenarios.We wish to apply a logical expression to a set of implicitly specified column and to return any row for which any of those columns satisfies the logical expression. Implicit column selection is when we do not spell out the column names or positions explicitly but rather identify the columns via a property of their name or their data.
In this example, we wish to return any row in the data frame df
for which any column whose name starts with the substring ‘cvr_’
is missing.
df_2 = df.loc[(df
.loc[:, df.columns.str.contains('cvr_', regex=False)]
.apply(pd.isna)
.any(axis=1))]
Here is how this works:
df.columns.str.contains('cvr_', regex=False)
to select all columns whose name starts with the substring ‘cvr_’
. See Implicit Selection for a coverage of the most common scenarios of implicit column selection including by name pattern, data type, and Criteria satisfied by the column’s data.