In a data inspection context, we typically filter rows by column values to get a sense of the data, e.g. to check for surprising values. We cover the basics of filtering here and cover it in detail in Filtering.
We wish to extract rows that meet a certain condition on column values. In this example, we keep only the rows where the column col_1 has a value of 0.
df.loc[df['col_1'] == 0]
Here is how this works:
We apply loc[] to the data frame df.
We use .loc[] for filtering rows by a boolean condition on column values.
We compare the values of col_1 with the integer 0 and return only the rows where the condition evaluates to True.
We wish to extract rows that meet a certain condition on column values and to return only a specified set of columns (and not all columns). This is useful while developing a filter, to look at just the relevant columns and verify that the filtering works as expected.
In this example, we wish to return rows where the value of col_1 is greater than 0 and to return only the columns col_1 and col_3.
df.loc[df['col_1'] > 0, ['col_1','col_3']]
Here is how this works:
.loc[] is the indexer we use both for boolean filtering and for column selection by name, so we can combine both operations in a single call to .loc[].
We pass the row filtering condition df['col_1'] > 0 to the first argument of loc[] and the column selection logic ['col_1','col_3'] to the second argument (see Selecting by Name).
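To make both filters concrete, here is a minimal runnable sketch. The sample data frame and the variable names mask, zero_rows and positive_subset are hypothetical and exist only for illustration; the column names follow the examples above.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the examples above.
df = pd.DataFrame({
    'col_1': [0, 3, 0, -2, 5],
    'col_2': ['a', 'b', 'c', 'd', 'e'],
    'col_3': [1.5, 2.5, 3.5, 4.5, 5.5],
})

# The comparison yields a boolean Series with one True/False entry per row.
mask = df['col_1'] == 0

# Rows only: keep the rows where col_1 equals 0, with all columns.
zero_rows = df.loc[mask]

# Rows and columns: keep the rows where col_1 > 0 and return only col_1 and col_3.
positive_subset = df.loc[df['col_1'] > 0, ['col_1', 'col_3']]

print(zero_rows)
print(positive_subset)
```

Storing the condition in a separate variable, as done with mask here, is optional, but it can make it easier to inspect the condition on its own while developing a filter. Note that the filtered results keep the row labels of the original data frame.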