Filtered Rows

In a data inspection context, we typically filter rows by column values to get a sense of the data, e.g., to check for surprising values. We cover the basics of filtering here and go into detail in Filtering.

Filter

We wish to extract rows that meet a certain condition on column values. In this example, we keep only the rows where the column col_1 has a value of 0.

df.loc[df['col_1'] == 0]

Here is how this works:

  • We apply loc[] to the data frame df.
  • In Pandas, we use .loc[] for filtering rows by a boolean condition on column values.
  • In this case, we compare each value of col_1 with the integer 0, which produces a boolean Series, and return only the rows where the condition evaluates to True.
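
To make the steps above concrete, here is a minimal sketch on a small made-up data frame. The column names follow the text; the values and the intermediate names mask and df_zero are assumptions introduced only for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the text.
df = pd.DataFrame({
    'col_1': [0, 3, 0, -2],
    'col_2': ['a', 'b', 'c', 'd'],
    'col_3': [1.5, 2.5, 3.5, 4.5],
})

# The comparison yields a boolean Series aligned with the index of df.
mask = df['col_1'] == 0

# .loc[] keeps only the rows where the mask is True (here, index labels 0 and 2).
df_zero = df.loc[mask]
print(df_zero)
```

Naming the mask separately is optional; it is equivalent to the one-line form df.loc[df['col_1'] == 0] and can make it easier to inspect the condition while developing a filter.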

Selected Columns

We wish to extract rows that meet a certain condition on column values and to return only a specified set of columns (not all columns). This is useful while developing a filter: looking at just the relevant columns makes it easier to verify that the filter works as expected.

In this example, we wish to return rows where the value of col_1 is greater than 0 and to return columns col_1 and col_3.

df.loc[df['col_1'] > 0, ['col_1','col_3']]

Here is how this works:

  • Since .loc[] is the indexer we use for both boolean filtering and for column selection by name, we can combine both operations in a single call to .loc[].
  • We pass the row filtering logic df['col_1'] > 0 as the first argument to .loc[] and the column selection logic ['col_1','col_3'] as the second argument (see Selecting by Name).
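
As a rough illustration of combining the row filter and the column selection, the sketch below runs the same call on a small made-up data frame. The values and the name df_subset are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the text.
df = pd.DataFrame({
    'col_1': [0, 3, 7, -2],
    'col_2': ['a', 'b', 'c', 'd'],
    'col_3': [1.5, 2.5, 3.5, 4.5],
})

# First argument: boolean row filter. Second argument: list of column names.
df_subset = df.loc[df['col_1'] > 0, ['col_1', 'col_3']]
print(df_subset)
```

With this sample data, only the rows with index labels 1 and 2 satisfy the condition, and col_2 is dropped from the result.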