In this section we cover the basic yet most common scenarios of filtering data frame rows by column values.

We wish to filter by a logical column, i.e. return the rows where the logical column is `True`.

In this example, we wish to filter the rows of the data frame `df` where the logical column `'col_3'` has a value of `True`.

```
df_2 = df.loc[df['col_3']]
```

Here is how this works:

- A column of a logical data type (comprised of `True` and/or `False` values) can be passed directly to `loc[]`, which returns the rows where the passed column (in this example: `col_3`) takes a value of `True`.
- Although there are several ways to filter the rows of a data frame by a logical condition on column values, `loc[]` is the preferred approach in Pandas.

We wish to obtain (filter) the rows of a data frame where the value of a particular column meets a condition involving comparison to a scalar value.

In this example, we wish to obtain the rows of the data frame `df` where the numerical column `'col_1'` takes a value greater than 0.

```
df_2 = df.loc[df['col_1'] > 0]
```

Here is how this works:

- The expression `df['col_1'] > 0` yields a logical `Series` where values of `col_1` that are larger than 0 map to `True` and values that are less than or equal to 0 map to `False`.
- The logical `Series` output by evaluating `df['col_1'] > 0` is passed as the first argument of `loc[]` to subset (extract) the corresponding rows.
- In this example we used the greater-than comparison `>`. In general, we can use any comparison operation on numerical columns, including less than `<`, equal `==`, greater than or equal `>=`, or less than or equal `<=`. See Numerical Operations for more details.
- In addition to applying comparison operations to numerical columns, we can apply the appropriate comparison operations to non-numerical columns, i.e. String, Logical, Factor, and Date-Time. We cover those in their respective operations sections.

We wish to return rows where a comparison between two of the data frame's columns evaluates to `True`.

In this example, we wish to return rows where the value of the column `col_1` is larger than the value of another column `col_2`.

```
df_2 = df.loc[df['col_1'] > df['col_2']]
```

Here is how this works:

- For each row, the value of `col_1` is compared to the value of `col_2`. If the value of `col_1` for a row is larger than the value of `col_2`, the expression evaluates to `True`; otherwise, it evaluates to `False`.
- The outcome of evaluating `df['col_1'] > df['col_2']` is a `Series` of boolean `True` or `False` values.
- That logical `Series` is passed to `loc[]`, which returns the subset of rows of the data frame (in this example: `df`) where the boolean `Series` is `True`. In this example, those are the rows where the value of `col_1` is larger than the value of `col_2`.
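A minimal sketch with hypothetical data, showing that the comparison is applied row by row:

```python
import pandas as pd

# Hypothetical sample data with two numerical columns.
df = pd.DataFrame({'col_1': [3, 1, 5],
                   'col_2': [2, 4, 5]})

# Row-wise comparison; only the first row satisfies col_1 > col_2.
df_2 = df.loc[df['col_1'] > df['col_2']]
```

Note that the third row is excluded: the comparison is strict, so equal values evaluate to `False`.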

We wish to obtain rows where a condition is not satisfied.

In this example, we wish to obtain rows where the value of a string column `'col_2'` is neither `'a'` nor `'c'`.

**No Missing Values**

```
df_2 = df.loc[~ df['col_2'].isin(['a','c'])]
```

Here is how this works:

- We use the `isin()` method to check whether each value of `'col_2'` is either `'a'` or `'c'`. The output is a `Series` of logical values that is `True` if the corresponding element of `'col_2'` is a member of the list and `False` otherwise. We cover checking for membership in a list in detail in List Operations.
- We use the complement operator `~` to invert the logical values resulting from `isin()`, so that we have a value of `True` for rows where the value of `'col_2'` is neither `'a'` nor `'c'`.
- That logical `Series` is passed to `loc[]`, which returns the subset of rows of the data frame `df` where the boolean `Series` is `True`.

**Missing Values**

```
df_2 = (df
.dropna(subset='col_2')
.loc[~ df['col_2'].isin(['a','c'])])
```

Here is how this works:

- In Python, comparing to a missing value (`np.nan`) returns `False`, so we cannot tell whether a `False` is due to the value not meeting the condition or due to the value being missing.
- This is not a concern when we wish to return the rows where the logical expression evaluates to `True`. It becomes an issue, however, when we wish to take the complement.
- If we do not want rows with missing values in the columns used in the filtering expression, we should drop the missing values via `dropna()` before comparing. We cover working with missing values in Missing Values.
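To illustrate why the complement is the problematic case, here is a sketch with a hypothetical column containing one missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical string column with a missing value.
df = pd.DataFrame({'col_2': ['a', 'b', np.nan, 'c', 'd']})

# Without dropna(), the NaN row survives the complement:
# isin() returns False for NaN, and ~False is True.
with_nan = df.loc[~df['col_2'].isin(['a', 'c'])]

# Dropping missing values first excludes the NaN row from the result.
df_2 = (df
        .dropna(subset='col_2')
        .loc[~df['col_2'].isin(['a', 'c'])])
```

Here `with_nan` contains three rows (`'b'`, the missing value, and `'d'`), while `df_2` contains only `'b'` and `'d'`.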

If we are filtering rows as part of a chain of data manipulation operations and need to refer to a column that was created earlier in the same chain, we need to refer to it via a `lambda` function.

```
df_2 = (df
.assign(col_2 = 2 * df['col_1'])
.loc[lambda x: x['col_2'] > 0])
```

Here is how this works:

- We can't refer to columns that were just created in the same chain through the original data frame. We would get an error if we tried to run: `df.assign(col_2 = 2 * df['col_1']).loc[df['col_2'] > 0]`.
- To refer to a column that was created earlier in the same chain, we need to refer to it via a `lambda` function, which receives the current (intermediate) data frame as its argument.
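As a minimal sketch with hypothetical data, the `lambda` receives the intermediate data frame (the result of `assign()`), on which `col_2` does exist:

```python
import pandas as pd

df = pd.DataFrame({'col_1': [-1, 2, 3]})

# 'col_2' does not exist on df itself; the lambda's argument is the
# intermediate data frame produced by assign(), which does have it.
df_2 = (df
        .assign(col_2=2 * df['col_1'])
        .loc[lambda x: x['col_2'] > 0])
```

Attempting the same filter with `df['col_2'] > 0` would raise a `KeyError`, since `df` has no column `col_2`.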

The rows we get as output from a row filtering operation, like in the scenarios above, retain the indices they had in the original data frame, resulting in a "patchy" index. While this is appropriate in some situations, we often wish that the data frame resulting from filtering had a fresh index.

```
df_2 = (df
.loc[df['col_1'] > 0]
.reset_index(drop=True))
```

Here is how this works:

- We use the method `reset_index()` after `loc[]` to give the new data frame `df_2` a fresh index that starts at 0 and increments by 1 for each row.
- If we do not set the `drop` parameter to `True`, the original index is kept in the data frame as a new column.
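A short sketch with hypothetical data showing the effect of `reset_index(drop=True)`:

```python
import pandas as pd

df = pd.DataFrame({'col_1': [-1, 2, -3, 4]})

# Filtering alone would keep the original labels 1 and 3;
# reset_index(drop=True) replaces them with a fresh 0, 1, ... index.
df_2 = (df
        .loc[df['col_1'] > 0]
        .reset_index(drop=True))
```

Without `drop=True`, the old labels would be preserved as a new column named `index`.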
