Filtering

This section is concerned with extracting a subset of rows by column values which we will refer to going forward as * filtering*.

The output of the filtering operation is a subset of the rows of the input table appearing in the same order and with the columns of the original table unchanged.

The input to the filtering operation is essentially a logical expression that evaluates to True or False for each row. Rows for which the corresponding value is True are retained (included in the output).

This section is organized as follows:

  • Basic Filtering: We cover the basic yet very common filtering scenarios e.g. filter rows where a numerical column has a particular value.
  • Custom Filtering: We look at scenarios where the filtering logic is more involved than a simple comparison usually involving transformations or aggregations of the original data e.g. filter rows where a numerical column has a value that is higher than the mean value.
  • Grouped Filtering: Where we apply filters within groups e.g. for each group, filter the row where a numerical column has the highest value.
  • Implicit Filtering: The columns to which we wish to apply the filtering logic are not specified explicitly but rather by criteria they meet e.g. their name starts with a particular substring.
  • Filtering Joins: Filtering joins allow us to filter rows from a table table_1 based on the presence or absence of matches in another table table_2.
SQL
I/O