Selecting

In this section we cover selecting a subset of columns from a data frame.

The section is structured as follows:

  • Basic Selection, where we specify the columns we wish to select explicitly. It includes two scenarios:
    • Selection by Name, where we identify the columns we wish to select by specifying their names. This is by far the most common column selection scenario.
    • Selection by Position, where we identify the columns we wish to select by specifying their positions (as integers). Positions start at 0 for the leftmost column.
  • Implicit Selection, where rather than identifying the columns we wish to select by name or position, we identify them by one of their properties. We cover the three most common implicit column selection scenarios:
    • Selection by Name Pattern, where we select columns whose names satisfy a given pattern, e.g. columns whose names contain the string ‘_id’.
    • Selection by Data Type, where we select columns of one or more data types, e.g. columns with a numeric data type.
    • Selection by Data Criteria, where we select columns whose data satisfies a certain condition, e.g. the percentage of missing values is below 10%.
  • Dynamic Selection, where we specify the columns to select, or the implicit column selection function, dynamically, i.e. through a variable or a function argument.
  • Exclude Columns, where we cover how to exclude (sometimes referred to as drop) columns from a data frame. This is selection by exclusion: we are in effect selecting the columns that we do not exclude.
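
As a quick orientation, the scenarios listed above can be sketched in Python. This is a minimal sketch that assumes the pandas library; the data frame, its column names, and the helper `select_columns` are hypothetical and serve only as illustration.

```python
import pandas as pd

# Hypothetical example data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "order_id": [10, 11, 12, 13],
    "amount": [9.5, None, 20.0, 7.25],
    "note": [None, None, None, "gift"],
})

# Basic selection by name: list the column names explicitly.
by_name = df[["user_id", "amount"]]

# Basic selection by position: integer positions, 0 is the leftmost column.
by_position = df.iloc[:, [0, 2]]

# Implicit selection by name pattern: names containing the string "_id".
by_pattern = df.filter(like="_id")

# Implicit selection by data type: columns with a numeric data type.
by_dtype = df.select_dtypes(include="number")

# Implicit selection by data criteria: fraction of missing values below 10%.
by_criteria = df.loc[:, df.isna().mean() < 0.10]

# Dynamic selection: the columns arrive through a variable or function argument.
def select_columns(frame, columns):
    """Return only the requested columns (hypothetical helper)."""
    return frame[list(columns)]

wanted = ["order_id", "note"]
dynamic = select_columns(df, wanted)

# Exclusion: keep every column except the ones named.
excluded = df.drop(columns=["note"])
```

Each expression returns a new data frame and leaves `df` unchanged.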