Data Frame Segments

A common habit of successful data scientists is to look thoroughly and thoughtfully at the actual data at the outset of a project and again after each data manipulation step. Datasets are often too large to inspect visually in full, however. Pandas therefore offers methods to subset the data, either to shrink it to a manageable size or to zoom in on rows of interest, so that we can inspect the actual data effectively.

In this section, we will cover the following row selection scenarios:

  • In Head or Tail we cover subsetting the top n or bottom n rows of a dataset (commonly referred to as head and tail respectively).
  • In Random we cover subsetting a random sample of rows from a data frame.
  • In Range we cover subsetting a range of rows (commonly referred to as slicing).
  • In Specific Rows we cover subsetting an arbitrary set of 1 or more rows by their position (row number).
  • In Filter we cover subsetting rows by a condition on column values (commonly referred to as filtering).
  • In Extremes we cover subsetting the rows with the largest or smallest values of a column.
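As a quick orientation, the scenarios above can be sketched in a few lines of Pandas. This is a minimal sketch and not necessarily the approach each subsection takes: the data frame `df` and its `name` and `score` columns are hypothetical, made up purely for illustration.

```python
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e", "f", "g", "h"],
    "score": [55, 91, 78, 62, 99, 40, 85, 70],
})

# Head or Tail: the top 3 and bottom 3 rows.
head = df.head(3)
tail = df.tail(3)

# Random: 3 randomly sampled rows (random_state makes the draw repeatable).
sample = df.sample(n=3, random_state=0)

# Range: rows at positions 2, 3 and 4 (the end position 5 is excluded).
row_range = df.iloc[2:5]

# Specific Rows: an arbitrary set of rows chosen by position.
specific = df.iloc[[0, 3, 6]]

# Filter: rows whose score exceeds 80.
filtered = df[df["score"] > 80]

# Extremes: the 2 rows with the largest and the smallest scores.
largest = df.nlargest(2, "score")
smallest = df.nsmallest(2, "score")
```

Each result is itself a data frame, so it can be printed or inspected directly. Note that positions passed to `iloc` start at 0.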