Data Frame Segments

A common habit of successful data scientists is to look thoroughly and thoughtfully at the actual data at the outset of a project and routinely after each data manipulation step. Datasets are often too large to inspect visually in full, however. R therefore offers powerful methods to subset the data, both to make it small enough and to zoom in on subsets of interest, so that we can inspect the actual data effectively.

In this section, we will cover the following row selection scenarios:

  • In Head or Tail we cover subsetting the top n or bottom n rows of a dataset (commonly referred to as head and tail respectively).
  • In Random we cover subsetting a random sample of rows from a data frame.
  • In Range we cover subsetting a range of rows (commonly referred to as slicing).
  • In Specific Rows we cover subsetting an arbitrary set of 1 or more rows by their position (row number).
  • In Filter we cover subsetting rows by a condition on column values (commonly referred to as filtering).
  • In Extremes we cover subsetting the rows with the largest or smallest values of a column.
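As a rough preview of these scenarios, here is a minimal base R sketch. It assumes the built-in mtcars data frame as a stand-in dataset, and the row counts, the seed, and the mpg and hp columns are chosen purely for illustration. The individual sections may use different functions or packages.

```r
# mtcars is a small data frame that ships with base R (32 rows, 11 columns).
# It is used here only as an illustrative stand-in dataset.
df <- mtcars

# Head or Tail: the first 3 and the last 3 rows.
top_rows    <- head(df, 3)
bottom_rows <- tail(df, 3)

# Random: 5 rows sampled without replacement.
set.seed(42)  # only so that the sample is reproducible
random_rows <- df[sample(nrow(df), 5), ]

# Range: rows 10 through 15 (slicing).
range_rows <- df[10:15, ]

# Specific Rows: rows 1, 4 and 7, selected by position.
specific_rows <- df[c(1, 4, 7), ]

# Filter: rows where mpg is greater than 25.
filtered_rows <- df[df$mpg > 25, ]

# Extremes: the 3 rows with the largest hp values.
largest_hp <- head(df[order(df$hp, decreasing = TRUE), ], 3)
```

In each case the trailing comma inside the square brackets leaves the column index empty, which is what returns all columns.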

For each scenario, we will generally cover three variants:

  1. All Columns: The default where we return all columns.
  2. Selected Columns: For wide datasets, looking at all columns at once can be overwhelming. In such a situation, we typically select only a subset of relevant columns. We will cover how to subset columns by column name.
  3. Single Value: Sometimes we wish to return a single value from a data frame. This is often the case when we wish to answer a lookup-style question.
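The three variants can be sketched in base R as follows, again assuming the built-in mtcars data frame. The column names mpg and hp and the row name "Mazda RX4" come from that dataset and are used only for illustration.

```r
# mtcars ships with base R and is used here only as an example dataset.
df <- mtcars

# 1. All Columns: the first 3 rows with every column.
all_columns <- head(df, 3)

# 2. Selected Columns: the first 3 rows, keeping two columns by name.
selected_columns <- head(df[, c("mpg", "hp")], 3)

# 3. Single Value: a lookup by row name and column name,
#    here the mpg of the car named "Mazda RX4".
single_value <- df["Mazda RX4", "mpg"]
```

Indexing with one row and one column returns a plain value rather than a data frame, which is what makes the third form convenient for lookup-style questions.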