Data Manipulation

Data manipulation encompasses a set of operations that alter the structure or values of a dataset, often to make it better suited to the purpose at hand, for instance, data analysis.

More formally, we can define data manipulation as follows: given a purpose and data in some form A, data manipulation is the process of converting the data from its original form A into a different form B that is better suited to that purpose.

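We organize our coverage of data manipulation into Expressions and data-type-specific Operations. We think of the expressions as the grammar, and of the data-type-specific operations as the vocabulary we use with that grammar to construct useful sentences. For example, a filtering expression combined with a string matching operation forms a "sentence" that keeps only the rows whose text column matches a given pattern.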

The Expressions sections are as follows:

  1. Inspecting the structure and contents of a data frame.
  2. Selecting a subset of columns from a data frame.
  3. Filtering a subset of rows from a data frame by column values.
  4. Sorting the rows of a data frame by column values.
  5. Transforming data by creating new columns in a data frame, typically as a function of one or more existing columns.
  6. Aggregating data by applying summary functions, such as sum or mean, to columns of a data frame to create a new summary data frame.
  7. Reshaping the structure of data, which includes single-data-frame operations such as pivoting and multiple-data-frame operations such as joining.
  8. Renaming the columns of a data frame.
  9. Relocating (changing the order of) the columns of a data frame.
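To make the grammar concrete, here is a minimal sketch of most of the expressions above applied one after another. It assumes plain Python with a data frame represented as a list of row dictionaries (a stand-in for a real data frame library), and the `sales` and `managers` data are hypothetical, made up for illustration. Reshaping is shown only as a simple lookup join; pivoting is omitted.

```python
from collections import defaultdict

# Hypothetical data: a "data frame" stored as a list of row dictionaries
# (an assumption for illustration; a real data frame library would be used in practice).
sales = [
    {"region": "north", "product": "a", "units": 10, "price": 2.5},
    {"region": "south", "product": "b", "units": 4, "price": 10.0},
    {"region": "north", "product": "b", "units": 6, "price": 10.0},
    {"region": "south", "product": "a", "units": 8, "price": 2.5},
]
# Hypothetical second table used for the join step.
managers = {"north": "Ana", "south": "Ben"}

# 1. Inspecting: column names and number of rows.
columns = list(sales[0].keys())
n_rows = len(sales)

# 2. Selecting a subset of columns.
selected = [{k: row[k] for k in ("region", "units", "price")} for row in sales]

# 3. Filtering rows by a column value.
filtered = [row for row in selected if row["units"] > 5]

# 4. Sorting rows by a column value (descending units).
sorted_rows = sorted(filtered, key=lambda row: row["units"], reverse=True)

# 5. Transforming: create a new column as a function of existing columns.
transformed = [{**row, "revenue": row["units"] * row["price"]} for row in sorted_rows]

# 6. Aggregating: total revenue per region into a new summary table.
totals = defaultdict(float)
for row in transformed:
    totals[row["region"]] += row["revenue"]
summary = [{"region": r, "total_revenue": t} for r, t in sorted(totals.items())]

# 7. Reshaping (joining): attach each region's manager via a lookup join,
#    assuming every region has exactly one manager.
joined = [{**row, "manager": managers[row["region"]]} for row in summary]

# 8. Renaming a column: region -> area.
renamed = [{("area" if k == "region" else k): v for k, v in row.items()} for row in joined]

# 9. Relocating: put total_revenue first.
order = ["total_revenue", "area", "manager"]
relocated = [{k: row[k] for k in order} for row in renamed]

print(relocated)
```

Each step produces a new table rather than modifying the previous one, which mirrors the idea of converting data from form A into form B one expression at a time.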

The Operations sections are as follows:

  1. General: Operations applicable to all data types.
  2. Numeric: Data manipulation operations for numeric data including arithmetic and descriptive statistics.
  3. String: Data manipulation operations for string data including string matching and regular expressions.
  4. Logical: Data manipulation operations for logical data including Boolean logic.
  5. Factor: Data manipulation operations for factor (categorical) data including defining order and factor recoding.
  6. Date-Time: Data manipulation operations for date-time data including extracting components and comparison.
  7. List: Data manipulation operations for list data including checking membership in a list.
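The following is a minimal sketch of one operation per data type, again in plain Python under the same list-of-dictionaries assumption. Python has no built-in factor type, so the factor example emulates one with an explicit list of ordered levels and a recoding dictionary; the `orders` data, `SIZE_LEVELS`, and `SIZE_CODES` are hypothetical names chosen for illustration.

```python
import re
import statistics
from datetime import date

# Hypothetical data: one row per order.
orders = [
    {"id": "A-101", "amount": 120.0, "size": "medium", "shipped": True,
     "day": date(2024, 3, 5), "tags": ["gift", "express"]},
    {"id": "B-202", "amount": 90.0, "size": "small", "shipped": False,
     "day": date(2024, 4, 12), "tags": []},
    {"id": "A-303", "amount": 210.0, "size": "large", "shipped": True,
     "day": date(2024, 4, 20), "tags": ["express"]},
]

# General: count distinct values in a column (works for any data type).
n_sizes = len({o["size"] for o in orders})

# Numeric: a descriptive statistic.
mean_amount = statistics.mean(o["amount"] for o in orders)

# String: keep ids matching a regular expression.
a_ids = [o["id"] for o in orders if re.match(r"A-\d+", o["id"])]

# Logical: combine conditions with Boolean logic.
big_shipped = [o["id"] for o in orders if o["shipped"] and o["amount"] > 150]

# Factor (emulated): define an explicit level order, then sort and recode.
SIZE_LEVELS = ["small", "medium", "large"]
by_size = [o["id"] for o in sorted(orders, key=lambda o: SIZE_LEVELS.index(o["size"]))]
SIZE_CODES = {"small": "S", "medium": "M", "large": "L"}
size_codes = [SIZE_CODES[o["size"]] for o in orders]

# Date-time: extract a component and compare dates.
months = [o["day"].month for o in orders]
after_apr_10 = [o["id"] for o in orders if o["day"] > date(2024, 4, 10)]

# List: check membership in a list-valued column.
express_ids = [o["id"] for o in orders if "express" in o["tags"]]

print(mean_amount, a_ids, big_shipped, by_size, months, express_ids)
```

Each operation here plays the role of a vocabulary word; in practice it would be plugged into an expression such as filtering (the string, logical, date-time, and list examples) or sorting (the factor example).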