Data Manipulation

Data manipulation encompasses a set of operations that alter the structure or values of a dataset, often to make it more appropriate for the purpose at hand, such as data analysis.

More formally: given a purpose and data in a certain form A, data manipulation is what we do to convert the data from its original form A to a different form B that is better suited to that purpose.

We organize our coverage of data manipulation into Expressions and data type specific Operations. We think of the expressions as the grammar and of the data type specific operations as the vocabulary we use with that grammar to construct useful sentences.

The Expressions sections are as follows:

  1. Inspecting the structure and contents of a table.
  2. Selecting a subset of columns from a table.
  3. Filtering a subset of rows from a table by column values.
  4. Sorting the rows of a table by column values.
  5. Transforming data by creating new columns in a table, typically as a function of one or more existing columns.
  6. Aggregating data by applying summary functions, such as sum or mean, to columns of a table to create a new summary table.
  7. Reshaping the structure of data, which includes single-table operations such as pivoting and multi-table operations such as joining.
  8. Renaming the columns of a table.
  9. Relocating (changing the order of) the columns of a table.
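
To make the expressions above concrete, here is a minimal, library-free sketch in Python. It assumes a table is represented as a list of dictionaries (one dictionary per row); the `orders` table, its column names, and the `managers` lookup are hypothetical and exist only for illustration. A dedicated data frame library would normally provide these operations directly, and pivoting is left out to keep the sketch short.

```python
from collections import defaultdict

# Hypothetical table: each dict is one row, each key is a column name.
orders = [
    {"id": 1, "region": "north", "units": 3, "unit_price": 10.0},
    {"id": 2, "region": "south", "units": 1, "unit_price": 25.0},
    {"id": 3, "region": "north", "units": 5, "unit_price": 4.0},
    {"id": 4, "region": "south", "units": 2, "unit_price": 7.5},
]

# 1. Inspect: column names and number of rows.
columns = list(orders[0].keys())
n_rows = len(orders)

# 2. Select: keep a subset of columns.
selected = [{"region": r["region"], "units": r["units"]} for r in orders]

# 3. Filter: keep rows whose column values satisfy a condition.
large = [r for r in orders if r["units"] >= 2]

# 4. Sort: order rows by a column value (descending units).
by_units = sorted(orders, key=lambda r: r["units"], reverse=True)

# 5. Transform: add a column computed from existing columns.
with_total = [{**r, "total": r["units"] * r["unit_price"]} for r in orders]

# 6. Aggregate: sum the new column per region into a summary table.
totals = defaultdict(float)
for r in with_total:
    totals[r["region"]] += r["total"]
summary = [{"region": k, "total": v} for k, v in sorted(totals.items())]

# 7. Reshape (join only): attach a column from a second, hypothetical table.
managers = {"north": "Ana", "south": "Raj"}
joined = [{**r, "manager": managers[r["region"]]} for r in orders]

# 8. Rename: change the column name "id" to "order_id".
renamed = [
    {("order_id" if k == "id" else k): v for k, v in r.items()} for r in orders
]

# 9. Relocate: change the column order so that "region" comes first.
new_order = ["region", "id", "units", "unit_price"]
relocated = [{k: r[k] for k in new_order} for r in orders]
```

Each step returns a new table and leaves `orders` unchanged, which makes it easy to chain expressions, for example filtering the result of a transform before aggregating it.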

The Operations sections are as follows:

  1. General Operations: General operations applicable to all data types.
  2. Numeric Operations: Data manipulation operations for numeric data including arithmetic and descriptive statistics.
  3. String Operations: Data manipulation operations for string data including string matching and regular expressions.
  4. Logical Operations: Data manipulation operations for logical data including boolean logic.
  5. Factor Operations: Data manipulation operations for factor (categorical) data including defining order and factor recoding.
  6. Date Time Operations: Data manipulation operations for date-time data including extracting components and comparison.
  7. List Operations: Data manipulation operations for list data including checking membership in a list.
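
As a rough illustration of the data type specific operations listed above, the following Python sketch shows one small operation per data type using only the standard library. All values are hypothetical. Python has no built-in factor type, so an ordered list of levels stands in for one here; that substitution is an assumption of this sketch, not a feature of any particular tool.

```python
import re
import statistics
from datetime import datetime

# General: detect missing values, whatever the data type.
values = [3, None, 5]
is_missing = [v is None for v in values]

# Numeric: arithmetic and a descriptive statistic.
prices = [10.0, 25.0, 4.0, 7.5]
discounted = [p * 0.5 for p in prices]
mean_price = statistics.mean(prices)

# String: match a pattern with a regular expression.
contacts = ["alice@example.com", "bob", "carol@example.org"]
looks_like_email = [re.search(r"@\w+\.\w+$", s) is not None for s in contacts]

# Logical: combine boolean columns element by element.
in_stock = [True, False, True]
on_sale = [False, False, True]
in_stock_and_on_sale = [a and b for a, b in zip(in_stock, on_sale)]

# Factor (stand-in): define an order on categories, then recode them.
levels = ["low", "medium", "high"]
ratings = ["high", "low", "medium", "low"]
ordered_ratings = sorted(ratings, key=levels.index)
codes = {"low": "L", "medium": "M", "high": "H"}
recoded = [codes[r] for r in ratings]

# Date time: extract a component and compare two values.
start = datetime(2024, 3, 15, 9, 30)
end = datetime(2024, 4, 1, 0, 0)
start_month = start.month
starts_before_end = start < end

# List: check membership in a list.
tags = ["clean", "join", "pivot"]
has_join = "join" in tags
```

In the grammar and vocabulary analogy, these operations are the vocabulary: each one would typically appear inside an expression, for example a string match used as the condition of a filter.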