Data Quality

Real data is rarely well behaved and often needs cleaning before it can be used. It is therefore important to inspect the data thoroughly to identify any data quality issues.

In this section we cover two of the most common data quality issues:

  • Missing Values: Are there missing fields? How much of the data is missing? Which data is missing?
  • Duplication: Are there duplicated rows?
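Both questions can be answered with a few lines of code. The following is a minimal sketch in Python using pandas, which is an assumption on our part rather than the tooling this section prescribes. The small table and the helper names `missing_summary` and `duplicate_rows` are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: how many values are missing, per column."""
    return pd.DataFrame({
        "n_missing": df.isna().sum(),      # count of missing cells per column
        "frac_missing": df.isna().mean(),  # share of missing cells per column
    })


def duplicate_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: every row that appears more than once (all copies)."""
    return df[df.duplicated(keep=False)]


# Made-up example data: one missing age, one missing city, one repeated row.
df = pd.DataFrame({
    "id": [1, 2, 2, 3, 4],
    "age": [34, 41, 41, np.nan, 29],
    "city": ["Leeds", "York", "York", "Hull", None],
})

print(missing_summary(df))            # how much of the data is missing
print(df[df.isna().any(axis=1)])      # which rows have at least one gap
print(duplicate_rows(df))             # rows that occur more than once
```

Here `duplicated(keep=False)` flags every copy of a repeated row so that all of them can be inspected together; with the default `keep="first"`, only the second and later copies are flagged, which is the more convenient form for counting duplicates (`df.duplicated().sum()`).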