Identify Missing

We wish to identify which values of a data frame or a vector are missing.

This section is complemented by Inspecting Missing Values, where we cover checking whether any missing values exist, counting missing values, and extracting rows with missing values.

Column

We wish to know which values of a column are missing.

In this example, we wish to create a new column col_1_is_na that has a value of True where the corresponding value of the column col_1 is missing.

df_2 = df\
    .assign(
        col_1_is_na = df['col_1'].isna()
    )

Here is how this works:

  • The standard solution for identifying missing values in Pandas is the isna() method of Series (and Data Frame) objects.
  • The output of isna() will be a vector of the same length as the input column, which here is col_1, where an element is True if the corresponding element of the input column is NA.
  • Note that, for convenience, there is another function isnull() which does exactly the same thing as isna().
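To make the behaviour above concrete, here is a minimal sketch on a small, made-up data frame (the sample values and the name same_result are assumptions for illustration only). It shows the Boolean column produced by isna() and checks that isnull() gives the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; col_1 has two missing entries (NaN and None).
df = pd.DataFrame({'col_1': [1.0, np.nan, 3.0, None]})

# Flag the missing values of col_1 in a new Boolean column.
df_2 = df.assign(col_1_is_na=df['col_1'].isna())

# isnull() is an alias of isna(), so the two results are identical.
same_result = df['col_1'].isna().equals(df['col_1'].isnull())
```

Note that both np.nan and None are flagged as missing.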

Extension: Not Missing

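We wish to know which values of a column are not missing.

In this example, we wish to create a new column col_1_is_not_na that has a value of True where the corresponding value of the column col_1 is not missing.

df_2 = df\
    .assign(
        col_1_is_not_na = df['col_1'].notna()
    )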

Here is how this works:

  • While one can use the complement operator ~ to negate isna() to identify values that are not missing, an alternative is to use the convenient Pandas method notna().
  • The method notna() determines if each individual value of the data frame or series is not missing; i.e. it returns True if not missing.
  • There is an alias notnull() that does the exact same thing as notna().
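As a quick check of the equivalence described above, the following sketch compares notna() with the negation of isna() on made-up data (the sample values and the variable names via_notna and via_negation are illustrative assumptions).

```python
import pandas as pd

# Hypothetical sample data with one missing value.
df = pd.DataFrame({'col_1': ['a', None, 'c']})

# Two ways to flag values that are not missing.
via_notna = df['col_1'].notna()
via_negation = ~df['col_1'].isna()

# Both approaches should yield the same Boolean Series.
both_agree = via_notna.equals(via_negation)
```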

Data Frame

We wish to know which values of a data frame are missing.

df.isna()

Here is how this works:

  • The method isna() determines if each individual value of the data frame is missing.
  • The output of isna() is a data frame of logical values of the same dimensions as the input data frame, which here is df, where an element is True if the corresponding element of the input data frame is NA.
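A small sketch of what the output looks like, assuming a made-up two-column data frame (the data and the name na_mask are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the second row is missing in both columns.
df = pd.DataFrame({
    'col_1': [1.0, np.nan],
    'col_2': ['x', None],
})

# A Boolean data frame with the same shape, index, and columns as df.
na_mask = df.isna()
```

The mask keeps the row labels and column names of df, so it can be used directly for filtering or further aggregation.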

Incomplete Row

We wish to determine which rows of a data frame have a missing value.

df_2 = df\
    .assign(
        is_incomplete = df.isna().any(axis=1)
    )

Here is how this works:

  • We use isna() to identify all missing values in the data frame df as described above in Data Frame.
  • We then use any() with axis=1 to return True for rows where any value is missing and False otherwise.
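The two steps can be traced on a small example. This is a sketch on made-up data; the sample values are assumptions, not part of the original solution.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: row 0 is complete, rows 1 and 2 each miss one value.
df = pd.DataFrame({
    'col_1': [1.0, np.nan, 3.0],
    'col_2': ['x', 'y', None],
    'col_3': [True, False, True],
})

# Step 1: element-wise missing-value mask.
na_mask = df.isna()

# Step 2: reduce across columns (axis=1) to get one flag per row.
df_2 = df.assign(is_incomplete=na_mask.any(axis=1))
```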

Extension: Selected Columns

We wish to determine which rows of a data frame have a missing value for any of a selected set of columns.

df_2 = df\
    .assign(
        is_incomplete = df.loc[:, ['col_2', 'col_3']].isna().any(axis=1)
    )

Here is how this works:

This works similarly to the solution above, except that we first select the columns that we wish to check for missing values via loc[:, ['col_2', 'col_3']] before applying isna() and any().
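The sketch below, on made-up data, illustrates that a missing value outside the selected columns does not mark the row as incomplete (the sample values are assumptions for illustration).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: row 0 is missing only in col_1,
# row 1 is missing in col_2, and row 2 is complete.
df = pd.DataFrame({
    'col_1': [np.nan, 2.0, 3.0],
    'col_2': ['x', None, 'z'],
    'col_3': [1, 2, 3],
})

# Only col_2 and col_3 are checked, so the gap in col_1 is ignored.
df_2 = df.assign(
    is_incomplete=df.loc[:, ['col_2', 'col_3']].isna().any(axis=1)
)
```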

Row

We wish to know which values of a row are missing.

In this example, we wish to know which columns are missing in the first row of the data frame df.

df.iloc[0].isna()

Here is how this works:

  • We use iloc[] to pick the row that we wish to check for missing values.
  • We then use isna() to identify which columns have a missing value for the selected row.
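Here is a minimal sketch on made-up data. Since iloc[] uses zero-based positions, position 0 refers to the first row. The sample values and the names row_is_na and missing_columns are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the first row is missing in col_1 and col_3.
df = pd.DataFrame({
    'col_1': [np.nan, 2.0],
    'col_2': ['x', None],
    'col_3': [None, 'z'],
})

# Boolean Series indexed by column name for the first row (position 0).
row_is_na = df.iloc[0].isna()

# Names of the columns that are missing in that row.
missing_columns = row_is_na[row_is_na].index.tolist()
```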

List

We wish to know which elements of a list are missing.

import numpy as np

input_list = [1, 2, None, 4, 5, np.nan, "", "NA"]
missing_values = [None, np.nan, "", "NA"]
is_missing = [el in missing_values for el in input_list]

Here is how this works:

  • In missing_values, we specify the encodings of missing values that we wish to treat as missing.
  • We then use a list comprehension to determine whether each element of the list input_list is missing; i.e. is one of the values specified in missing_values.
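One caveat with the membership test: NaN is not equal to itself, so the in operator matches np.nan only because it is the very same object in both lists. A NaN created elsewhere, such as float('nan'), would not be detected. The following is one possible alternative sketch, not the original solution, that uses pd.isna() for None and NaN and a set for string encodings. The helper is_missing_value and the set missing_strings are hypothetical names introduced here.

```python
import pandas as pd

# Hypothetical input; float('nan') is a different object from np.nan.
input_list = [1, 2, None, 4, 5, float('nan'), "", "NA"]

# Assumed string encodings of missing values.
missing_strings = {"", "NA"}

def is_missing_value(el):
    """Return True if el is None, NaN, or one of the assumed string encodings."""
    if isinstance(el, str):
        return el in missing_strings
    # pd.isna() handles None and any NaN, regardless of object identity.
    return bool(pd.isna(el))

is_missing = [is_missing_value(el) for el in input_list]
```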