We wish to identify which values of a data frame or a vector are missing.
This section is complemented by Inspecting Missing Values, where we cover checking whether any missing values exist, counting missing values, and extracting rows with missing values.
We wish to know which values of a column are missing.
In this example, we wish to create a new column col_1_is_na that has a value of True where the corresponding value of the column col_1 is missing.
df_2 = df\
    .assign(
        col_1_is_na = df['col_1'].isna()
    )
Here is how this works:
We use the isna() method of Series (and Data Frame) objects.
The output of isna() will be a Boolean Series of the same length as the input column, which here is col_1, where an element is True if the corresponding element of the input column is NA.
Pandas also provides the method isnull(), which does exactly the same thing as isna().
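The steps above can be sketched on a small data frame. The data below is hypothetical and made up purely for illustration; the sketch assumes pandas and NumPy are installed.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({
    'col_1': [1.0, np.nan, 3.0, None],
    'col_2': ['a', None, 'c', 'd'],
})

# isna() returns a Boolean Series aligned with col_1
df_2 = df.assign(col_1_is_na=df['col_1'].isna())

# isnull() is an alias of isna(), so both give the same result
same_result = df['col_1'].isna().equals(df['col_1'].isnull())

print(df_2)
print(same_result)
```

Note that both np.nan and None in a numeric column are reported as missing.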
Extension: Not Missing
df_2 = df\
    .assign(
        col_1_is_not_na = df['col_1'].notna()
    )
Here is how this works:
While we could use ~ to negate isna() to identify values that are not missing, an alternative is to use the convenient Pandas method notna().
notna() determines if each individual value of the data frame or series is not missing; i.e. it returns True if the value is not missing.
Pandas also provides the method notnull(), which does exactly the same thing as notna().
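As a minimal sketch, the following compares notna() with the negation of isna() on a hypothetical Series (the values are invented for illustration) to show that the two approaches agree.

```python
import numpy as np
import pandas as pd

# Hypothetical column for illustration only
s = pd.Series([1.0, np.nan, 3.0], name='col_1')

not_missing = s.notna()   # True where a value is present
negated = ~s.isna()       # the same result via negation

print(not_missing.tolist())
print(not_missing.equals(negated))
```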
We wish to know which values of a data frame are missing.
df.isna()
Here is how this works:
isna() determines if each individual value of the data frame is missing.
The output of isna() is a data frame of logical values of the same dimensions as the input data frame, which here is df, where an element is True if the corresponding element of the input data frame is NA.
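To make the shape of the output concrete, here is a small sketch on a hypothetical two-by-two data frame (the data is invented for illustration).

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({
    'col_1': [1.0, np.nan],
    'col_2': ['a', None],
})

# One Boolean per cell, with the same index and columns as df
mask = df.isna()

print(mask.shape == df.shape)
print(mask)
```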
We wish to determine which rows of a data frame have a missing value.
df_2 = df\
    .assign(
        is_incomplete = df.isna().any(axis=1)
    )
Here is how this works:
We apply any() with axis=1 to the output of isna() to return True for rows where any value is missing and False otherwise.
Extension: Selected Columns
We wish to determine which rows of a data frame have a missing value for any of a selected set of columns.
df_2 = df\
    .assign(
        is_incomplete = df.loc[:, ['col_2', 'col_3']].isna().any(axis=1)
    )
Here is how this works:
This works similarly to the solution above, except that we first select the columns that we wish to check for missing values via loc[:, ['col_2', 'col_3']] before applying isna() and any().
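The difference between checking all columns and checking a selected subset can be seen in a short sketch. The data and the column name is_incomplete_selected are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({
    'col_1': [np.nan, 2.0, 3.0],
    'col_2': ['a', None, 'c'],
    'col_3': [1.0, 2.0, 3.0],
})

# Rows with a missing value in any column
any_col = df.isna().any(axis=1)

# Rows with a missing value in col_2 or col_3 only
selected = df.loc[:, ['col_2', 'col_3']].isna().any(axis=1)

df_2 = df.assign(is_incomplete=any_col, is_incomplete_selected=selected)
print(df_2)
```

The first row is flagged only by the all-columns check, because its single missing value sits in col_1, which is outside the selected columns.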
We wish to know which values of a row are missing.
In this example, we wish to know which columns are missing in the first row of the data frame df.
df.iloc[0].isna()
Here is how this works:
We use iloc[] to pick the row that we wish to check for missing values.
We use isna() to identify which columns have a missing value for the selected row.
We wish to know which values of a list are missing.
import numpy as np

input_list = [1, 2, None, 4, 5, np.nan, "", "NA"]
# Encodings that we treat as missing
missing_values = [None, np.nan, "", "NA"]
is_missing = [el in missing_values for el in input_list]
Here is how this works:
In missing_values, we specify the encodings of missing values that we wish to identify in the list.
We use a list comprehension to check whether each element of input_list is missing; i.e. is one of the values specified in missing_values.
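One caveat with the membership test is that it matches np.nan only because Python's in operator checks identity before equality. A NaN that is a different object, such as float('nan'), would not be matched, since NaN does not compare equal to itself. A possible variant, sketched below, uses pd.isna() for None and NaN and a set for the string encodings. The helper name is_missing_value and the set missing_strings are hypothetical and introduced only for this illustration.

```python
import numpy as np
import pandas as pd

input_list = [1, 2, None, 4, 5, np.nan, float('nan'), "", "NA"]

# String encodings that we choose to treat as missing (an assumption)
missing_strings = {"", "NA"}

def is_missing_value(el):
    """Hypothetical helper: True for None, any NaN, or a listed string."""
    if isinstance(el, str):
        return el in missing_strings
    return bool(pd.isna(el))

is_missing = [is_missing_value(el) for el in input_list]
print(is_missing)
```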