We wish to identify which values of a data frame or a vector are missing.
This section is complemented by Inspecting Missing Values, where we cover checking whether any missing values exist, counting missing values, and extracting rows with missing values.
We wish to know which values of a column are missing.
In this example, we wish to create a new column col_1_is_na that has a value of TRUE where the corresponding value of the column col_1 is missing.
df_2 = df %>%
  mutate(col_1_is_na = is.na(col_1))
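As a concrete illustration, here is a minimal sketch that runs the snippet above on a small toy data frame. The data values are made up for the example; only the column names follow the text.

```r
library(dplyr)

# Hypothetical toy data; the values are illustrative only
df = tibble(
  col_1 = c(1, NA, 3),
  col_2 = c("a", "b", NA)
)

# Add a logical column that flags the missing values of col_1
df_2 = df %>%
  mutate(col_1_is_na = is.na(col_1))

df_2$col_1_is_na
# [1] FALSE  TRUE FALSE
```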
Here is how this works:
We use the is.na() function from base R.
Inside dplyr verbs (filter(), mutate(), etc.), we can pass the name of the column whose elements we want to check for NA to is.na().
The output of is.na() is a logical vector of the same length as the input column, which here is col_1, where an element is TRUE if the corresponding element of the input column is NA.

We wish to know which values of a data frame are missing.
df %>% is.na()
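To see the shape of the result, the following sketch applies is.na() to a small toy data frame (made-up values) and then converts the result to a tibble with as_tibble(), which dplyr re-exports.

```r
library(dplyr)

# Hypothetical toy data; the values are illustrative only
df = tibble(
  col_1 = c(1, NA, 3),
  col_2 = c("a", "b", NA)
)

# is.na() on a data frame returns a logical matrix of the same dimensions
na_matrix = df %>% is.na()
is.matrix(na_matrix)
# [1] TRUE
dim(na_matrix)
# [1] 3 2

# Convert the matrix back to a data frame (tibble) when needed
na_df = na_matrix %>% as_tibble()
na_df$col_2
# [1] FALSE FALSE  TRUE
```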
Here is how this works:
We pass the data frame to is.na().
The output of is.na() is a matrix of logical values of the same dimensions as the input data frame, which here is df, where an element is TRUE if the corresponding element of the input data frame is NA.
We may wish to convert the output of is.na() from a matrix to a data frame, which we can do via df %>% is.na() %>% as_tibble().

We wish to determine which rows of a data frame have a missing value.
df_2 = df %>%
  mutate(
    is_incomplete = !complete.cases(.)
  )
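A minimal sketch of the snippet above on toy data (made-up values) may help. It also shows one possible alternative that avoids the dot, pick(everything()), which assumes dplyr 1.1.0 or later and is not part of the original solution.

```r
library(dplyr)

# Hypothetical toy data; the values are illustrative only
df = tibble(
  col_1 = c(1, NA, 3),
  col_2 = c("a", "b", NA),
  col_3 = c(TRUE, FALSE, TRUE)
)

# Flag rows that have at least one missing value in any column
df_2 = df %>%
  mutate(
    is_incomplete = !complete.cases(.)
  )

df_2$is_incomplete
# [1] FALSE  TRUE  TRUE

# Assumed alternative (dplyr >= 1.1.0): select all columns with pick()
df_3 = df %>%
  mutate(
    is_incomplete = !complete.cases(pick(everything()))
  )
```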
Here is how this works:
We use complete.cases() from base R to identify whether a row of a data frame has any missing values.
The output of complete.cases() is a logical vector with one element per row, where a value is TRUE if the corresponding row has no missing values. We negate it with ! so that is_incomplete is TRUE for rows with at least one missing value.
To call complete.cases() on the piped data frame in a chain, we refer to it via the dot operator (.).

Extension: Selected Columns
We wish to determine which rows of a data frame have a missing value for any of a selected set of columns.
df_2 = df %>%
  mutate(
    is_incomplete = !complete.cases(pick(col_2, col_3))
  )
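The sketch below runs the snippet above on toy data (made-up values) in which col_1 is missing in the first row. Because col_1 is not among the selected columns, that row is not flagged. It assumes dplyr 1.1.0 or later, where pick() is available.

```r
library(dplyr)

# Hypothetical toy data; the values are illustrative only
df = tibble(
  col_1 = c(NA, 2, 3),
  col_2 = c("a", NA, "c"),
  col_3 = c(1, 2, NA)
)

# Flag rows where col_2 or col_3 is missing; col_1 is ignored
df_2 = df %>%
  mutate(
    is_incomplete = !complete.cases(pick(col_2, col_3))
  )

df_2$is_incomplete
# [1] FALSE  TRUE  TRUE
```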
Here is how this works:
We use the dplyr helper pick() to obtain a data frame that contains a subset of the columns of the data frame being piped in the chain, which in this case is a data frame containing the columns col_2 and col_3.
complete.cases() is then executed on the sub-data-frame to return TRUE for rows where neither of the selected columns has a missing value. Negating the result with ! flags the rows where at least one of the selected columns is missing.

We wish to know which values of a row of a data frame are missing.
In this example, we wish to know which columns are missing in the first row of the data frame df.
is.na(df)[1,]
Here is how this works:
We apply is.na() to the data frame df and then use [1,] to extract the first row of the resulting logical matrix.
In a chain, the same can be written as df %>% is.na() %>% `[`(1,).
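
As a worked illustration, the following sketch extracts the missing-value flags for the first row of a toy data frame (made-up values), using both the bracket form and the piped form, and then lists the names of the columns that are missing in that row. The last step is an addition for illustration.

```r
library(dplyr)

# Hypothetical toy data; the values are illustrative only
df = tibble(
  col_1 = c(NA, 2),
  col_2 = c("a", NA),
  col_3 = c(NA, TRUE)
)

# Bracket form: first row of the logical matrix, as a named logical vector
first_row_na = is.na(df)[1, ]
first_row_na
# col_1 col_2 col_3
#  TRUE FALSE  TRUE

# Piped form: `[` is called as a regular function inside the chain
first_row_na_piped = df %>% is.na() %>% `[`(1, )

# Names of the columns that are missing in the first row
names(first_row_na)[first_row_na]
# [1] "col_1" "col_3"
```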