We wish to obtain the length of a vector (number of elements) or a data frame (number of rows).
This section is organized as follows: Vector, Grouped Column, Data Frame, and Grouped Data Frame.
Vector
We wish to obtain the number of elements in a vector or a list.
my_vec = c(1, 2, 3, 4, NA)
vec_length = length(my_vec)
Here is how this works:
We use the length() function from base R to obtain the number of elements in a vector or a list.
We can apply length() to a data frame column in the same way, e.g. length(df$col_1).
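As a minimal sketch of how length() behaves on different inputs, the block below uses small made-up objects (my_list and the sample df are illustrative assumptions, not part of the original example). The last call shows why the column, and not the whole data frame, is passed to length() when rows are wanted.

```r
# Hypothetical sample data; the names below are illustrative only.
my_vec = c(1, 2, 3, 4, NA)
my_list = list("a", 2, TRUE)
df = data.frame(col_1 = c("x", "y", "z"))

length(my_vec)    # 5: the NA is counted as an element
length(my_list)   # 3: one per top-level list element
length(df$col_1)  # 3: a column is a vector, so this equals the number of rows
length(df)        # 1: on a whole data frame, length() counts columns, not rows
```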
Extension: Ignore NA
We wish to obtain the number of non-NA elements in a vector or a list.
my_vec = c(1, 2, 3, 4, NA)
vec_length = sum(!is.na(my_vec))
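The one-line expression can also be split into its intermediate steps so that each value can be inspected. This is only a sketch; the names missing_flags and present_flags are illustrative and do not appear in the original example.

```r
my_vec = c(1, 2, 3, 4, NA)

missing_flags = is.na(my_vec)    # FALSE FALSE FALSE FALSE TRUE
present_flags = !missing_flags   # TRUE TRUE TRUE TRUE FALSE
vec_length = sum(present_flags)  # 4, since each TRUE counts as 1
```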
Here is how this works:
We use is.na() from base R to identify which elements of a vector are missing. See Missing Values.
The output of is.na() is a logical vector of the same length as my_vec, where a value is TRUE if the corresponding element of my_vec is NA and FALSE otherwise.
We negate the output of is.na() via the complement operator ! so that TRUE marks the non-missing elements.
We then count the TRUE values with sum(). See Working with Logical Data.
Grouped Column
We wish to obtain the number of elements in a column for each group.
In this example, we wish to obtain the number of values of the column col_2 for each group, where the groups are defined by the values of the column col_1.
df_2 = df %>%
  group_by(col_1) %>%
  summarize(count = length(col_2))
Here is how this works:
The data frame df is grouped by the column col_1. The operations carried out inside the subsequent call to summarize() are executed on each group. See Aggregating.
We use length() to obtain the number of values of the column col_2 for each group.
Extension: Ignore NA
df_2 = df %>%
  group_by(col_1) %>%
  summarize(count = sum(!is.na(col_2)))
Here is how this works:
We use the expression sum(!is.na(col_2)) to compute the number of non-NA values of the column col_2 in each group. See “Extension: Ignore NA” under Vector above for a description.
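To make the difference between the two grouped counts concrete, the sketch below computes both side by side. The sample data frame and the column name count_non_na are assumptions made for illustration.

```r
library(dplyr)

# Hypothetical sample data: group "a" has one missing value in col_2.
df = data.frame(
  col_1 = c("a", "a", "a", "b", "b"),
  col_2 = c(10, NA, 30, 40, 50)
)

df_2 = df %>%
  group_by(col_1) %>%
  summarize(
    count = length(col_2),             # all values, NA included
    count_non_na = sum(!is.na(col_2))  # only non-missing values
  )
# Group "a": count 3, count_non_na 2; group "b": count 2, count_non_na 2
```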
Data Frame
We wish to obtain the number of rows of a data frame.
df %>% nrow()
Here is how this works:
We use the function nrow() from base R to obtain the number of rows in a data frame.
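A short sketch on a made-up data frame (the sample values are assumptions) shows that nrow() counts every row, including rows that contain NA, and how it relates to the other base R dimension helpers:

```r
# Hypothetical sample data.
df = data.frame(col_1 = c("a", "b", "c"), col_2 = c(1, NA, 3))

nrow(df)  # 3: rows are counted even when they contain NA
ncol(df)  # 2: the column count, for comparison
dim(df)   # 3 2: rows and columns together
```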
Extension: Add Row Count Column
We wish to add a column to a data frame that has a constant value equal to the number of rows in the data frame.
df_2 = df %>%
  mutate(count = n())
Here is how this works:
We use n() from dplyr inside mutate() to obtain the number of rows in the data frame.
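As a sketch with a small made-up data frame (the sample values are assumptions), the new column holds the same row count on every row:

```r
library(dplyr)

# Hypothetical sample data.
df = data.frame(col_1 = c("a", "a", "b"), col_2 = c(1, 2, 3))

df_2 = df %>%
  mutate(count = n())
# count is 3 on every row because the ungrouped data frame has 3 rows
```

Note that n() reports the size of the current group, so on a data frame that has already been grouped the same call would return each group's row count instead of the overall total.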
Grouped Data Frame
We wish to obtain the number of rows in each group of a grouped data frame.
In this example, we wish to obtain the number of rows in each group, where the groups are defined by the values of the column col_1.
df_2 = df %>%
  group_by(col_1) %>%
  summarize(count = n())
Here is how this works:
The data frame df is grouped by the column col_1. The operations carried out inside the subsequent call to summarize() are executed on each group. See Aggregating.
We use n() from dplyr to obtain the number of rows in the current group.
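The sketch below runs the grouped count on a small made-up data frame (the sample values and the name df_3 are assumptions). It also shows dplyr's count(), which is not used in the text above but performs the same grouping and counting in one call.

```r
library(dplyr)

# Hypothetical sample data: two rows in group "a", one in group "b".
df = data.frame(col_1 = c("a", "a", "b"), col_2 = c(1, 2, 3))

df_2 = df %>%
  group_by(col_1) %>%
  summarize(count = n())

# count() is a dplyr shortcut for the same grouping and counting;
# by default it names the result column n.
df_3 = df %>% count(col_1)
```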