Length

We wish to obtain the length of a vector (number of elements) or a data frame (number of rows).

This section is organized as follows:

  • Element Count is concerned with one-dimensional structures such as a list or a data frame column. We cover the following scenarios:
    • Vector: The number of elements in a data frame column, a vector, or a list.
    • Grouped Column: The number of elements of a column in each group.
  • Row Count is concerned with data frames. We cover the following scenarios:
    • Data Frame: The number of rows in a data frame.
    • Grouped Data Frame: The number of rows in each group of a grouped data frame.

Element Count

Vector

We wish to obtain the number of elements in a vector or a list.

my_vec = c(1, 2, 3, 4, NA)
vec_length = length(my_vec)

Here is how this works:

  • To get the length of a vector or a list, we use the length() function from base R.
  • We can apply length() to a data frame column in the same way, e.g. length(df$col_1).
  • Note that length() counts every element, including NA values.
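As a minimal sketch of the three cases mentioned above, the snippet below applies length() to a vector, a list, and a data frame column. The objects my_list and df are hypothetical inputs made up for illustration.

```r
# Hypothetical inputs, chosen only for illustration.
my_vec = c(1, 2, 3, 4, NA)
my_list = list("a", 1:3, NULL)
df = data.frame(col_1 = c("x", "y", "z"), col_2 = c(10, NA, 30))

vec_length = length(my_vec)      # NA is counted as an element
list_length = length(my_list)    # counts top-level elements only
col_length = length(df$col_1)    # a data frame column is a vector
```

For a list, length() counts top-level elements, so the nested vector 1:3 contributes one element rather than three.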

Extension: Ignore NA

We wish to obtain the number of non-NA elements in a vector or a list.

my_vec = c(1, 2, 3, 4, NA)
vec_length = sum(!is.na(my_vec))

Here is how this works:

  • We use the function is.na() from base R to identify which elements of a vector are missing. See Missing Values.
  • The output of is.na() is a logical vector of the same length as my_vec, in which a value is TRUE if the corresponding element of my_vec is NA and FALSE otherwise.
  • Since we wish to count the non-NA values, we negate the output of is.na() with the logical NOT operator !.
  • Finally, we sum the logical values to obtain the number of non-NA values via sum(). See Working with Logical Data.
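The steps above can be traced one at a time. In this sketch the intermediate names is_missing and is_present are introduced only for illustration; the one-line expression sum(!is.na(my_vec)) does the same thing.

```r
my_vec = c(1, 2, 3, 4, NA)

is_missing = is.na(my_vec)       # FALSE FALSE FALSE FALSE TRUE
is_present = !is_missing         # TRUE TRUE TRUE TRUE FALSE
non_na_count = sum(is_present)   # TRUE counts as 1, FALSE as 0
```

Here non_na_count is 4, one fewer than length(my_vec).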

Grouped Column

We wish to obtain the number of elements in a column for each group.

In this example, we wish to obtain the number of values of the column col_2 for each group, where the groups are defined by the values of the column col_1.

library(dplyr)  # provides %>%, group_by(), and summarize()

df_2 = df %>%
  group_by(col_1) %>%
  summarize(count = length(col_2))

Here is how this works:

  • The data frame df is grouped by the column col_1. The operations carried out inside the subsequent call to summarize() are executed on each group. See Aggregating.
  • We use length() to obtain the number of values of the column col_2 for each group.
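To make the grouped count concrete, here is a small sketch on a made-up data frame. The values in df are hypothetical; only the column names col_1 and col_2 come from the example above.

```r
library(dplyr)

# Hypothetical data: two groups, with some missing values in col_2.
df = tibble(
  col_1 = c("a", "a", "b", "b", "b"),
  col_2 = c(1, NA, 3, 4, NA)
)

df_2 = df %>%
  group_by(col_1) %>%
  summarize(count = length(col_2))
```

The result has one row per group: a count of 2 for group "a" and 3 for group "b". NA values are included in these counts.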

Extension: Ignore NA

We wish to obtain the number of non-NA elements in a column for each group.

df_2 = df %>%
  group_by(col_1) %>% 
  summarize(count = sum(!is.na(col_2)))

Here is how this works:

We use the expression sum(!is.na(col_2)) to compute the number of non-NA values of the column col_2 in each group. See “Extension: Ignore NA” under Vector above for a description.
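The following sketch computes both counts side by side so the effect of ignoring NA is visible. The data and the column name non_na_count are hypothetical and used only for illustration.

```r
library(dplyr)

# Hypothetical data: each group contains one NA in col_2.
df = tibble(
  col_1 = c("a", "a", "b", "b", "b"),
  col_2 = c(1, NA, 3, 4, NA)
)

df_2 = df %>%
  group_by(col_1) %>%
  summarize(
    count = length(col_2),              # all values, NA included
    non_na_count = sum(!is.na(col_2))   # non-NA values only
  )
```

For this data, group "a" has 2 values of which 1 is non-NA, and group "b" has 3 values of which 2 are non-NA.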

Row Count

Data Frame

We wish to obtain the number of rows of a data frame.

df %>% nrow()

Here is how this works:

We use the function nrow() from base R to obtain the number of rows in a data frame.
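A short sketch on a hypothetical data frame shows why nrow() is used here rather than length(): for a data frame, length() returns the number of columns, not the number of rows.

```r
# Hypothetical data frame with 3 rows and 2 columns.
df = data.frame(col_1 = c("a", "a", "b"), col_2 = c(1, NA, 3))

row_count = nrow(df)       # number of rows
col_count = length(df)     # number of columns, same as ncol(df)
```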

Extension: Add Row Count Column

We wish to add a column to a data frame in which every value equals the number of rows in the data frame.

df_2 = df %>%
  mutate(count = n())

Here is how this works:

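We use n() from dplyr inside mutate() to obtain the number of rows in the data frame. Since n() returns a single value, mutate() repeats it in every row of the new column.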
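As an illustration, the sketch below applies this to a hypothetical three-row data frame; the values are made up.

```r
library(dplyr)

# Hypothetical data frame with 3 rows.
df = tibble(col_1 = c("a", "a", "b"), col_2 = c(1, NA, 3))

df_2 = df %>%
  mutate(count = n())   # every row receives the total row count
```

The new column count holds the value 3 in each of the three rows, and the number of rows is unchanged.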

Grouped Data Frame

We wish to obtain the number of rows in each group of a grouped data frame.

In this example, we wish to obtain the number of rows in each group, where the groups are defined by the values of the column col_1.

df_2 = df %>%
  group_by(col_1) %>%
  summarize(count = n())

Here is how this works:

  • The data frame df is grouped by the column col_1. The operations carried out inside the subsequent call to summarize() are executed on each group. See Aggregating.
  • We use n() from dplyr to obtain the number of rows in the current group.
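The sketch below runs the grouped row count on hypothetical data. It also shows dplyr's count() as an alternative that is not part of the solution above; with name = "count" it is expected to produce the same counts in one step.

```r
library(dplyr)

# Hypothetical data: group "a" has 2 rows, group "b" has 3.
df = tibble(
  col_1 = c("a", "a", "b", "b", "b"),
  col_2 = c(1, NA, 3, 4, NA)
)

# Approach from the text: group, then count rows per group.
df_2 = df %>%
  group_by(col_1) %>%
  summarize(count = n())

# Alternative (an addition, not from the text): count() groups and counts.
df_3 = df %>%
  count(col_1, name = "count")
```

Unlike length(col_2), n() takes no column argument because it counts rows of the current group rather than elements of a particular column.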