Length

We wish to obtain the length of a one-dimensional structure such as a list or a Series (number of elements) or of a data frame (number of rows).

This section is organized as follows:

  • Element Count is concerned with one-dimensional structures such as a list or a Series (a data frame column). We cover the following scenarios:
    • List: The number of elements in a list.
    • Series: The number of elements in a Series (which may be a data frame column).
    • Grouped Series: The number of elements in each group of a grouped series.
  • Row Count is concerned with data frames. We cover the following scenarios:
    • Data Frame: The number of rows in a data frame.
    • Grouped Data Frame: The number of rows in each group of a grouped data frame.

Element Count

List

We wish to obtain the number of elements in a list.

my_list = [1, 2, 3, 4, 5]
len(my_list)

Here is how this works:

To get the length of a list, we use Python's built-in len() function.

Extension: Ignore NA

We wish to obtain the number of non-missing elements in a list.

import pandas as pd
my_list = [1, 2, None, 4, float("NaN")]
non_na_count = len([x for x in my_list if x is not None and not pd.isna(x)])

Here is how this works:

  • Missing values may take different forms; most commonly None and NaN.
  • We use a list comprehension with pd.isna() to filter out the None and NaN values.
  • We then use len() to compute the number of remaining elements.
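
As a rough cross-check of the steps above, the sketch below counts the non-missing elements in two ways. It relies on pd.isna() returning True for both None and NaN; the variable names are illustrative only.

```python
import pandas as pd

my_list = [1, 2, None, 4, float("nan")]

# pd.isna() is True for both None and NaN, so a single check covers both.
non_na_count = sum(not pd.isna(x) for x in my_list)

# Cross-check: converting to a Series and calling count() skips the same values.
series_count = pd.Series(my_list).count()

print(non_na_count, series_count)
```

Both approaches should agree; the Series route may be more convenient when the data is headed into Pandas anyway.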

Series

We wish to obtain the number of elements in a Series (which may be a column of a data frame).

df['col_1'].size

Here is how this works:

We use the attribute size of Pandas Series objects to obtain the number of elements in a Series.
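
A minimal sketch on a made-up data frame (the values in col_1 are assumptions for illustration) shows that size counts every element, including missing ones, and that len() gives the same number for a Series.

```python
import pandas as pd

# Hypothetical data frame; col_1 contains one missing value.
df = pd.DataFrame({"col_1": [10, None, 30, 40]})

n_elements = df["col_1"].size  # counts every element, missing or not
n_via_len = len(df["col_1"])   # len() returns the same number for a Series

print(n_elements, n_via_len)
```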

Extension: Ignore NA

We wish to obtain the number of non-missing elements in a Series.

df['col_1'].count()

Here is how this works:

We use the method count() of Pandas Series objects to obtain the number of non-missing elements in a Series.
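
To make the difference between size and count() concrete, here is a small sketch on a made-up column (the data is an assumption for illustration). The gap between the two is the number of missing values.

```python
import pandas as pd

# Hypothetical data frame; col_1 contains one missing value.
df = pd.DataFrame({"col_1": [10, None, 30, 40]})

total = df["col_1"].size            # all elements
non_missing = df["col_1"].count()   # elements that are not NaN / None
missing = df["col_1"].isna().sum()  # elements that are NaN / None

print(total, non_missing, missing)
```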

Grouped Series

We wish to obtain the number of elements in a Series for each group.

df.groupby('col_1')['col_2'].size()

Here is how this works:

We use the method size() of Pandas SeriesGroupBy objects to obtain the number of elements in each group of a Series.
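
The following sketch, using a made-up data frame (column values are assumptions), shows the shape of the result: a Series indexed by the group key, where missing values in col_2 are still counted.

```python
import pandas as pd

# Hypothetical data: two groups in col_1, some missing values in col_2.
df = pd.DataFrame({
    "col_1": ["a", "a", "b", "b", "b"],
    "col_2": [1.0, None, 3.0, 4.0, None],
})

# One entry per group; every element counts, missing or not.
sizes = df.groupby("col_1")["col_2"].size()

print(sizes)
```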

Extension: Ignore NA

We wish to obtain the number of non-missing elements in a Series for each group.

df.groupby('col_1')['col_2'].count()

Here is how this works:

We use the method count() of Pandas SeriesGroupBy objects to obtain the number of non-missing elements in each group of a Series.
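
As an illustration on made-up data (the column values are assumptions), the sketch below shows that count() drops the missing values in col_2 before counting within each group.

```python
import pandas as pd

# Hypothetical data: two groups in col_1, some missing values in col_2.
df = pd.DataFrame({
    "col_1": ["a", "a", "b", "b", "b"],
    "col_2": [1.0, None, 3.0, 4.0, None],
})

# One entry per group; only non-missing elements are counted.
counts = df.groupby("col_1")["col_2"].count()

print(counts)
```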

Row Count

Data Frame

We wish to obtain the number of rows of a data frame.

len(df)

Here is how this works:

  • We use the function len() to obtain the number of rows of a data frame.
  • See Inspecting Structure for more on computing the number of rows in an ungrouped data frame.
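
For reference, a short sketch on a made-up data frame shows len() alongside two other common ways to read the row count; all three should agree.

```python
import pandas as pd

# Hypothetical data frame with three rows.
df = pd.DataFrame({"col_1": ["a", "b", "c"], "col_2": [1, 2, 3]})

n_rows = len(df)               # number of rows
n_rows_shape = df.shape[0]     # first entry of (rows, columns)
n_rows_index = len(df.index)   # length of the row index

print(n_rows, n_rows_shape, n_rows_index)
```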

Grouped Data Frame

We wish to obtain the number of rows in each group of a grouped data frame.

df_2 = df.groupby('col_1', as_index=False).size()

Here is how this works:

We use the method size() to obtain the number of rows in each group.
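
The sketch below, on made-up data, illustrates what as_index=False changes: in recent Pandas versions the result is a data frame with the grouping column and a column named size, rather than a Series indexed by the group key.

```python
import pandas as pd

# Hypothetical data: two groups in col_1.
df = pd.DataFrame({
    "col_1": ["a", "a", "b", "b", "b"],
    "col_2": [1.0, None, 3.0, 4.0, None],
})

# With as_index=False the group key stays a regular column.
df_2 = df.groupby("col_1", as_index=False).size()

# With the default as_index=True the result is a Series indexed by col_1.
sizes = df.groupby("col_1").size()

print(df_2)
print(sizes)
```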

Alternative: via the Aggregation API

df_2 = df.groupby('col_1', as_index=False).agg('size')

Here is how this works:

We pass 'size' to agg() to compute the number of rows in each group.
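
One situation where agg() may be preferable is when the group size is needed alongside other aggregates. The sketch below uses named aggregation on a made-up data frame; the output column names n_rows and col_2_mean are hypothetical choices, and this is a variation on the snippet above rather than the same call.

```python
import pandas as pd

# Hypothetical data: two groups in col_1, some missing values in col_2.
df = pd.DataFrame({
    "col_1": ["a", "a", "b", "b", "b"],
    "col_2": [1.0, None, 3.0, 4.0, None],
})

# Named aggregation: each keyword is an output column,
# given as a (input column, aggregation) pair.
summary = df.groupby("col_1", as_index=False).agg(
    n_rows=("col_2", "size"),      # rows per group, missing values included
    col_2_mean=("col_2", "mean"),  # mean of the non-missing values
)

print(summary)
```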
