We wish to obtain the length of a vector (number of elements) or a data frame (number of rows).
This section covers lists, Series, grouped Series, data frames, and grouped data frames.
List
We wish to obtain the number of elements in a list.
my_list = [1, 2, 3, 4, 5]
len(my_list)
Here is how this works:
To get the length of a list (or any other Python sequence), we use Python's built-in len() function.
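A minimal illustration of len() on plain Python lists; the list contents are made up for the example.

```python
# Hypothetical example data
my_list = [1, 2, 3, 4, 5]
n = len(my_list)    # number of elements in the list
print(n)

empty = []
print(len(empty))   # an empty list has length 0
```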
Extension: ignore NA
We wish to obtain the number of non-missing elements in a vector or a list.
import pandas as pd
my_list = [1, 2, None, 4, float("NaN")]
# keep elements that are neither None nor NaN, then count them
non_na_count = len([x for x in my_list if x is not None and not pd.isna(x)])
Here is how this works:
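Missing values in a Python list are typically represented as None or as NaN (for example float("NaN")).
We use a list comprehension to filter out the None and NaN values; pd.isna() returns True for both.
We then use len() to compute the length of the remaining elements.
Series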
We wish to obtain the number of elements in a Series (which may be a column of a data frame).
df['col_1'].size
Here is how this works:
We use the size attribute of pandas Series objects to obtain the number of elements in a Series. Missing values are included in this count.
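A minimal sketch, assuming pandas is installed; the data frame df and its values are hypothetical. It shows that size counts every element, missing or not.

```python
import pandas as pd

# Hypothetical data frame with one missing value in col_1
df = pd.DataFrame({'col_1': [1.0, None, 3.0]})

n = df['col_1'].size        # counts all elements, including NaN
print(n)
print(len(df['col_1']))     # len() gives the same number for a Series
```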
Extension: ignore NA
We wish to obtain the number of non-missing elements in a Series.
df['col_1'].count()
Here is how this works:
We use the count() method of pandas Series objects to obtain the number of non-missing elements in a Series.
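To contrast count() with size, here is a sketch on a hypothetical column that holds both a None and a NaN (pandas stores both as NaN in a float column).

```python
import pandas as pd

# Hypothetical data: two of the four values are missing
df = pd.DataFrame({'col_1': [1.0, None, 3.0, float('nan')]})

n_total = df['col_1'].size      # all elements
n_non_na = df['col_1'].count()  # missing values excluded
print(n_total, n_non_na)
```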
Grouped Series
We wish to obtain the number of elements in a Series for each group.
df.groupby('col_1')['col_2'].size()
Here is how this works:
We use the size() method of pandas SeriesGroupBy objects to obtain the number of elements of the Series in each group, including missing values.
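A minimal sketch of per-group sizes, assuming pandas is installed; the columns col_1 and col_2 and their values are hypothetical. Group a has two rows and group b has three, regardless of the missing values in col_2.

```python
import pandas as pd

# Hypothetical data: col_1 holds the group labels, col_2 has some missing values
df = pd.DataFrame({
    'col_1': ['a', 'a', 'b', 'b', 'b'],
    'col_2': [1.0, None, 3.0, 4.0, None],
})

# Number of elements of col_2 in each group (NaN included)
sizes = df.groupby('col_1')['col_2'].size()
print(sizes)
```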
Extension: ignore NA
We wish to obtain the number of non-missing elements in a Series for each group.
df.groupby('col_1')['col_2'].count()
Here is how this works:
We use the count() method of pandas SeriesGroupBy objects to obtain the number of non-missing elements of the Series in each group.
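Using the same hypothetical data as a grouped size() check would, count() skips the NaN entries, so each group's number drops by its count of missing values. This is a sketch assuming pandas is installed.

```python
import pandas as pd

# Hypothetical data: col_2 has one missing value in group a and one in group b
df = pd.DataFrame({
    'col_1': ['a', 'a', 'b', 'b', 'b'],
    'col_2': [1.0, None, 3.0, 4.0, None],
})

# Number of non-missing elements of col_2 in each group
counts = df.groupby('col_1')['col_2'].count()
print(counts)
```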
Data Frame
We wish to obtain the number of rows of a data frame.
len(df)
Here is how this works:
We use Python's built-in len() function to obtain the number of rows of a data frame.
Grouped Data Frame
We wish to obtain the number of rows in each group of a grouped data frame.
df_2 = df.groupby('col_1', as_index=False).size()
Here is how this works:
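We use the size() method of the grouped data frame to obtain the number of rows in each group. With as_index=False, the result is a data frame containing the grouping column col_1 and a size column, rather than a Series indexed by col_1.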
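A minimal sketch, assuming pandas 1.1 or later (where size() returns a data frame when as_index=False); the data frame df is hypothetical. Rows are counted whether or not col_2 is missing.

```python
import pandas as pd

# Hypothetical data frame grouped by col_1
df = pd.DataFrame({
    'col_1': ['a', 'a', 'b', 'b', 'b'],
    'col_2': [1.0, None, 3.0, 4.0, None],
})

# One row per group: the group key plus a 'size' column with the row count
df_2 = df.groupby('col_1', as_index=False).size()
print(df_2)
```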
Alternative: via the Aggregation API
df_2 = df.groupby('col_1', as_index=False).agg('size')
Here is how this works:
We pass 'size' to agg() to compute the number of rows in each group.
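A sketch comparing the two approaches on the same hypothetical data frame, assuming a recent pandas version; passing the string 'size' to agg() dispatches to the size() method, so both calls should produce the same data frame.

```python
import pandas as pd

# Hypothetical data frame grouped by col_1
df = pd.DataFrame({
    'col_1': ['a', 'a', 'b', 'b', 'b'],
    'col_2': [1.0, None, 3.0, 4.0, None],
})

df_2 = df.groupby('col_1', as_index=False).agg('size')   # via the aggregation API
df_3 = df.groupby('col_1', as_index=False).size()        # direct method call
same = df_2.equals(df_3)
print(same)
```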