We wish to sort an ordinal categorical variable in its domain-meaningful order rather than in alphanumeric order. An ordinal categorical variable is one whose values have a natural order, e.g. responses in a survey.
In this example, we have a data frame df that has a column col_1, which is a categorical variable holding t-shirt sizes XS, S, M, L, XL. We wish to sort the rows of the data frame df in the natural order of the t-shirt sizes, i.e. XS < S < M < L < XL.
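To see why the default ordering falls short, here is a minimal sketch. It assumes a small hypothetical data frame whose values are invented for illustration. Because col_1 is a plain character column here, arrange() orders it alphanumerically.

```r
library(dplyr)

# Hypothetical sample data, assumed for illustration only
df = data.frame(col_1 = c('M', 'XS', 'L', 'S', 'XL'),
                stringsAsFactors = FALSE)

# col_1 is a character column, so arrange() sorts it alphanumerically
df_alpha = df %>%
  arrange(col_1)

df_alpha$col_1
# "L"  "M"  "S"  "XL" "XS"
```

The result places L first and XS last, which is not the order in which the sizes are meaningful.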
library(dplyr)

# Levels listed in their natural (domain) order
sizes = c('XS', 'S', 'M', 'L', 'XL')

df_2 = df %>%
  mutate(
    col_1 = factor(col_1,
                   levels = sizes,
                   ordered = TRUE)) %>%
  arrange(col_1)
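The snippet above assumes that df already exists. As a self-contained sketch of the same factor-based sort, the following builds a small hypothetical data frame (the values are assumptions for illustration) and checks the resulting order. The descending sort via desc() is an extra illustration rather than part of the original solution.

```r
library(dplyr)

# Hypothetical sample data, assumed for illustration only
df = data.frame(col_1 = c('M', 'XS', 'L', 'S', 'XL'),
                stringsAsFactors = FALSE)

sizes = c('XS', 'S', 'M', 'L', 'XL')

# Convert col_1 to an ordered factor, then sort by its level order
df_2 = df %>%
  mutate(col_1 = factor(col_1, levels = sizes, ordered = TRUE)) %>%
  arrange(col_1)

as.character(df_2$col_1)
# "XS" "S"  "M"  "L"  "XL"

# Largest size first
df_3 = df_2 %>%
  arrange(desc(col_1))

as.character(df_3$col_1)
# "XL" "L"  "M"  "S"  "XS"
```

Note that any value of col_1 not listed in sizes would become NA when converted by factor(), so the levels vector should cover every value that can occur.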
Here is how this works:
- When a column is of type factor with the levels correctly defined, we can simply apply arrange() and the data frame will be sorted according to the defined order of the factor variable.
- We create the factor with factor(), to which we pass the column to be converted to a factor (here col_1), the levels of the factor (here the vector sizes), and a parameter ordered that determines whether the factor is ordered (ordinal) with ordered=TRUE or unordered (nominal) with ordered=FALSE. Here we set ordered=TRUE. See Factor Operations for more details.

Alternative: Value Mapping
sort_func <- function(col) {
  col_b = case_when(
    col == 'XS' ~ 1,
    col == 'S' ~ 2,
    col == 'M' ~ 3,
    col == 'L' ~ 4,
    col == 'XL' ~ 5,
    TRUE ~ 6
  )
  return(col_b)
}

df_2 = df %>%
  arrange(sort_func(col_1))
Here is how this works:
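- We define a custom function sort_func().
- We call arrange() and pass to it the sorting column, which here is col_1, wrapped in sort_func().
- Inside sort_func(), we use case_when() to map the values of the column col_1 to a number that defines their sorting order. Any value not listed is mapped to 6 and therefore sorts last. See General Operations for a coverage of conditional statements in R.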
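As a self-contained sketch of the value-mapping approach, the following applies the same mapping to a small hypothetical data frame. The sample values, including the unlisted size 'XXL', are assumptions added for illustration to show how the fallback branch behaves.

```r
library(dplyr)

# Hypothetical sample data; 'XXL' is deliberately absent from the mapping
df = data.frame(col_1 = c('M', 'XXL', 'XS', 'L', 'S', 'XL'),
                stringsAsFactors = FALSE)

sort_func <- function(col) {
  case_when(
    col == 'XS' ~ 1,
    col == 'S'  ~ 2,
    col == 'M'  ~ 3,
    col == 'L'  ~ 4,
    col == 'XL' ~ 5,
    TRUE ~ 6  # any unlisted value sorts last
  )
}

# The numeric rank assigned to each row
sort_func(df$col_1)
# 3 6 1 4 2 5

# Sort by the rank; col_1 itself stays a character column
df_2 = df %>%
  arrange(sort_func(col_1))

df_2$col_1
# "XS"  "S"   "M"   "L"   "XL"  "XXL"
```

Unlike the factor-based approach, this leaves the type of col_1 unchanged, and values outside the mapping are kept and placed at the end instead of becoming NA.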