Number

We wish to format the string representation of numbers in a particular way, typically for display purposes.

We will cover three common scenarios:

  • Scientific Notation: Specify whether scientific notation should be used in the string representation of a number.
  • Comma Separator: Add commas to separate every three digits in the string representation of large numbers.
  • Decimal Places: Specify the number of decimal places to have in the string representation of a floating point number.

This section is concerned with the string representation of numbers. See Numeric for coverage of the display settings of numbers.

Scientific Notation

We wish to make sure that numbers are displayed “plainly” without applying scientific notation.

In this example, we wish to create a new column col_2 that is a string representation of the numeric column col_1 while ensuring that the numbers are displayed without scientific notation.

df_2 = df %>%
  mutate(col_2 = format(col_1, scientific=FALSE))

Here is how this works:

  • Scientific notation represents a number as a coefficient times a power of ten, e.g. 1000 as 1e+03. By default, R switches to it automatically when the fixed representation would be wider than the scientific one, which often happens with very large (or very small) numbers.
  • To obtain a string representation of a number without scientific notation, we can use the function format() with the argument scientific=FALSE.
  • Note that a solution such as paste(col_1, "") would not work: paste() converts numbers via as.character(), which may still return scientific notation (e.g. "1e+15" for 1e15).
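A minimal sketch of the contrast described above, assuming dplyr is installed; the data frame df and its two values are hypothetical. format() also pads the shorter value with leading spaces so both strings have the same width (see Extension: No Padding).

```r
library(dplyr)

# Hypothetical sample data: one modest and one very large value
df <- data.frame(col_1 = c(1500, 1e15))

df_2 = df %>%
  mutate(col_2 = format(col_1, scientific=FALSE),  # plain digits, padded to a common width
         col_3 = paste(col_1, ""))                 # as.character() may choose scientific notation

df_2$col_2
df_2$col_3
```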

Comma Separator

We wish to add comma separators between every group of three digits in the string representation of large numbers.

In this example, we wish to create a new column col_2 that is a string representation of the numeric column col_1 with commas added between every three digits.

df_2 = df %>%
  mutate(col_2 = format(col_1, 
                        big.mark=',', 
                        scientific=FALSE))

Here is how this works:

  • It is common to add commas after every three digits of large numbers to enhance readability, e.g. 1000000 is displayed as 1,000,000.
  • To obtain a string representation of a number where every three digits are separated by a comma, we can use the function format() with the argument big.mark=','.
  • When working with large numbers, we usually also need to set scientific=FALSE to prevent R from switching to scientific notation.
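A small sketch, assuming dplyr is installed and using hypothetical whole-number values, showing format() inserting a comma between every group of three digits:

```r
library(dplyr)

# Hypothetical sample data: whole numbers of different magnitudes
df <- data.frame(col_1 = c(1000000, 2500, 987654321))

df_2 = df %>%
  mutate(col_2 = format(col_1,
                        big.mark=',',       # separator inserted every three digits
                        scientific=FALSE))  # keep plain digits for large values

df_2$col_2
```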

Extension: No Padding

df_2 = df %>%
  mutate(col_2 = format(col_1, 
                        big.mark=',', 
                        scientific=FALSE,
                        trim=TRUE))

Here is how this works:

  • By default, format() pads its output with leading white space so that every value has the width of the widest one, which aligns the numbers to the right. This is often desirable when displaying numbers.
  • To prevent the addition of any padding, set the argument trim=TRUE when calling format().
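To see the difference padding makes, the sketch below (hypothetical data, dplyr assumed installed) formats the same column with and without trim=TRUE and compares the string widths:

```r
library(dplyr)

# Hypothetical sample data
df <- data.frame(col_1 = c(1000000, 2500))

df_2 = df %>%
  mutate(padded  = format(col_1, big.mark=',', scientific=FALSE),
         trimmed = format(col_1, big.mark=',', scientific=FALSE, trim=TRUE))

nchar(df_2$padded)   # both strings share the width of the widest value
nchar(df_2$trimmed)  # each string is only as wide as its own value
```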

Decimal Places

We wish to specify the number of decimal places (after the decimal point) to display in a string representation of a floating point number.

In this example, we wish to create a new column col_2 that is a string representation of the numeric column col_1 with two decimal places after the point for each value.

df_2 = df %>%
  mutate(col_2 = format(round(col_1, 2), nsmall=2))

Here is how this works:

  • In round(col_1, 2), we use the function round() to round each number in the column col_1 to two decimal places.
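  • We then use format() to convert the rounded numbers to strings.
  • The argument nsmall specifies the minimum number of digits to the right of the decimal point, so that e.g. 2.5 is displayed as 2.50 rather than 2.5. Since nsmall is only a minimum, the preceding round() step is what caps the number of decimal places.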
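A minimal sketch with hypothetical values, assuming dplyr is installed. The extra column col_3 skips round() to illustrate that nsmall only sets a minimum: without rounding, format() keeps as many decimals as it needs to show the value under its default of 7 significant digits, so 3.14159 is not cut to two places.

```r
library(dplyr)

# Hypothetical sample data
df <- data.frame(col_1 = c(3.14159, 2.5, 10))

df_2 = df %>%
  mutate(col_2 = format(round(col_1, 2), nsmall=2),  # exactly two decimal places
         col_3 = format(col_1, nsmall=2))            # nsmall alone is only a minimum

df_2$col_2
df_2$col_3
```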

Extension: String to Numeric to String

df_2 = df %>%
  mutate(col_2 = col_1 %>% 
           parse_number() %>% 
           round(2) %>% 
           format(nsmall=2))

Here is how this works:

  • In case the input col_1 is a string, we first need to convert it to a numeric data type so that we can round it.
  • We use parse_number() from the package readr (part of the tidyverse) to convert a string to a numeric data type. See Numeric.
  • We then apply round() and format() as covered in the primary solution above.
  • Note that here we pipe the column col_1 via %>% from one function to the next. We think that this is easier to read and understand than three nested function calls.
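The pipeline can be sketched end to end as follows, assuming dplyr and readr are installed; the input strings are hypothetical. The last line writes the same steps as nested calls for comparison, which yields the same strings.

```r
library(dplyr)
library(readr)

# Hypothetical sample data: numbers stored as strings with currency/percent symbols
df <- data.frame(col_1 = c("$1,234.567", "12.3%", "7"), stringsAsFactors = FALSE)

df_2 = df %>%
  mutate(col_2 = col_1 %>%
           parse_number() %>%   # "$1,234.567" -> 1234.567, "12.3%" -> 12.3
           round(2) %>%
           format(nsmall=2))

# The same computation written as nested function calls
nested = format(round(parse_number(df$col_1), 2), nsmall=2)
```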