
Context

In the process of cleaning and standardising data in R, there are often cases where several functions are applied in succession. Because nested function calls in R are evaluated from the inside out, one has to “work backwards” from the innermost function to understand the order in which the steps are applied.

For example:

  # set seed for reproducibility
    set.seed(13)

  # take the mean of the square root of the absolute value of a set of random numbers
    mean(sqrt(abs(rnorm(18))))
[1] 0.8255879

Now, if we want to round the result to 3 decimal places, we wrap the whole expression in round():

  # reset the seed so we get the same random numbers, then round to 3 decimal places
    set.seed(13)
    round(mean(sqrt(abs(rnorm(18)))), 3)
[1] 0.826

As you can see, it takes some effort to work out the order of the steps from the inside out, particularly when functions take additional arguments: in round(mean(sqrt(abs(rnorm(18)))), 3), the digits argument (3) ends up at the far end of the line, a long way from the round( it belongs to.
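To make the inside-out order explicit, the nested call can be unpacked into named intermediate steps. This is a minimal base-R sketch: the names x and step1 to step4 are illustrative only, and the random draws are stored in x so that both versions work on the same numbers.

```r
# Reuse the seed from above so the draws match the nested example
set.seed(13)
x <- rnorm(18)

step1 <- abs(x)           # innermost call is evaluated first
step2 <- sqrt(step1)      # ...then the square root
step3 <- mean(step2)      # ...then the mean
step4 <- round(step3, 3)  # outermost call is evaluated last

# Same result as the one-line nested version
identical(step4, round(mean(sqrt(abs(x))), 3))  # TRUE
```

This reads in the right order, but it leaves a trail of temporary objects behind, which is part of what pipes avoid.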

Using pipes to clarify successive steps

Using the pipe operator %>% from the magrittr package (loaded with library(magrittr)) makes each step explicit, so the steps read from left to right in the order they are applied:

    library(magrittr)
    rnorm(18) %>% abs() %>% sqrt() %>% mean()

And, if we want to round to 3 decimal places, we just add round(3) to the end of the chain:

    rnorm(18) %>% abs() %>% sqrt() %>% mean() %>% round(3)
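As a quick check that the pipe changes only how the code reads, not what it computes, the sketch below compares the nested and piped versions on the same draws (stored in x, an illustrative name). It assumes magrittr is installed. The pipe passes its left-hand side in as the first argument, so x %>% round(3) is the same call as round(x, 3).

```r
library(magrittr)

set.seed(13)
x <- rnorm(18)

nested <- round(mean(sqrt(abs(x))), 3)
piped  <- x %>% abs() %>% sqrt() %>% mean() %>% round(3)

identical(nested, piped)                  # TRUE
identical(x %>% round(3), round(x, 3))    # TRUE: the LHS becomes the first argument
```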

Pipes also help when we are troubleshooting a long sequence of cleaning, filtering and standardising steps. In this training course, we will learn how to modify data objects “on the fly” and pipe the results straight to a graphic.

For example, using our random number example from above:

  # visualise data
    rnorm(18) %>%
      abs() %>%
      sqrt() %>%
      hist()

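This should produce a histogram of the square-root-transformed values (its exact shape will vary between runs, because no seed is set).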

By commenting out the abs() or sqrt() line in the sequence of steps, we can very simply visualise the effect of each transform on the data. This would be much more cumbersome in the ‘fully nested’ sqrt(abs(rnorm(18))) form.
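A sketch of the “comment out a step” approach, assuming magrittr is installed. The draws are stored in x (not in the original) so that both histograms use the same numbers. In the second chain the sqrt() line is commented out; R ignores the comment, and the trailing %>% after abs() carries straight on to hist(), so that chain becomes x %>% abs() %>% hist().

```r
library(magrittr)

set.seed(13)
x <- rnorm(18)

# Draw the two versions side by side
par(mfrow = c(1, 2))

# Full chain: absolute value, then square root
h_sqrt <- x %>%
  abs() %>%
  sqrt() %>%
  hist(main = "sqrt(abs(x))", xlab = "value")

# Same chain with the sqrt() step commented out
h_abs <- x %>%
  abs() %>%
  # sqrt() %>%
  hist(main = "abs(x)", xlab = "value")
```

hist() still returns its binning invisibly, so assigning the result (h_sqrt, h_abs) lets you inspect the counts as well as look at the plot.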

Two-way pipes

Sometimes in R we want to modify a data object and keep the same name. For example, we may want to filter out the sites without names. This can be done with:

  # modify data object
    reef_data <-
      reef_data %>%
        dplyr::filter(!Site %>% is.na())

This overwrites reef_data with a version from which the sites with no name (i.e. rows where is.na(Site) is TRUE) have been removed. The result keeps the name reef_data.
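The reef data set itself is not reproduced here, so the sketch below runs the same filter on a small, made-up stand-in with a Site column (the site names, the Cover column and its values are all hypothetical). It assumes magrittr and dplyr are installed. Note that ! binds more loosely than %>% in R, so !Site %>% is.na() is read as !(is.na(Site)).

```r
library(magrittr)

# Hypothetical stand-in for reef_data: two of the four sites have no name
reef_data <- data.frame(
  Site  = c("North Reef", NA, "Lagoon", NA),
  Cover = c(42, 35, 18, 27),
  stringsAsFactors = FALSE
)

# Keep only rows with a site name, overwriting reef_data
reef_data <-
  reef_data %>%
    dplyr::filter(!Site %>% is.na())

nrow(reef_data)   # 2
reef_data$Site    # "North Reef" "Lagoon"
```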

Using the “two-way” (assignment) pipe %<>%, which also comes from magrittr, we can simplify this operation:

  # modify data object
    reef_data %<>%
      dplyr::filter(!Site %>% is.na())

In other words, we pipe reef_data into the filter that drops sites with no name, and the result is then assigned back to reef_data.
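The equivalence between the long form and %<>% can be checked on a toy vector (hypothetical data, assuming magrittr is installed). In magrittr, %<>% has to be the first pipe in the chain; any further %>% steps are applied before the final result is assigned back to the object.

```r
library(magrittr)

# Two copies of the same toy vector
x_long <- c(-16, 4, -9)
x_pipe <- x_long

# Long form: pipe the object through the steps, then reassign it by name
x_long <- x_long %>% abs() %>% sqrt()

# Two-way pipe: the result is assigned back to x_pipe automatically
x_pipe %<>% abs() %>% sqrt()

identical(x_long, x_pipe)  # TRUE
x_pipe                     # 4 2 3
```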

Next steps

Now that we have covered some R basics and coding principles, and know how to use pipes, we can go on to some Exercises for R coding.