When cleaning and standardising data in R, there are cases where numerous functions are applied in succession. Because R evaluates nested function calls from the inside out, one has to “work backwards” to understand the order in which the steps are applied.
For example:
# set seed for reproducibility
set.seed(13)
# take the mean of the square root of the absolute value of a set of random numbers
mean(sqrt(abs(rnorm(18))))
[1] 0.8255879
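Now, if we want to round the result to a few decimal places, we wrap the whole expression in round():
# reset the seed so the same random numbers are drawn, then round to 3 decimal places
set.seed(13)
round(mean(sqrt(abs(rnorm(18)))), 3)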
[1] 0.826
As you can see, it takes some thinking to work from the innermost call outwards to figure out what is going on, particularly where the functions take additional arguments (e.g. round(digits = 3)), which end up far away from the function they belong to.
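For comparison, one way to make the inside-out order explicit without pipes is to store each step in its own intermediate variable. This is a minimal sketch for illustration, not part of the original course; the variable names are arbitrary.

```r
# set seed so the draws match the nested example above
set.seed(13)

# each step gets its own name, so the order reads top to bottom
x      <- rnorm(18)      # 1. draw 18 random numbers
x_abs  <- abs(x)         # 2. take absolute values
x_sqrt <- sqrt(x_abs)    # 3. take square roots
x_mean <- mean(x_sqrt)   # 4. average them
round(x_mean, 3)         # 5. round to 3 decimal places
```

This reads in order, but it leaves several intermediate objects lying around in the workspace.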
Using pipes (i.e. %>%) from the magrittr package helps clarify each of the steps, which now read from left to right:
library(magrittr) # provides the %>% pipe
rnorm(18) %>% abs() %>% sqrt() %>% mean()
And if we want to round to 3 digits, we just add round(3) to the end of the chain:
rnorm(18) %>% abs() %>% sqrt() %>% mean() %>% round(3)
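To check that the piped chain does the same work as the nested call, the sketch below resets the seed before each version and compares the results. It relies on the magrittr rule that x %>% f(y) is evaluated as f(x, y), so mean() %>% round(3) becomes round(<mean>, 3).

```r
library(magrittr) # provides %>%

# nested form: read from the inside out
set.seed(13)
nested <- round(mean(sqrt(abs(rnorm(18)))), 3)

# piped form: read from left to right
set.seed(13)
piped <- rnorm(18) %>% abs() %>% sqrt() %>% mean() %>% round(3)

# same seed, same steps, same result
identical(nested, piped) # [1] TRUE
```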
Pipes also become helpful when we are troubleshooting code with numerous steps for cleaning, filtering and standardising data. In this training course, we will learn how to modify data objects “on the fly” and pipe the results straight into a graphic.
For example, using our random number example from above:
# visualise data
rnorm(18) %>%
abs() %>%
sqrt() %>%
hist()
This should produce a histogram of the transformed values (its exact shape will vary with the random draws).
By commenting out the abs() or sqrt() line in the sequence of steps, we can very simply visualise the effect of each transform on the shape of the data. The same exploration would be much more cumbersome in the ‘fully nested’ sqrt(abs(rnorm(18))) form, where removing a step means editing matching parentheses.
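As a minimal sketch of that workflow, here is the histogram chain with the sqrt() step commented out. R skips the comment line and continues the expression after the trailing %>%, so only abs() is applied before plotting. The result of hist() is stored in h only so it can be inspected; the seed value is arbitrary.

```r
library(magrittr) # provides %>%

set.seed(13)
# same chain as above, with sqrt() commented out:
# the histogram now shows the effect of abs() alone
h <- rnorm(18) %>%
  abs() %>%
  # sqrt() %>%
  hist()
```

Uncommenting the line restores the full chain; the nested equivalent would require rewriting sqrt(abs(rnorm(18))) as abs(rnorm(18)) by hand.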
Sometimes in R we want to modify a data object and keep the same name. For example, we may want to filter out the sites without names. This can be done by:
# modify data object: keep only the rows where Site is not NA
reef_data <-
reef_data %>%
dplyr::filter(!Site %>% is.na())
This replaces the object reef_data with a version from which the sites with no names (i.e. where is.na(Site) is TRUE) have been filtered out. The result keeps the same name, reef_data.
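The expression !Site %>% is.na() may look unusual. In R, ! binds more loosely than %>%, so it is read as !(Site %>% is.na()), which is the same as !is.na(Site). The sketch below checks this on a small made-up data frame; reef_demo is hypothetical and stands in for the course's reef_data, and it assumes the magrittr and dplyr packages are installed.

```r
library(magrittr) # provides %>%

# hypothetical stand-in for reef_data: the second site has no name
reef_demo <- data.frame(
  Site  = c("North Reef", NA, "South Reef"),
  Count = c(12, 7, 30)
)

# `!` applies to the whole `Site %>% is.na()`, i.e. !is.na(Site)
with_pipe    <- reef_demo %>% dplyr::filter(!Site %>% is.na())
without_pipe <- reef_demo %>% dplyr::filter(!is.na(Site))

identical(with_pipe, without_pipe) # [1] TRUE
nrow(with_pipe)                    # [1] 2
```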
Using the “two-way” (assignment) pipe %<>%, also from the magrittr package, we can simplify this operation:
# modify data object in place (%<>% is provided by magrittr)
library(magrittr)
reef_data %<>%
dplyr::filter(!Site %>% is.na())
Basically, we pipe reef_data into filter() to remove the sites with no name, and then assign the result back to reef_data.
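A quick way to convince yourself that %<>% is shorthand for “pipe, then assign back” is to run both versions on identical copies of some data and compare. In this sketch, make_reef_demo() is a hypothetical helper that builds a small stand-in for reef_data; magrittr and dplyr are assumed to be installed.

```r
library(magrittr) # provides %>% and %<>%

# hypothetical helper that builds a small stand-in for reef_data
make_reef_demo <- function() {
  data.frame(Site  = c("North Reef", NA, "South Reef"),
             Count = c(12, 7, 30))
}

# version 1: pipe, then assign the result back with <-
reef_a <- make_reef_demo()
reef_a <- reef_a %>% dplyr::filter(!Site %>% is.na())

# version 2: %<>% pipes reef_b forward and assigns the result back to reef_b
reef_b <- make_reef_demo()
reef_b %<>% dplyr::filter(!Site %>% is.na())

identical(reef_a, reef_b) # [1] TRUE
```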
Now that we have covered some R basics and coding principles, and know how to pipe, we can go on to some Exercises for R coding.