At the beginning of this module, we revisited the GCRMN data tier model, which sets out a number of data “quality” levels based on, among other things, the taxonomic resolution of observations. Another way to think about these classifications is as factors in R. A factor is a “category” or “enumerated type” whose levels allow us to simplify or aggregate our data.
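As a minimal sketch of what a factor looks like in base R (the labels below are invented for illustration and are not taken from the survey data):

```r
# Hypothetical benthic labels, invented for illustration only
benthos <- c("coral", "turf", "macroalga", "turf", "coral", "turf")

# Convert the character vector to a factor;
# by default the levels are the sorted unique values
benthos_fct <- factor(benthos)

levels(benthos_fct)   # "coral" "macroalga" "turf"
table(benthos_fct)    # number of observations per level
```

Once the categories are stored as levels, functions such as table() can aggregate by them directly.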
As part of this module, we introduce the concept of factors in R and illustrate how to manipulate them to assist in data summaries, visualisations and similar tasks. The aim of these exercises is to provide you with the basic skills to do so.
[The code for this example can be found here: creation_code/exercises/formatting/create_sessiles_dat.acosa.R]
When factor levels are assigned to an object in R, they are carried with the object in its “memory” or “metadata” (its attributes). Because of this, learning the basics of how to identify, set and manipulate factor levels in a data object is a very useful skill.
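A short sketch of how levels travel with an object, using a toy vector rather than the survey data:

```r
# Toy factor, invented for illustration
grp <- factor(c("coral", "turf", "esponja", "turf"))

is.factor(grp)    # TRUE
levels(grp)       # "coral" "esponja" "turf"

# Subsetting keeps the full set of levels, even those no longer present
grp_sub <- grp[grp != "esponja"]
levels(grp_sub)   # still "coral" "esponja" "turf"

# droplevels() removes levels that have no observations left
grp_clean <- droplevels(grp_sub)
levels(grp_clean) # "coral" "turf"
```

This is why an empty category can still show up in a summary or a plot legend after filtering.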
Let’s have a look at the taxa codes and groupings that we had assigned to our object of percent cover data:
> sessiles_dat.acosa %>%
+   dplyr::select(Codigo,
+                 Agrupacion) %>%
+   distinct()
# A tibble: 90 x 2
   Codigo Agrupacion
   <chr>  <chr>
 1 ARENA  arena
 2 TURF   turf
 3 Esp    esponja
 4 Acc    Alga calcarea costrosa
 5 Hal    macroalga
 6 Brio   otro
 7 Hid    otro
 8 lep    otro
 9 Amp    macroalga
10 gel    turf
# … with 80 more rows
We can see that the otro category actually consists of sessile invertebrates (e.g. bryozoans, hydroids) and, depending on the reporting requirements, we may want to change the algal classifications (e.g. aggregate turf and macroalga).
One way to do this is to alter the original Excel sheet that we joined to our percent cover data with left_join. One argument against that, however, is that we may want to keep that classification for some other purpose (or for the purpose for which it was originally developed). A more elegant way is to use {forcats} to change the factor levels, which also documents how the categories were changed for our particular purpose.
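One way to keep that documentation explicit is to store the mapping as a named vector and splice it into fct_recode() with !!!. The sketch below uses a toy factor, and the object name mapa_reporte is hypothetical:

```r
library(forcats)

# Toy groupings; the values are taken from the listing above
agrupacion <- factor(c("otro", "turf", "macroalga", "otro", "esponja"))

# Hypothetical lookup: names are the new report classes,
# values are the original levels they replace
mapa_reporte <- c(`Sessile invertebrates` = "otro",
                  `Sessile invertebrates` = "esponja",
                  Macroalgae = "turf",
                  Macroalgae = "macroalga")

# Splice the named vector into fct_recode()
reporte <- fct_recode(agrupacion, !!!mapa_reporte)

levels(agrupacion)  # the original factor is unchanged
table(reporte)      # counts under the new classes
```

Because the mapping lives in its own object, it can be printed, saved or reviewed separately from the data.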
Before we start, let’s get a list of the taxa groupings:
> sessiles_dat.acosa$Agrupacion %>% unique() %>% sort()
[1] "Alga calcarea costrosa" "alga costrosa" "arena" "basalto"
[5] "cascajo" "coral" "coral blanco" "coral muerto"
[9] "esponja" "Hidrozoo" "macroalga" "Octocoral"
[13] "otro" "rodolito" "turf"
You may remember that the grouping otro appears to consist of sessile invertebrates (e.g. hydroids). But there is also a grouping called Hidrozoo, which is Spanish for hydrozoan, the group that includes hydroids. We can also see that we have more than one crustose algae grouping (i.e. "Alga calcarea costrosa" and "alga costrosa") that we may want to amalgamate.
For this exercise, let’s assume the reporting requirement identifies six classes: “Live coral”, “Bleached or dead coral”, “Macroalgae”, “Crustose algae”, “Sessile invertebrates” and “Non-living”.
# modify groupings
# (%<>% is the assignment pipe from {magrittr}: it pipes and overwrites the object)
sessiles_dat.acosa %<>%
  mutate(Agrupacion = Agrupacion %>% factor() %>%
           # fct_recode() takes pairs of the form `new level` = "old level"
           fct_recode(`Crustose algae` = "Alga calcarea costrosa",
                      `Crustose algae` = "alga costrosa",
                      `Non-living` = "arena",
                      `Non-living` = "basalto",
                      # NOTE: cascajo (rubble) is grouped with Macroalgae here;
                      # check whether Non-living was intended
                      Macroalgae = "cascajo",
                      `Live coral` = "coral",
                      `Bleached or dead coral` = "coral blanco",
                      `Bleached or dead coral` = "coral muerto",
                      `Sessile invertebrates` = "esponja",
                      `Sessile invertebrates` = "Hidrozoo",
                      Macroalgae = "macroalga",
                      `Sessile invertebrates` = "Octocoral",
                      `Sessile invertebrates` = "otro",
                      `Crustose algae` = "rodolito",
                      Macroalgae = "turf"))
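When several old levels map to one new class, fct_collapse() from {forcats} offers a more compact alternative: each new class is given once, with a vector of the levels it absorbs. The sketch below uses a toy factor, not the full data object:

```r
library(forcats)

# Toy factor using a subset of the grouping names listed earlier
grupos <- factor(c("Alga calcarea costrosa", "alga costrosa", "rodolito",
                   "coral", "arena", "basalto"))

# Each argument is `new class` = c("old level", ...)
colapsado <- fct_collapse(grupos,
                          `Crustose algae` = c("Alga calcarea costrosa",
                                               "alga costrosa",
                                               "rodolito"),
                          `Non-living`     = c("arena", "basalto"),
                          `Live coral`     = "coral")

table(colapsado)  # counts per collapsed class
```

Both functions give the same kind of result. The choice is mostly a matter of which layout is easier to read and review.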
This sets the 6 classes for our report, and we can quickly set an order to them for our data summaries and visualisation:
# set grouping order
grouping_order <-
  c("Live coral",
    "Bleached or dead coral",
    "Macroalgae",
    "Crustose algae",
    "Sessile invertebrates",
    "Non-living")
# set order & summarise
sessiles_dat.acosa %>%
  mutate(Agrupacion = Agrupacion %>% factor(levels = grouping_order)) %>%
  group_by(Agrupacion) %>%
  summarise(`Average cover` = Value %>% mean(na.rm = TRUE))
# # A tibble: 7 x 2
#   Agrupacion             `Average cover`
#   <fct>                            <dbl>
# 1 Live coral                       4.31
# 2 Bleached or dead coral           0.286
# 3 Macroalgae                      17.8
# 4 Crustose algae                   3.49
# 5 Sessile invertebrates            1.22
# 6 Non-living                      17.4
# 7 <NA>                            16.3
Looks good, but not perfect. The <NA> row shows that a fair amount of cover (an average of 16.3%) is not attributed to any of our groupings! We will pick this up as part of the Homework for this module.
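In general, an <NA> group can arise in two ways: the value was already missing before the factor was created, or the value does not appear in the levels argument of factor(). The toy sketch below illustrates both cases. It does not diagnose which one applies to our data:

```r
library(forcats)

# Toy values, invented for illustration; the last one is already missing
clases <- c("Live coral", "Macroalgae", "Non-living", "Macroalgae", NA)
orden  <- c("Live coral", "Macroalgae")   # "Non-living" deliberately omitted

# factor(levels = ) silently turns any value not listed in levels into NA
f1 <- factor(clases, levels = orden)
sum(is.na(f1))   # 2: the original NA plus the unlisted "Non-living"

# fct_relevel() moves the named levels to the front and keeps all others
f2 <- fct_relevel(factor(clases), "Macroalgae", "Live coral")
levels(f2)       # "Macroalgae" "Live coral" "Non-living"
sum(is.na(f2))   # 1: only the value that was missing to begin with
```

Counting missing values before and after setting the levels is a quick way to tell the two cases apart.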
We are almost ready with our basic skills for Data Formatting & Standardisation! All we need to do now is save our clean data object, so we can use it for producing figures and reporting on the status and trends of coral reefs. The next exercise takes us through the basics of standardising character strings.