Previous steps

If you would like to return to information from the previous session, please click here.

Context

At the beginning of this module, we revisited the GCRMN data tier model, which sets out a number of data “quality” levels based on, among other things, the taxonomic resolution of observations. Another way to think about these classifications is through factors in R: a factor is a “category” or “enumerated type” that allows us to simplify or aggregate data based on its levels.

As part of this module, we would like to introduce the concept of factors in R and illustrate how to manipulate them to assist in data summaries, visualisations, et cetera. The aim of these exercises is to provide you with the basic skills to identify, set and manipulate factor levels in a data object.

[The code for this example can be found here: creation_code/exercises/formatting/create_sessiles_dat.acosa.R]

Managing factors

When factors are assigned to an object in R, these are carried with the object in its “memory” or “metadata”. Because of this, learning the basics of how to identify, set and manipulate factor levels in a data object is a very useful skill.
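As a minimal sketch of what “carried with the object” means, the example below builds a factor from a small made-up character vector (the values are hypothetical and are not taken from the course data) and inspects its levels using base R only.

```r
# Toy example (hypothetical values, not the course data)
groupings <- c("coral", "turf", "coral", "arena", "turf")

# Convert the character vector to a factor
groupings_fct <- factor(groupings)

# The levels are stored with the object (sorted by default)
levels(groupings_fct)
nlevels(groupings_fct)

# The full set of levels stays attached even after subsetting
coral_only <- groupings_fct[groupings_fct == "coral"]
levels(coral_only)

# Count the observations in each level
table(groupings_fct)
```

Note that the subset still reports all three levels; this is why it is worth checking the levels of an object before summarising or plotting it.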

Let’s have a look at the taxa codes and groupings that we had assigned to our object of percent cover data:

>     sessiles_dat.acosa %>%
+       dplyr::select(Codigo,
+                     Agrupacion) %>%
+       distinct()
# A tibble: 90 x 2
   Codigo Agrupacion
   <chr>  <chr>
 1 ARENA  arena
 2 TURF   turf
 3 Esp    esponja
 4 Acc    Alga calcarea costrosa
 5 Hal    macroalga
 6 Brio   otro
 7 Hid    otro
 8 lep    otro
 9 Amp    macroalga
10 gel    turf
# … with 80 more rows

We can see that the otro category is actually made up of sessile invertebrates (e.g. bryozoans, hydroids) and, depending on the reporting requirements, we may want to change the algal classifications (e.g. aggregate turf and macroalga).

One way to do this is to alter the original Excel sheet that we joined to our percent cover data with left_join. One argument against that is that we may want to keep that classification for some other purpose (or for the purpose for which it was originally developed). A more elegant way is to use {forcats} to change the factor levels, which also documents how the categories were changed for our particular purpose.

Before we start, let’s get a list of the taxa groupings:

> sessiles_dat.acosa$Agrupacion %>% unique() %>% sort()
 [1] "Alga calcarea costrosa" "alga costrosa"          "arena"                  "basalto"
 [5] "cascajo"                "coral"                  "coral blanco"           "coral muerto"
 [9] "esponja"                "Hidrozoo"               "macroalga"              "Octocoral"
[13] "otro"                   "rodolito"               "turf"

The grouping otro, you may remember, appears to be sessile invertebrates (e.g. hydroids). But there is also a grouping called Hidrozoo, which is Spanish for hydroid. We can also see that we have a number of crustose algae groupings (i.e. Alga calcarea costrosa and alga costrosa) that we may want to amalgamate.

For this exercise, let’s assume the reporting requirement identifies six classes (i.e. “Live coral”, “Bleached or dead coral”, “Macroalgae”, “Crustose algae”, “Sessile invertebrates”, “Non-living”).

  # modify groupings (%<>% is the assignment pipe from {magrittr})
    sessiles_dat.acosa %<>%
      mutate(Agrupacion = Agrupacion %>% factor() %>%
                            fct_recode(`Crustose algae`         = "Alga calcarea costrosa",
                                       `Crustose algae`         = "alga costrosa",
                                       `Non-living`             = "arena",
                                       `Non-living`             = "basalto",
                                       Macroalgae               = "cascajo",
                                       `Live coral`             = "coral",
                                       `Bleached or dead coral` = "coral blanco",
                                       `Bleached or dead coral` = "coral muerto",
                                       `Sessile invertebrates`  = "esponja",
                                       `Sessile invertebrates`  = "Hidrozoo",
                                       Macroalgae               = "macroalga",
                                       `Sessile invertebrates`  = "Octocoral",
                                       `Sessile invertebrates`  = "otro",
                                       `Crustose algae`         = "rodolito",
                                       Macroalgae               = "turf"))
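
As a more compact alternative to repeating the new name on several lines, {forcats} also provides fct_collapse(), which takes one character vector of old levels per new level. The sketch below uses a small made-up vector (hypothetical values, only a subset of the groupings above) rather than sessiles_dat.acosa, so treat it as an illustration of the pattern and not as the module's method.

```r
library(forcats)

# Toy vector of groupings (hypothetical values, not the course data)
agrupacion <- factor(c("coral", "coral muerto", "coral blanco",
                       "macroalga", "turf", "arena"))

# Collapse several old levels into each reporting class
agrupacion_report <- fct_collapse(
  agrupacion,
  `Live coral`             = "coral",
  `Bleached or dead coral` = c("coral blanco", "coral muerto"),
  Macroalgae               = c("macroalga", "turf"),
  `Non-living`             = "arena"
)

# Check the resulting levels and how many observations fall in each
levels(agrupacion_report)
fct_count(agrupacion_report)
```

Either form leaves the mapping written out in the script, so the recoding stays documented alongside the analysis.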

This sets the 6 classes for our report, and we can quickly set an order to them for our data summaries and visualisation:

  # set grouping order
    grouping_order <-
      c("Live coral",
        "Bleached or dead coral",
        "Macroalgae",
        "Crustose algae",
        "Sessile invertebrates",
        "Non-living")

  # set order & summarise
    sessiles_dat.acosa %>%
      mutate(Agrupacion = Agrupacion %>% factor(levels = grouping_order)) %>%
      group_by(Agrupacion) %>%
      summarise(`Average cover` = Value %>% mean(na.rm = TRUE))
# # A tibble: 7 x 2
#   Agrupacion             `Average cover`
#   <fct>                            <dbl>
# 1 Live coral                       4.31
# 2 Bleached or dead coral           0.286
# 3 Macroalgae                      17.8
# 4 Crustose algae                   3.49
# 5 Sessile invertebrates            1.22
# 6 Non-living                      17.4
# 7 <NA>                            16.3

Looks good, but not perfect. It seems that quite a bit of percent cover (i.e. 16.3%) is not attributed to any of our groupings! We will pick this up as part of the Homework for this module.
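
As a starting point for that investigation, the sketch below shows one general way to find the rows behind an <NA> group. It relies on the fact that factor(levels = ...) converts any value that is missing, or that is not listed in levels, to NA. The data frame, taxa codes and level names here are made up for illustration and are not the course data.

```r
library(dplyr)

# Toy data (hypothetical codes and groupings)
toy <- tibble(
  Codigo     = c("Por", "Hal", "XYZ", "Esp"),
  Agrupacion = c("Live coral", "Macroalgae", NA, "sponge")
)

# Hypothetical reporting levels; "sponge" is deliberately left out
report_levels <- c("Live coral", "Macroalgae")

# Keep the original column and add the factor version next to it
toy_checked <- toy %>%
  mutate(Agrupacion_fct = factor(Agrupacion, levels = report_levels))

# Rows that became NA, with their original values kept for inspection
toy_checked %>%
  filter(is.na(Agrupacion_fct)) %>%
  count(Codigo, Agrupacion)
```

Keeping the original column beside the factor makes it possible to tell apart values that were missing from the start and values that simply were not included in the level list.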

Next steps

We are almost ready with our basic skills for Data Formatting & Standardisation! All we need to do now is save our clean data object, so we can use it for producing figures and reporting on the status and trends of coral reefs. The next exercise takes us through the basics for standardising character strings.