Exercises using percent cover data

Previous steps

If you would like to return to information from the previous session, please click here.

Context

Creating effective visuals from percent cover data often requires summarising across sampling levels (e.g. quadrats, transects, depths) and filtering key taxa from a long list of categories to simplify the graphical output.

In some cases, creating these “on-the-fly” summaries may need a separate concordance file to classify benthic categories (e.g. grouping the different "Non-living" categories: Bare substrate, Rubble, Sand). The classified data are then used to summarise percent cover at the quadrat level, and these quadrat summaries are in turn summarised at the transect or site level.
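A minimal sketch of this two-step summary on a small made-up data set. The column names `Site`, `Transect`, `Quadrat`, `Taxa name` and `cover`, and the `toy_cover` and `toy_concordance` objects, are hypothetical stand-ins; the sketch also assumes every category is recorded in every quadrat (zero cover included), as point-count output usually is.

```r
library(dplyr)
library(tibble)

# hypothetical point-count output: one row per quadrat x category (cover in %)
toy_cover <- tribble(
  ~Site, ~Transect, ~Quadrat, ~`Taxa name`,     ~cover,
  "S1",  "T1",      "Q1",     "Bare Substrate", 20,
  "S1",  "T1",      "Q1",     "Rubble",         30,
  "S1",  "T1",      "Q1",     "Coral",          50,
  "S1",  "T1",      "Q2",     "Bare Substrate", 10,
  "S1",  "T1",      "Q2",     "Rubble",         10,
  "S1",  "T1",      "Q2",     "Coral",          80
)

# hypothetical concordance: category -> broader benthic class
toy_concordance <- tribble(
  ~`Taxa name`,     ~`Benthic class`,
  "Bare Substrate", "Non-living",
  "Rubble",         "Non-living",
  "Coral",          "Coral"
)

# step 1: classify each category, then sum cover by class within each quadrat
quadrat_cover <- toy_cover %>%
  left_join(toy_concordance, by = "Taxa name") %>%
  group_by(Site, Transect, Quadrat, `Benthic class`) %>%
  summarise(cover = sum(cover), .groups = "drop")

# step 2: average the quadrat totals for each class at the transect level
transect_cover <- quadrat_cover %>%
  group_by(Site, Transect, `Benthic class`) %>%
  summarise(mean_cover = mean(cover), .groups = "drop")

transect_cover
```

Dropping `Transect` from the second `group_by()` would average the quadrats directly to the site level instead.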

To practise visualising status and trends, this wiki page works through some additional examples using the CPCE data from Kenya and outlines some approaches for reducing the complexity of the benthic data and its visualisation.

Getting set up

To start, load the CPCE data saved by our previous creation_code script (i.e. the *.rda file in the data_intermediate folder). This particular example uses Mishal’s solution for creating the CPCE data:

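  # load packages (magrittr provides the %<>% assignment pipe used below)
    library(tidyverse)
    library(magrittr)

  # point to data locale
    data_locale <- "data_intermediate/examples/formatting/"

  # call to data
    load(paste0(data_locale, "cpce_major_categories.rda"))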

It is usually good practice to inspect the data object we have just loaded to make sure it is sound. This is also an opportunity to clean up the taxonomic codes and standardise the naming conventions (e.g. CPCE category names mix upper case, lower case and mixed case).

  # list the unique sites
    cpce_major_categories$Site %>% unique() %>% sort()
# [1] "MaKokw1"

 ## -- clean taxa names -- ##
  # separate codes
    cpce_major_categories %<>%
      separate(`Taxa category`,
               into = c("Taxa name", "Taxa code"),
               sep  = "\\(")

  # clean up
    cpce_major_categories %<>%
      mutate(`Taxa name` = `Taxa name` %>% str_trim(),
             `Taxa code` = `Taxa code` %>% str_replace("\\)", ""))

  # set taxa to title
    cpce_major_categories %<>%
      mutate(`Taxa name` = `Taxa name` %>% str_to_title())
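To see what these cleaning steps produce, here is a sketch on two made-up labels. It assumes the raw `Taxa category` strings follow a `NAME (CODE)` pattern; the `toy_taxa` values are hypothetical, and `str_remove()` is used as a shorthand for replacing the closing bracket with `""`.

```r
library(dplyr)
library(tidyr)
library(stringr)

# hypothetical raw CPCE labels in "NAME (CODE)" form, with mixed case
toy_taxa <- tibble(`Taxa category` = c("SOFT CORAL (SC)", "sand (S)"))

toy_taxa <- toy_taxa %>%
  # split at the opening bracket into name and code
  separate(`Taxa category`,
           into = c("Taxa name", "Taxa code"),
           sep  = "\\(") %>%
  # trim the trailing space, standardise to title case, drop the closing bracket
  mutate(`Taxa name` = `Taxa name` %>% str_trim() %>% str_to_title(),
         `Taxa code` = `Taxa code` %>% str_remove("\\)"))

toy_taxa
```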

Visualisation of taxa percent cover

Visualising the distribution of the percent cover data by taxa can help identify any potential outliers and the overall trend at a site. For this, a useful geom is geom_boxplot():

  # create plot
    cpce_major_categories %>%
      dplyr::filter(cover > 0) %>%
    ggplot(aes(`Taxa name`, cover)) +
      geom_boxplot() +
      theme_bw() +
      ylab("Percent cover") +
      xlab("") +
      theme(axis.text.x = element_text(angle = 90))
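With many categories, ordering the boxes by median cover can make the plot easier to read. A hedged sketch using `forcats::fct_reorder()` on made-up values; `toy_cover` and its numbers are hypothetical, and the remaining plot settings follow the example above.

```r
library(dplyr)
library(forcats)
library(ggplot2)
library(tibble)

# hypothetical cover values for three categories
toy_cover <- tibble(
  `Taxa name` = rep(c("Sand", "Coral", "Rubble"), each = 3),
  cover       = c(5, 10, 15, 40, 50, 60, 20, 25, 30)
)

toy_plot_data <- toy_cover %>%
  dplyr::filter(cover > 0) %>%
  # order categories from highest to lowest median cover
  mutate(`Taxa name` = fct_reorder(`Taxa name`, cover, .fun = median, .desc = TRUE))

p <- ggplot(toy_plot_data, aes(`Taxa name`, cover)) +
  geom_boxplot() +
  theme_bw() +
  ylab("Percent cover") +
  xlab("") +
  theme(axis.text.x = element_text(angle = 90))
```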

The result should look something like this:

Reducing data set complexity

Inspecting the data shows quite a number of benthic categories that can be grouped into broader classes (e.g. Macroalgae or Non-living). For example:

  # get taxa names
    cpce_major_categories$`Taxa name` %>% unique()
 # [1] "Coral"               "Soft Coral"          "Inverts-Other"       "Algae-Macro"
 # [5] "Algae-Halimeda"      "Algae-Coralline"     "Algae-Turf"          "Bare Substrate"
 # [9] "Rubble"              "Sand"                "Seagrass"            "Dead Standing Coral"
# [13] "Recent Dead Coral"   "Unidentified"        "Tape, Wand, Shadow"

In previous exercises, we have seen how to use the package forcats to re-code factors within a data object. Another approach is to create a concordance file using the function tribble():

  # create concordance file
    taxa_concordances <-
      tribble(~`Taxa name`,          ~`Benthic class`,
              "Coral",               "Coral",
              "Soft Coral",          "Sessile invertebrate",
              "Inverts-Other",       "Sessile invertebrate",
              "Algae-Macro",         "Macroalgae",
              "Algae-Halimeda",      "Macroalgae",
              "Algae-Coralline",     "Crustose algae",
              "Algae-Turf",          "Macroalgae",
              "Bare Substrate",      "Non-living",
              "Rubble",              "Non-living",
              "Sand",                "Non-living",
              "Seagrass",            "Seagrass",
              "Dead Standing Coral", "Dead coral",
              "Recent Dead Coral",   "Dead coral",
              "Unidentified",        "Non-living",
              "Tape, Wand, Shadow",  "Non-living")

One advantage of a concordance table like this is that it can be reused to classify other data objects (e.g. from other sites) or saved as an intermediate *.rda data file. The essential point is that it clearly documents how the benthic categories are reclassified and can be quickly modified for other reporting purposes.
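Because `left_join()` silently returns `NA` for any category missing from the concordance, a quick `anti_join()` check can catch spelling or case mismatches before the data are classified. A minimal sketch with made-up names; `toy_data`, `toy_concordance` and the mismatched `"Algae-turf"` entry are hypothetical.

```r
library(dplyr)
library(tibble)

# hypothetical cleaned data: "Algae-turf" does not match "Algae-Turf" below
toy_data <- tibble(`Taxa name` = c("Coral", "Sand", "Algae-turf"),
                   cover       = c(40, 35, 25))

# hypothetical subset of the concordance table
toy_concordance <- tribble(
  ~`Taxa name`,  ~`Benthic class`,
  "Coral",       "Coral",
  "Sand",        "Non-living",
  "Algae-Turf",  "Macroalgae"
)

# categories present in the data with no match in the concordance
unmatched <- toy_data %>%
  distinct(`Taxa name`) %>%
  anti_join(toy_concordance, by = "Taxa name")

unmatched

# in the joined result, the unmatched category gets an NA Benthic class
classified <- toy_data %>%
  left_join(toy_concordance, by = "Taxa name")
```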

For example, a seasonal monitoring report may want to focus on three key categories:

  # set groups of interest
    groups_of_interest <-
      c("Coral",
        "Macroalgae",
        "Dead coral")
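One way to use `groups_of_interest` is to filter a class-level summary with `%in%`. If a class is absent from some quadrats, `tidyr::complete()` can first add explicit zero-cover rows so the site means are not inflated. A sketch assuming a hypothetical quadrat-level table (`quadrat_cover` and its columns are made up for illustration).

```r
library(dplyr)
library(tidyr)
library(tibble)

groups_of_interest <- c("Coral", "Macroalgae", "Dead coral")

# hypothetical quadrat-level cover by benthic class; Q2 has no Macroalgae row
quadrat_cover <- tribble(
  ~Site, ~Quadrat, ~`Benthic class`, ~cover,
  "S1",  "Q1",     "Coral",          40,
  "S1",  "Q1",     "Macroalgae",     20,
  "S1",  "Q1",     "Dead coral",     10,
  "S1",  "Q1",     "Non-living",     30,
  "S1",  "Q2",     "Coral",          60,
  "S1",  "Q2",     "Dead coral",     10,
  "S1",  "Q2",     "Non-living",     30
)

site_summary <- quadrat_cover %>%
  # add a zero-cover row for any class missing from a quadrat
  complete(nesting(Site, Quadrat), `Benthic class`, fill = list(cover = 0)) %>%
  # keep only the key categories for the report
  dplyr::filter(`Benthic class` %in% groups_of_interest) %>%
  group_by(Site, `Benthic class`) %>%
  summarise(mean_cover = mean(cover), .groups = "drop")

site_summary
```

Without the `complete()` step, the Macroalgae mean here would be taken from Q1 alone.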

Next steps

As part of the homework for this module, participants should create their own visuals for the benthic or fish data to sharpen their ggplot() skills and build familiarity with different geoms and facetting.