Previous steps

If you would like to return to information from the previous session, please click here.

Context

Integrating coral reef monitoring data from around the globe means collating data that differ in taxonomic resolution, in replication and sampling design, and in the accuracy of their spatial and temporal information. These differences become particularly important when collating historical data, where baseline information on the status of coral reefs from 20 or 30 years ago was gathered without the same level of metadata detail.

To deal with these differences in taxonomic resolution, sampling design, and spatial and temporal accuracy, the GCRMN has adopted a set of data tiers. The tiers allow data to be used for different purposes and under different assumptions, while still retaining detail for dedicated analyses.

For this module, we will be conducting exercises on data from different tiers (e.g. “percent cover of living hard coral”, “percent cover of Pocillopora”) and developing skills for managing categories. Although an overview of the GCRMN data quality model was presented previously, this wiki page provides some background on how we will apply these criteria to format and standardise our data for analysis.

Applying the GCRMN data quality model

As previously mentioned, the GCRMN data quality model includes scores for team experience, sample design, level of evidence, and documentation. The criteria scored include:

  1. number of sampling units, of specified types
  2. physical grain of unit measure (i.e. spacing/size issues)
  3. taxonomic identification level (how precise)
  4. experience and training of the monitoring team

For this module, we will be focusing on #1 “sampling units” and #3 “taxonomic identification”. In the Mapping & spatial representation module, we will return to the #2 “physical grain” criterion.

The general strategy for managing the different levels of detail is to retain information from finer levels of detail as much as possible, and then aggregate to coarser levels when necessary. In this module, we will illustrate some examples of how to consolidate the finest level of detail. For example, “olive filamentous algae” and “branching green turf algae” might be categorised to something of intermediate detail (e.g. “green algal turf”), which may in turn be assigned to a coarser level category (e.g. “macroalgae”).
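As a minimal sketch of this strategy, the example below uses fct_collapse() from forcats to add an intermediate and a coarse category column while keeping the original labels. The data frame, its column names (label, cover) and the cover values are hypothetical and made up for illustration; the category names come from the examples in the text, and this is not the official GCRMN category scheme.

```r
library(dplyr)
library(forcats)

# Hypothetical records at the finest level of detail (values are invented)
benthic <- tibble(
  label = c("olive filamentous algae", "branching green turf algae", "Pocillopora"),
  cover = c(12, 8, 30)
)

benthic_tiers <- benthic %>%
  mutate(
    # Intermediate tier: merge the two fine-scale turf labels;
    # levels not mentioned (here "Pocillopora") are left unchanged
    intermediate = fct_collapse(
      label,
      "green algal turf" = c("olive filamentous algae", "branching green turf algae")
    ),
    # Coarse tier: built from the intermediate tier, so the finer columns are retained
    coarse = fct_collapse(
      intermediate,
      "macroalgae" = "green algal turf",
      "hard coral" = "Pocillopora"
    )
  )

print(benthic_tiers)
```

Because the original label column is never overwritten, the same table can serve both a coarse analysis (e.g. hard coral cover) and a dedicated genus-level analysis.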

Likewise, we will learn ways to manage information from different levels of sampling (e.g. quadrat, transect, site, locality, country) and how to aggregate and summarise for different levels (e.g. average living coral percent cover per transect, which can then be used to calculate a site average).
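The two-step aggregation described above could be sketched with dplyr as follows. The data frame and its column names (site, transect, quadrat, hard_coral_cover) are hypothetical, and the values are invented for illustration only.

```r
library(dplyr)

# Hypothetical quadrat-level percent cover of living hard coral
quadrats <- tibble(
  site             = c("A", "A", "A", "A", "B", "B", "B", "B"),
  transect         = c(1, 1, 2, 2, 1, 1, 2, 2),
  quadrat          = c(1, 2, 1, 2, 1, 2, 1, 2),
  hard_coral_cover = c(20, 30, 40, 50, 10, 20, 30, 20)
)

# Step 1: average the quadrats within each transect
transect_means <- quadrats %>%
  group_by(site, transect) %>%
  summarise(mean_cover = mean(hard_coral_cover), .groups = "drop")

# Step 2: average the transect means within each site
site_means <- transect_means %>%
  group_by(site) %>%
  summarise(
    mean_cover  = mean(mean_cover),
    n_transects = n(),
    .groups     = "drop"
  )

print(site_means)
```

Averaging transect means, rather than pooling all quadrats in a site, gives each transect equal weight. The two approaches agree when every transect has the same number of quadrats, as in this example, but can differ when the design is unbalanced.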

These Exercises and Homework will bring together skills and tools learned in the previous module, including the pipe operator %>% and the packages tidyr and dplyr, as well as forcats for managing categories in data.

Next steps

To begin this journey of data wrangling, we will need to import some data to start formatting and standardising!