When integrating coral reef monitoring data from around the globe, there are challenges in collating data with different levels of taxonomic resolution, variations in replication and sampling design, and differing accuracy of spatial and temporal information. These variations become particularly important when collating historical data, where baseline information on the status of coral reefs from 20 or 30 years ago was gathered without the same detail of metadata.
To deal with these different levels of taxonomic resolution, sampling design, and spatial and temporal accuracy, the GCRMN has adopted a set of data tiers that allows data to be used for different purposes and under different assumptions, while still retaining detail for dedicated analyses.
For this module, we will be conducting exercises on data from different tiers (e.g. “percent cover of living hard coral”, “percent cover of Pocillopora”) and developing skills for managing categories. Although an overview of the GCRMN data quality model was presented here, this wiki page provides some background on how we will apply these criteria to format and standardise our data for analysis.
As previously mentioned, the GCRMN data quality model includes scores for team experience, sample design, level of evidence, and documentation.
For this module, we will be focusing on the “sampling units” and “taxonomic identification” criteria. In the Mapping & spatial representation module, we will return to the “physical grain” criterion.
The general strategy for managing these different levels of detail is to retain information at the finest level of detail as much as possible, and then aggregate to coarser levels when necessary. In this module, we will illustrate some examples of how to consolidate categories recorded at the finest level of detail. For example, “olive filamentous algae” and “branching green turf algae” might be grouped into a category of intermediate detail (e.g. “green algal turf”), which may in turn be assigned to a coarser category (e.g. “macroalgae”).
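As a minimal sketch of how this could be done in R, assuming a simple data frame of benthic observations (the labels and column names below are illustrative, not the actual GCRMN categories), forcats::fct_collapse() and dplyr::case_when() can map fine-scale labels to intermediate and coarse groups:

```r
library(dplyr)
library(forcats)
library(tibble)

# Illustrative benthic observations at the finest level of detail
# (hypothetical labels, not the actual GCRMN category names)
benthic <- tibble(
  observation = c("olive filamentous algae", "branching green turf algae",
                  "Pocillopora damicornis", "crustose coralline algae")
)

benthic <- benthic %>%
  mutate(
    # Intermediate level: collapse fine-scale labels into broader groups
    group_intermediate = fct_collapse(
      factor(observation),
      "green algal turf" = c("olive filamentous algae",
                             "branching green turf algae"),
      other_level = "other"
    ),
    # Coarse level: assign intermediate groups to top-level categories
    group_coarse = case_when(
      group_intermediate == "green algal turf" ~ "macroalgae",
      observation == "Pocillopora damicornis" ~ "living hard coral",
      TRUE ~ "other"
    )
  )
```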
Likewise, we will learn ways to manage information from different levels of sampling (e.g. quadrat, transect, site, locality, country) and how to aggregate and summarise at different levels (e.g. average living coral percent cover per transect, which can then be used to calculate a site average).
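A minimal sketch of that two-step aggregation with dplyr, assuming a hypothetical quadrat-level data frame with site, transect, and coral_cover columns:

```r
library(dplyr)
library(tibble)

# Hypothetical quadrat-level records of living hard coral percent cover,
# nested within transects and sites
quadrat_data <- tibble(
  site        = c("A", "A", "A", "A", "B", "B"),
  transect    = c(1, 1, 2, 2, 1, 1),
  coral_cover = c(32, 28, 41, 37, 15, 19)
)

# Step 1: average living coral percent cover per transect
transect_means <- quadrat_data %>%
  group_by(site, transect) %>%
  summarise(mean_cover = mean(coral_cover), .groups = "drop")

# Step 2: use the transect averages to calculate a site average
site_means <- transect_means %>%
  group_by(site) %>%
  summarise(site_mean_cover = mean(mean_cover), .groups = "drop")
```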
These Exercises and Homework will bring together skills and tools learned in the previous module, including using pipes (%>%), the packages tidyr and dplyr, as well as forcats for managing categories in data.
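To give a flavour of how these tools chain together, here is a hedged sketch (the wide-format columns and labels are purely illustrative) that reshapes survey data with tidyr and then standardises the category labels with forcats:

```r
library(dplyr)
library(tidyr)
library(forcats)
library(tibble)

# Hypothetical wide-format survey data: one column per benthic category
survey_wide <- tibble(
  transect   = c(1, 2),
  hard_coral = c(35, 42),
  algal_turf = c(20, 18),
  sand       = c(10, 5)
)

survey_long <- survey_wide %>%
  # tidyr: reshape so each row is one transect-by-category observation
  pivot_longer(cols = -transect,
               names_to = "category", values_to = "percent_cover") %>%
  # forcats: standardise the category labels
  mutate(category = fct_recode(factor(category),
                               "living hard coral" = "hard_coral",
                               "green algal turf"  = "algal_turf"))
```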
To begin this journey of data wrangling, we will need to import some data to start formatting and standardising!
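For example, a CSV export could be read in with readr; the file name below is just a placeholder for whichever survey file accompanies the exercises:

```r
library(readr)
library(dplyr)

# Placeholder file name -- substitute the survey file provided with the exercises
benthic_raw <- read_csv("benthic_survey_data.csv")

# A quick look at the columns and types before any formatting
glimpse(benthic_raw)
```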