As part of the development of a standardised project repository for GCRMN reef monitoring data, with folders for creating data objects, visualisation, reporting code, et cetera, there is also a file which outlines the basic steps of the data standardisation and analysis. The idea behind this file (called integrate.R) is that it documents each step needed to reproduce the results of the data standardisation, visualisation, reporting, et cetera. This file also helps with the Set up of the project: it loads the necessary packages (e.g. tools for data manipulation, importing, visualisation and mapping), sets the working directory, and creates special functions and parameters.
The main body of integrate.R sets out the sequence of data cleaning and analysis steps by listing individual scripts. With the command source(), one can call hundreds of lines of code in a single line of code!
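As a minimal sketch of how this works (the script name and its contents here are hypothetical, standing in for a real cleaning script):

```r
# write a tiny stand-in for a cleaning script
# (the script name & contents are hypothetical)
script <- file.path(tempdir(), "clean_benthic.R")
writeLines(
  "benthic_clean <- data.frame(site = c('A', 'B'), cover = c(23.5, 41.0))",
  script
)

# a single call runs every line of that script in the current session
source(script)
benthic_clean
```

Everything the script creates (here, the benthic_clean data frame) is available in the workspace afterwards, which is exactly how integrate.R chains its individual steps together.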
This wiki page provides an overview of a general workflow and integrate.R script for a coral reef monitoring data project, with the idea that using a similar project structure will facilitate collaboration and transferability of analyses and code.
After cloning the data repository for this course, you will notice a high-level folder structure, with folders for data, code, and outputs, and a single *.R file (i.e. integrate.R). The individual folders have a nested structure to separate the different coding, analysis, and visualisation routines:
A repository for coral reef monitoring data might have folders for different localities, methods, dates or data types. This helps in organising the data, code and outputs for a “living” data project.
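As a rough sketch, that kind of skeleton can be generated directly in R; the folder names below follow the structure described on this page, and would be adjusted for a local project:

```r
# top-level folders for a reef monitoring project
# (names follow the structure described on this page)
dirs <- c("data", "data_intermediate",
          "creation_code", "analysis_code",
          "figures", "rmarkdown")

# create each folder (quietly skips any that already exist)
for (d in dirs) {
  dir.create(d, showWarnings = FALSE, recursive = TRUE)
}
```

Nested subfolders (e.g. per locality or method) can be added the same way with recursive = TRUE.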
This folder structure not only helps keep our data and code organised, it also reflects the workflow for the different steps in the project. For example, the data importation & cleaning process (i.e. creation_code) creates a *.rda object saved to the data_intermediate folder. These data objects are then loaded into individual scripts in analysis_code to produce outputs (e.g. *.png) saved to the figures folder. Finally, a script in rmarkdown imports the *.png figures and other binary outputs to create a report of monitoring results.
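A single pass through that workflow might look like the following sketch (the object and file names are illustrative, and base graphics stand in for the project's actual plotting code):

```r
# creation_code step: build a data object & save it as *.rda
dir.create("data_intermediate", showWarnings = FALSE)
benthic_cover <- data.frame(site  = c("A", "A", "B", "B"),
                            cover = c(23, 31, 48, 52))
save(benthic_cover, file = "data_intermediate/benthic_cover.rda")

# analysis_code step: load the object & write a *.png to figures/
dir.create("figures", showWarnings = FALSE)
load("data_intermediate/benthic_cover.rda")
png("figures/benthic_cover.png", width = 400, height = 300)
boxplot(cover ~ site, data = benthic_cover)
dev.off()
```

The *.rda file is the hand-off point: creation scripts only ever write to data_intermediate, and analysis scripts only ever read from it.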
Of course, there are alternatives for setting up projects in R. The philosophy of this approach is that individual, modular scripts of < 80-100 lines are easier to proofread and troubleshoot (if there is a problem), and keep the different tasks of data cleaning, analysis, visualisation and reporting separate. The advantages of this particular approach will become more apparent as we progress through the training course.
In the project repository, we have set up an example integrate.R file to use for this training course. The script sets out basic information on the name of the project, its purpose or objective, the approach, main authors, date and other metadata required for the project. This is also a good place to put warnings about confidentiality and responsible use of monitoring data.
##
## Project Name: Building the WIO Global Coral Reef Monitoring Network
## to make coral reef data secure & accessible
##
## Objective: Provide course structure and modules for data
## systematisation and visualisation training course
##
## Approach:
##
## Authors: Franz Smith, Mishal Gudka, David Obura, and others
## CORDIO East Africa
## Universidad San Francisco de Quito
##
##
## Date: 2021-04-30
##
## Notes: 1. This file is intended to provide a guide to the basic
## workflow of the project, attempting to 'integrate' the
## different steps necessary to conduct the analyses &
## create visual outputs
As mentioned, the Set up for working on the project begins with cleaning the workspace with rm(list = ls()) and loading the necessary packages. Sometimes it is helpful to group the packages by their broad functionality (e.g. data manipulation, visualisation):
##
## 1. Set up the core functionality
##
# clean up
rm(list=ls())
# call to core packages for data manipulation
library(dplyr)
library(tidyr)
library(magrittr)
library(purrr)
library(lubridate)
library(hms)
library(stringr)
library(forcats)
# for importing different formats
library(readr)
library(readxl)
# call to visualisation & output generation
library(ggplot2)
library(GGally)
library(Cairo)
library(extrafont)
library(RColorBrewer)
library(viridis)
# functionality for spatial analyses
library(raster)
library(rgdal)
library(sf)
library(rgeos)
It is also convenient to use integrate.R to set the working directory for the project, fonts & themes, and other settings necessary to have comparable results across collaborators and institutions. This is also where special functions (for example, quickview(), a convenience function for viewing the top rows of a tibble) and other settings (e.g. projection details for mapping) can be defined. Without going into too much detail, this is just an example of how you can use integrate.R to manage routine settings for an individual project:
# point to working directory ## -- will need to adjust for local copy -- ##
setwd("research/gcrmn_wio_data_course")
# set font for graphical outputs
theme_set(theme_bw(base_family = "Helvetica"))
CairoFonts( # slight mod to example in ?CairoFonts page
regular = "Helvetica:style = Regular",
bold = "Helvetica:style = Bold",
italic = "Helvetica:style = Oblique",
bolditalic = "Helvetica:style = BoldOblique"
)
# call to map theme
source("R/theme_nothing.R")
# create helper function for reviewing data
quickview <- function(x, n = 3L){ head(data.frame(x), n = n) }
# set utm details
utm_details <-
paste0("+proj=utm +zone=15 +south +datum=WGS84 +units=m",
" +no_defs +ellps=WGS84") %>% CRS()
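With the Set up sourced, quickview() can then be used anywhere a quick look at the first rows of an object is needed (the definition is repeated here so the snippet stands alone):

```r
# convenience function from the Set up above
quickview <- function(x, n = 3L){ head(data.frame(x), n = n) }

# peek at the top rows of any data frame or tibble
quickview(mtcars)     # first 3 rows
quickview(mtcars, 5)  # first 5 rows
```

Coercing with data.frame() means tibbles print in full rather than with their abbreviated display.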
After the Set up, the remainder of integrate.R goes through the individual steps for creating data objects, cleaning and standardisation, and analysis. As a general strategy, we set out the location of the scripts separately, as it saves repeated typing. In addition, if the project structure changes (e.g. we might want to add a folder level in formatting to have separate benthic and fish folders for those examples), only the top line of code needs to change and the rest of the steps in that sequence should still run.
##
## 2. Generate core data objects
##
# point to creation locale
creation_locale <- "creation_code/examples/formatting/"
# create percent cover data object for costa rica
source(paste0(creation_locale, "create_sessiles_dat.acosa.R"))
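The same pattern scales to several scripts: each call builds its path from creation_locale, so if the folders move only that one assignment changes (the second script name below is hypothetical, added only to show the sequence):

```r
# point to creation locale -- the only line to edit if folders move
creation_locale <- "creation_code/examples/formatting/"

# build the full path for each creation script in the sequence
scripts <- c("create_sessiles_dat.acosa.R",
             "create_fish_dat.acosa.R")  # second name is hypothetical
paths   <- paste0(creation_locale, scripts)

# run each step in order (commented out here, since the
# scripts only exist inside the course repository)
# for (p in paths) source(p)
```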
Now that we have covered how the project repository is set up and aspects of the workflow, we can get into some code!