Overarching goals of the GCRMN implementation plan include the development of procedures for data cleaning, production of reproducible results and interoperability of systems. As part of the Data formatting & Standardisation Module, we have been working through an example of how data are imported, columns formatted, joined with additional information tables and modified to produce clean coral reef monitoring data.
Once the data formatting & standardisation process is established, it should work as the data set grows (e.g. after including additional rows in the data_raw spreadsheet). The idea is that we can re-run the data cleaning script using integrate.R and update the results after a monitoring campaign.
Likewise, the code for Visualisation of Status & Trends and other analyses of the data (which we will explore in a future module) should work on a dataset that includes 1 year of data or 10 years. The question is: How do we connect the data formatting & standardisation process with the visualisation and analyses? The answer is that we do that with “intermediate” data objects.
In this lesson, we will explore the basic strategy for managing “intermediate” data objects: their creation, storage and how to load() them back into R.
At the end of this training session, participants should have competency in:
*.rda objects
Using the example of the percent cover data, we have arrived at a data object that has correct dates, numeric and character formats, and links with additional metadata tables, is in a tidy “long” form and is basically ready for analysis. The next step is to save the object in an “intermediate” location (i.e. separating these objects from the “raw” data).
For this, we have created a data_intermediate folder in the repository.
Saving the formatted and standardised data object is straightforward: all we need to do is point to the save_locale, call the save() function and give the file a name (with an *.rda extension).
##
## 6. Generate outputs
##
# point to save locale
save_locale <- "data_intermediate/exercises/formatting/"
# save percent cover data
save(percent_cover_acosa,
     file = paste0(save_locale, "percent_cover_acosa.rda"))
The general convention (for GCRMN) is to give the *.rda file the same name as the object. For example, for our object percent_cover_acosa, the *.rda name is simply percent_cover_acosa.rda. This makes it easier to track the location of individual data objects in the repository as well as in the R workspace.
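The naming convention above can be enforced in code rather than typed by hand. The following is a minimal sketch only: save_named() is a hypothetical helper (it is not part of the GCRMN scripts), the data frame holds invented stand-in values, and a temporary folder is used in place of data_intermediate.

```r
# Hypothetical helper (not part of the GCRMN code base): saves one object
# to "<object name>.rda" inside the given folder, so the file name always
# matches the object name.
save_named <- function(object_name, save_locale, envir = parent.frame()) {
  file_path <- file.path(save_locale, paste0(object_name, ".rda"))
  save(list = object_name, file = file_path, envir = envir)
  invisible(file_path)
}

# Invented stand-in data; a temporary folder replaces data_intermediate
percent_cover_acosa <- data.frame(site = c("A", "B"), cover = c(12.5, 40))
save_locale <- tempdir()

saved_path <- save_named("percent_cover_acosa", save_locale)
basename(saved_path)
```

Passing the object name as a character string means the file name is derived from it and cannot drift away from the object name.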
The R data object (i.e. *.rda) format is flexible and can contain multiple data objects. For example, we might want to save the “percent cover data” and the “species characteristics” as separate data objects in a single *.rda file. This would mean that when we load the *.rda back into R, it comes with the two separate objects. Following our convention, we try to avoid this to make it clear to other project collaborators how to find and keep track of individual objects.
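To illustrate why a multi-object file is harder to keep track of, here is a small sketch of what happens when two objects share one *.rda file. The object names, values and file name are invented for the example, and a temporary folder is used instead of the repository.

```r
# Invented stand-in objects (not real monitoring data)
percent_cover <- data.frame(site = c("A", "B"), cover = c(12.5, 40))
species_characteristics <- data.frame(taxon = "Acropora", form = "branching")

# Save both objects into a single file in a temporary folder
combined_file <- file.path(tempdir(), "combined_example.rda")
save(percent_cover, species_characteristics, file = combined_file)

# Load into a separate environment to inspect what the file contains;
# load() returns the names of the objects it restored
check_env <- new.env()
loaded_names <- load(combined_file, envir = check_env)
loaded_names
```

The file name alone does not reveal which objects are inside; a collaborator has to load the file to find out, which is the main reason for keeping one object per file.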
Again, these are just guidelines and individual projects may differ slightly in their requirements. In the Homework for this module, we will have some examples for how to best manage “intermediate” data objects.
*.rda object?
Assuming that we have formatted & standardised our percent cover data (for example) and successfully saved it to data_intermediate, we can now work on other scripts for analysis, visualisation, and reporting. All we need to do is load() the intermediate object as part of the script. If starting a fresh R session with no objects in the workspace, the load() command calls the object back, like this:
##
## 1. Set up
##
# point to data locale
data_locale <- "data_intermediate/exercises/formatting/"
# load percent cover data
load(paste0(data_locale, "percent_cover_acosa.rda"))
Too easy! As mentioned, we will use this way of managing data objects for future modules like the Visualisation and Mapping modules, so the utility of this skill will become much more apparent later.
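For readers who want to try the full round trip without touching the repository, the sketch below saves a small object, removes it from the workspace to mimic a fresh session, and loads it back. The data values are invented and a temporary folder stands in for data_intermediate.

```r
# Invented stand-in data
percent_cover_acosa <- data.frame(site = c("A", "B"), cover = c(12.5, 40))

# Save to a temporary folder (stand-in for data_intermediate)
rda_file <- file.path(tempdir(), "percent_cover_acosa.rda")
save(percent_cover_acosa, file = rda_file)

# Keep a copy for comparison, then remove the original to mimic a fresh session
original <- percent_cover_acosa
rm(percent_cover_acosa)

# load() restores the object under the name it was saved with
# and returns that name as a character vector
restored_names <- load(rda_file)
round_trip_ok <- identical(percent_cover_acosa, original)
round_trip_ok
```

Note that load() silently overwrites any object of the same name already in the workspace, which is another reason to keep object and file names unique and consistent.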
Now that we have finished with the basic routine for Data Formatting & Standardisation, it is time for some additional exercises!