Context

Overarching goals of the GCRMN implementation plan include the development of procedures for data cleaning, production of reproducible results and interoperability of systems. As part of the Data formatting & Standardisation Module, we have been working through an example of how data are imported, columns formatted, joined with additional information tables and modified to produce clean coral reef monitoring data.

Once the data formatting & standardisation process is established, it should work as the data set grows (e.g. after including additional rows in the data_raw spreadsheet). The idea is that we can re-run the data cleaning script using integrate.R and update the results after a monitoring campaign.

Likewise, the code for Visualisation of Status & Trends and other analyses of the data (which we will explore in a future module) should work on a dataset that includes 1 year of data or 10 years. The question is: How do we connect the data formatting & standardisation process with the visualisation and analyses? The answer is that we do that with “intermediate” data objects.

In this lesson, we will explore the basic strategy for managing “intermediate” data objects: their creation, storage and how to load() them back into R.

At the end of this training session, participants should have competency in:

Saving clean data objects

Using the example of the percent cover data, we have arrived at a data object that has correct dates, numeric and character formats, and links with additional metadata tables, is in a tidy “long” form and is essentially ready for analysis. The next step is to save the object in an “intermediate” location (i.e. separating these objects from the “raw” data).

For this, we have created a data_intermediate folder in the repository.

Saving the formatted and standardised data object is straightforward: we point to the save_locale, then call the save() function with the object and a file name ending in *.rda.

##
## 6. Generate outputs
##
  # point to save locale
    save_locale <- "data_intermediate/exercises/formatting/"

  # save sessiles
    save(percent_cover_acosa,
      file = paste0(save_locale, "percent_cover_acosa.rda"))
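
One practical detail: save() does not create missing folders, so the call stops with an error if the target folder does not exist yet (for example on a fresh copy of the repository). The following is a minimal sketch of a guard, not part of the GCRMN scripts. It assumes a temporary directory and a small stand-in data frame called percent_cover_demo, both invented for illustration, and uses file.path() instead of paste0() to build the path.

```r
# hypothetical save location inside a temporary directory
save_locale <- file.path(tempdir(), "data_intermediate", "exercises", "formatting")

# create the folder (and any missing parent folders) only if it is absent
if (!dir.exists(save_locale)) {
  dir.create(save_locale, recursive = TRUE)
}

# stand-in object, invented for this illustration
percent_cover_demo <- data.frame(site = c("A", "B"), cover = c(35.2, 41.7))

# save the object under a file name that matches the object name
demo_file <- file.path(save_locale, "percent_cover_demo.rda")
save(percent_cover_demo, file = demo_file)

# confirm that the file was written
file.exists(demo_file)
```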

The general convention (for GCRMN) is to give the *.rda file the same name as the object it contains. For example, for our object percent_cover_acosa, the file name is simply percent_cover_acosa.rda. This makes it easier to track the location of individual data objects in the repository as well as in the R workspace.
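
One way to keep the object name and the file name in step is to build the file name from the object name, so that the two cannot drift apart. This is only a sketch under assumptions: the object percent_cover_demo and the temporary save location are hypothetical, and the list argument of save() is used so that the object can be referred to by its name as a character string.

```r
# stand-in object, invented for this illustration
percent_cover_demo <- data.frame(site = c("A", "B"), cover = c(35.2, 41.7))

# the object name, written once as a character string
object_name <- "percent_cover_demo"

# build the file name from the object name (temporary folder for the example)
file_path <- file.path(tempdir(), paste0(object_name, ".rda"))

# save(list = ...) accepts object names as character strings
save(list = object_name, file = file_path)

# the file name now mirrors the object name
basename(file_path)
```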

The R data object (i.e. *.rda) format is flexible and can contain multiple data objects. For example, we might want to save the “percent cover data” and the “species characteristics” as separate data objects in a single *.rda file. When we load that *.rda back into R, it then comes with the two separate objects. Following our convention, we try to avoid this so that other project collaborators can easily find and keep track of individual objects.
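
To see what the multi-object behaviour looks like in practice, here is a minimal sketch. The two data frames and the file name are hypothetical stand-ins, and the file is written to a temporary directory rather than to data_intermediate.

```r
# stand-in objects, invented for this illustration
percent_cover_demo <- data.frame(site = c("A", "B"), cover = c(35.2, 41.7))
species_traits_demo <- data.frame(taxon = c("Porites", "Pocillopora"),
                                  growth_form = c("massive", "branching"))

# temporary file so the example does not touch the repository
demo_file <- file.path(tempdir(), "two_objects_demo.rda")

# save both objects into a single *.rda file
save(percent_cover_demo, species_traits_demo, file = demo_file)

# remove them from the workspace, then load the file back
rm(percent_cover_demo, species_traits_demo)
loaded_names <- load(demo_file)

# load() returns (invisibly) the names of the objects it restored
print(loaded_names)
```

Because a single file restores several objects at once, a collaborator cannot tell from the file name alone what will appear in the workspace, which is the reason the one-object-per-file convention is preferred here.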

Again, these are just guidelines and individual projects may differ slightly in their requirements. In the Homework for this module, we will have some examples of how best to manage “intermediate” data objects.

How to retrieve an *.rda object?

Assuming that we have formatted & standardised our percent cover data (for example) and successfully saved it to data_intermediate, we can now work on other scripts for analysis, visualisation, and reporting. All we need to do is load() the intermediate object as part of the script. If starting a fresh R session with no objects in the workspace, the load() command calls the object back, like this:

##
## 1. Set up
##
  # point to data locale
    data_locale <- "data_intermediate/exercises/formatting/"

  # load percent cover data
    load(paste0(data_locale, "percent_cover_acosa.rda"))

Too easy! As mentioned, we will use this way of managing data objects for future modules like the Visualisation and Mapping modules, so the utility of this skill will become much more apparent later.
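
One thing to keep in mind is that load() restores objects under their saved names and silently replaces any existing object with the same name. If the workspace is not empty, a cautious option is to load the file into a separate environment first and inspect it. The sketch below assumes a stand-in object and a temporary file, both hypothetical, so that it runs on its own.

```r
# create and save a stand-in object (invented for this illustration)
percent_cover_demo <- data.frame(site = c("A", "B"), cover = c(35.2, 41.7))
demo_file <- file.path(tempdir(), "percent_cover_demo.rda")
save(percent_cover_demo, file = demo_file)
rm(percent_cover_demo)

# load into a separate environment instead of the global workspace
check_env <- new.env()
loaded_names <- load(demo_file, envir = check_env)

# inspect what the file contained before using it
print(loaded_names)
str(check_env$percent_cover_demo)
```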

Next steps

Now that we have finished with the basic routine for Data Formatting & Standardisation, it is time for some additional exercises!