Context

Before starting on Data Formatting & Standardisation, we need to become more familiar with R, git and the GCRMN standards. You should already have created a local copy of the gcrmn_wio_data_course repository, so the next step is some practice working with R and synchronising with git.

This wiki page sets out the core exercises for working with R and git, so that all course participants have these skills before importing data and cleaning and formatting it for analysis and visualisation.

Exercise #1: Edit a script, add the changes and push to GitHub

Participants should find a folder with a short 6-letter abbreviation of their name in the participants_code folder. In this folder, users will find a copy of the integrate.R script.

To illustrate how to use git and synchronise with the GitHub repository, participants should modify this script (e.g. by changing the Date on line 15).

Participants should then open GitBash.exe or Terminal.app and navigate to the project folder using the cd command. Once in the gcrmn_wio_data_course folder, participants can stage their changes with:

git add -A

To check on the status:

git status

Users should see that the integrate.R file is now added to the staging area.

We will now commit the change to our local copy of the repository and push it to GitHub. Note that each commit carries a message describing the change we made.

# commit the change
git commit -m 'changing the date in integrate'

# then push to GitHub
git push

During the session, we will work with variations of this routine so participants are familiar with the push, pull, add sequence.

Exercise #2: Creating a data object in R

In order to get used to working with R, we will create a simple data frame of coral percent cover data from a number of sites. Normally, we would be importing monitoring data from a spreadsheet, but the skill for creating a data object can be useful for making concordance tables and other applications.

For the purposes of this exercise, we will use it to illustrate the basic working of R objects and the standards for documenting code and making it understandable for other users.

We will start by creating a list of sites:

 # create a list of sites
   site_list <-
     c("coral garden",
       "kanamai",
       "kasa",
       "likoni",
       "nyali",
       "ras iwatine",
       "shark point",
       "shelly")

Next, we will set the number of replicate quadrats and a mean cover value (expressed as a proportion between 0 and 1) for each of our monitoring sites:

 ## -- create reef data -- ##
  # set number of replicates
    n_quads <- 5

  # set relative cover per site
    cvr_cor <- 0.60
    cvr_kan <- 0.65
    cvr_kas <- 0.45
    cvr_lik <- 0.78
    cvr_nya <- 0.73
    cvr_ras <- 0.68
    cvr_sha <- 0.76
    cvr_she <- 0.58

The strategy is to draw random values from a normal distribution centred on each site's mean cover (using rnorm()), concatenate (i.e. “bind”) the site data together, and add columns to identify sites and replicates:

  # set seed for reproducibility
    set.seed(3)

  # generate data
    reef_data <-
      data.frame(
        sites         = rep(site_list, each = n_quads),
        # use n_quads rather than a hard-coded 5 so the design is set in one place
        quadrate      = rep(seq_len(n_quads), times = length(site_list)),
        percent_cover = c(rnorm(n_quads, cvr_cor, (cvr_cor / 3.0)),
                          rnorm(n_quads, cvr_kan, (cvr_kan / 5.2)),
                          rnorm(n_quads, cvr_kas, (cvr_kas / 3.2)),
                          rnorm(n_quads, cvr_lik, (cvr_lik / 6.8)),
                          rnorm(n_quads, cvr_nya, (cvr_nya / 4.2)),
                          rnorm(n_quads, cvr_ras, (cvr_ras / 4.4)),
                          rnorm(n_quads, cvr_sha, (cvr_sha / 4.1)),
                          rnorm(n_quads, cvr_she, (cvr_she / 5.4)))
        )
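
As a possible next step towards more modular code, the same strategy can be sketched with a named vector in place of the eight separate cvr_ objects. This is only an illustration and not part of the course script: the names site_cover, sd_divisor and reef_data_alt are hypothetical, and the numbers are copied from the code above.

```r
# Sketch only: site_cover, sd_divisor and reef_data_alt are hypothetical names
n_quads <- 5

# mean cover per site, stored as a named vector
site_cover <- c(
  "coral garden" = 0.60,
  "kanamai"      = 0.65,
  "kasa"         = 0.45,
  "likoni"       = 0.78,
  "nyali"        = 0.73,
  "ras iwatine"  = 0.68,
  "shark point"  = 0.76,
  "shelly"       = 0.58
)

# divisors used to derive each site's standard deviation from its mean
sd_divisor <- c(3.0, 5.2, 3.2, 6.8, 4.2, 4.4, 4.1, 5.4)

# set seed for reproducibility
set.seed(3)

# rnorm() is vectorised over mean and sd, so one call covers all sites
reef_data_alt <- data.frame(
  sites         = rep(names(site_cover), each = n_quads),
  quadrate      = rep(seq_len(n_quads), times = length(site_cover)),
  percent_cover = rnorm(
    n    = n_quads * length(site_cover),
    mean = rep(unname(site_cover), each = n_quads),
    sd   = rep(unname(site_cover / sd_divisor), each = n_quads)
  )
)

# inspect the structure of the result
str(reef_data_alt)
```

With this layout, adding a site only requires one new entry in site_cover and one in sd_divisor; the data.frame() call itself does not change.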

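We should note the set.seed() function, which ensures that the random number generator behind rnorm() produces the same values every time the script is run. The resulting reef_data object should contain 40 rows (8 sites with 5 quadrats each) and three columns: sites, quadrate and percent_cover.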
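
To see what set.seed() does in practice, the following minimal sketch draws the same five values twice after resetting the seed, and then once more without resetting it. The object names first_draw, second_draw and third_draw are illustrative only.

```r
# draw five values after setting the seed
set.seed(3)
first_draw <- rnorm(5, mean = 0.60, sd = 0.60 / 3.0)

# reset the seed and draw again: the values repeat
set.seed(3)
second_draw <- rnorm(5, mean = 0.60, sd = 0.60 / 3.0)

# draw again WITHOUT resetting the seed: the generator has moved on
third_draw <- rnorm(5, mean = 0.60, sd = 0.60 / 3.0)

identical(first_draw, second_draw)  # same seed, same numbers
identical(first_draw, third_draw)   # no reset, different numbers
```

This is why the seed is set once, immediately before the data are generated.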

During the course, we will take some time to work with objects, indexing, and how to set up our code to be “modular”, “extendable”, and well documented.

For example, we can test some of these indexing skills by selecting the third row of reef_data:

reef_data[ 3, ]
         sites quadrate percent_cover
3 coral garden        3     0.6517576
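
A few other common indexing patterns are sketched below on a small stand-in data frame, so the block can be run on its own. The object reef_demo is hypothetical; its values are the rounded percent cover figures for the first three quadrats of the first two sites, as shown elsewhere on this page.

```r
# small stand-in for reef_data (hypothetical object, rounded values)
reef_demo <- data.frame(
  sites         = rep(c("coral garden", "kanamai"), each = 3),
  quadrate      = rep(1:3, times = 2),
  percent_cover = c(0.408, 0.541, 0.652, 0.654, 0.661, 0.790)
)

# one row, all columns
third_row <- reef_demo[3, ]

# one column, returned as a vector
cover_vec <- reef_demo[, "percent_cover"]

# all rows that match a condition
kanamai_rows <- reef_demo[reef_demo$sites == "kanamai", ]

# rows matching a condition, restricted to selected columns
high_cover <- reef_demo[reef_demo$percent_cover > 0.6, c("sites", "quadrate")]

third_row
kanamai_rows
high_cover
```

The general pattern is object[rows, columns]; leaving either side of the comma empty keeps all rows or all columns.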

In addition, we can experiment with pipes (i.e. %>%, which is provided by the magrittr package and is also available after loading dplyr or the tidyverse). For example, to change the number of decimal places for our percent_cover data, all we need to do is:

  # load the package that provides the %>% pipe
  library(magrittr)

  # round the percent cover data
  reef_data$percent_cover %>% round(3)
 [1] 0.408 0.541 0.652 0.370 0.639 0.654 0.661 0.790 0.498 0.808 0.345 0.291 0.349 0.486 0.471 0.745 0.671 0.706 0.920
[20] 0.803 0.629 0.566 0.695 0.440 0.646 0.565 0.859 0.836 0.669 0.504 0.927 0.918 0.895 0.897 0.695 0.656 0.720 0.584
[39] 0.475 0.665

  # to save it to the object
    reef_data$percent_cover <- reef_data$percent_cover %>% round(3)
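
For comparison, the sketch below shows the same rounding step written as a nested call and with the native pipe |> that is built into R from version 4.1 onwards (so no package is needed). It assumes R 4.1 or later; the cover vector holds illustrative values only, not the real reef_data column.

```r
# illustrative values only, not the real reef_data column
cover <- c(0.40761, 0.54149, 0.65176)

# nested call: read from the inside out
nested <- round(cover, 3)

# piped call: read from left to right
piped <- cover |> round(3)

# both forms give the same result
identical(nested, piped)

# pipes are most useful when several steps are chained
mean_cover <- cover |> round(3) |> mean()
mean_cover
```

The pipe passes the value on its left as the first argument of the function on its right, which keeps longer chains of steps readable.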

Of course, there is a lot more we can do with this simple object, and we will explore some of these options during the training session.

Next Steps

Now that we have acquired some skills working with R and git, we will develop some Homework tasks to get further practice and get ready for the Data Formatting & Standardisation module!