Previous steps

If you would like to return to information from the previous session, please click here.

Context

In order for WIO data course participants to have more “hands on” experience with R and git, we have provided a set of exercises as part of the Homework for the Data Standards and Reproducible Research teaching module.

These include key exercises in Version Control, R basics, object creation & basic visualisation.

CORDIO East Africa staff will be available for additional support on Saturday, 19 June if participants require further assistance.

Version Control: Using git

To practise our version control skills using git, we will start by copying the homework script to our personal folder in participants_code, committing the changes and pushing them to GitHub:

 ## -- create local copy of homework script -- ##
  # Instructions:
  #  * 1.1. Copy homework script to your `participants_code` folder:
  #         copy `exercise_code/homework_data_standards_reproducible_research.R` to
  #           the `exercise_code` folder in `participants_code/`

  #  * 1.2. In Gitbash or Git interface with RStudio:
  #           git add -A
  #           git status  ## -- this verifies local changes in staging area -- ##
  #           git commit -m 'adding homework to exercise code'
  #           git pull    ## -- this ensures your local copy is up-to-date -- ##
  #           git push    ## -- this uploads your changes to github -- ##

Once this is done, participants can use their copy of the homework script to work through the exercises, record their results, and add notes for their own reference.
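
The copy in step 1.1 can also be made from the R console. The following is a minimal sketch rather than part of the course scripts: `copy_script()` is a hypothetical helper, and the destination shown in the commented call is a placeholder for your own folder inside `participants_code`.

```r
# Hypothetical helper (not part of the course repository): copy a script
# into a destination folder without overwriting an existing copy.
copy_script <- function(from, to_dir) {
  if (!file.exists(from)) {
    stop("Source script not found: ", from)
  }
  # Create the destination folder if it does not exist yet
  dir.create(to_dir, recursive = TRUE, showWarnings = FALSE)
  # file.copy() returns TRUE on success and FALSE if no copy was made
  file.copy(from, file.path(to_dir, basename(from)), overwrite = FALSE)
}

# Example call, run from the repository root (folder name is a placeholder):
# copy_script(
#   "exercise_code/homework_data_standards_reproducible_research.R",
#   "participants_code/your_folder/exercise_code"
# )
```

Setting `overwrite = FALSE` means a second call leaves an existing copy, and any notes already written in it, untouched.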

We will then modify integrate.R to point to participants’ local copy of the repository. This will ensure that when starting the project, users can automatically set the working directory for accessing data and code for the project:

 ## -- modify `integrate.R` -- ##
  # Instructions:
  #  * 2.1. Modify line #59 to align to local copy of repository
  #           from Gitbash, navigate to the project repository using the commands
  #             `pwd`  ## -- this 'prints the working directory' (where you are now)    -- ##
  #             `cd`   ## -- this 'changes directory'; users will need to type the path -- ##

  #  * 2.2. In Gitbash or Git interface with RStudio:
  #           git add -A
  #           git status  ## -- this verifies local changes in staging area -- ##
  #           git commit -m 'modifying working directory'
  #           git pull    ## -- this ensures your local copy is up-to-date -- ##
  #           git push    ## -- this uploads your changes to github -- ##

  #         Participants should copy the output from Gitbash to this script here and
  #           "comment" the text. This is done by selecting the text and selecting
  #           'comment out' from the Edit menu (or using the command + ' keys)
  #         (Code should have a `#` symbol in front of the text)
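
As an illustration of what pointing the project at a local copy can look like, here is a minimal sketch. It does not reproduce line #59 of `integrate.R`; `set_project_dir()` is a hypothetical helper and the path in the commented call is a placeholder. Note that on Windows, Gitbash reports paths in the form `/c/Users/...`, whereas R expects `C:/Users/...`.

```r
# Hypothetical helper: switch to the project folder only if it exists,
# so that a mistyped path fails with a clear message.
set_project_dir <- function(path) {
  path <- path.expand(path)  # expands a leading "~" to the home directory
  if (!dir.exists(path)) {
    stop("Directory not found: ", path)
  }
  setwd(path)
  invisible(getwd())  # return the new working directory
}

# Example call (placeholder path; use the location reported by `pwd`):
# set_project_dir("C:/Users/your_name/path/to/repository")
```

Checking that the folder exists before calling `setwd()` makes a typing error in the path easier to spot.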

Adding, committing and pushing your results to GitHub will help document your progress through the Homework exercises.

As above, we will need a local copy of the create_reef_data.R script to follow the exercises. Again, we will copy the script, add the changes, commit them and synchronise with GitHub:

 ## -- create local copy of reef data creation code -- ##
  # Instructions:
  #  * 3.1. Copy the `create_reef_data.R` script to your `participants_code` folder:
  #           copy `creation_code/examples/standards/create_reef_data.R` to the
  #           `creation_code` folder in your participants_code folder

  #  * 3.2. In Gitbash or Git interface with RStudio:
  #           git add -A
  #           git status  ## -- this verifies local changes in staging area -- ##
  #           git commit -m 'adding local copy of reef data creation code'
  #           git pull    ## -- this ensures your local copy is up-to-date -- ##
  #           git push    ## -- this uploads your changes to github -- ##

R Exercises

Now that we have the necessary copies of the files for our homework, we can begin by re-creating the reef_data object from individual objects. The base code builds the sites, quadrates, and percent_cover columns directly from the code that creates the site_list sequence, the random percent cover values, and so on.

For this exercise, we will create these as separate objects and bring them together for creating the data.frame():

 ## -- re-create `reef_data` from individual list objects -- ##
  # Instructions:
  #  * 4.1. Copy lines #33-63 from `create_reef_data.R` and paste them below line #65
  #           We will modify this code for this exercise

  #  * 4.2. Create 3 separate objects:
  #           `sites`    that contains the list of sites repeated for the number of quadrates
  #           `quadrate` that contains the repeated quadrate numbers
  #           `percent_cover` which contains the relative percent covers per site

  #  * 4.3. Create data frame from individual objects
  #           Re-create `reef_data` using the 3 data objects
  #           (this should look similar to the original `reef_data` object)
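
The general shape of steps 4.2 and 4.3 can be sketched as follows. This is an illustration under stated assumptions, not the code from `create_reef_data.R`: only "coral garden" appears in these exercises, so the other site names, the number of quadrates, the seed and the use of `runif()` for the cover values are all placeholders.

```r
set.seed(42)  # assumed seed, only so the random values are repeatable

# Assumed inputs: the site names other than "coral garden" and the
# number of quadrates are placeholders.
site_list   <- c("coral garden", "site b", "site c")
n_quadrates <- 5

# 4.2: three separate objects of equal length
sites         <- rep(site_list, each = n_quadrates)
quadrate      <- rep(seq_len(n_quadrates), times = length(site_list))
percent_cover <- runif(length(sites))  # assumed: proportions between 0 and 1

# 4.3: bring the objects together; columns take the object names
reef_data <- data.frame(sites, quadrate, percent_cover)
```

Using `each` for the sites and `times` for the quadrates is what pairs every site with every quadrate number.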

Now that we are more familiar with creating data objects, we will practise accessing rows and columns and filtering the data object using bracket (i.e. []) indexing:

 ## -- using indexing -- ##
  #  * 5.1. Use bracket `[]` indexing to select the quadrate data from "coral garden"
  #           Copy your code & output from the R Console below:

  #  * 5.2. Subset quadrates 1-3 from each site
  #           (Hint: to select using multiple entries one must use `%in%` instead of `==`)
  #           Copy your code & output from the R Console below:

  #  * 5.3. Create a similar subset using a list object, e.g.:
  #           qs_of_interest <- c(1, 3, 5)
  #           Copy your code & output from the R Console below:
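
The difference between `==` and `%in%` in bracket indexing can be sketched on a small stand-in data frame. The values and the second site name below are invented for illustration and are not the course data.

```r
# Small stand-in for `reef_data`; "site b" and the cover values are invented
reef_data <- data.frame(
  sites         = rep(c("coral garden", "site b"), each = 5),
  quadrate      = rep(1:5, times = 2),
  percent_cover = c(0.10, 0.20, 0.30, 0.40, 0.50,
                    0.15, 0.25, 0.35, 0.45, 0.55)
)

# One value to match: `==` inside the row position, all columns kept
coral_garden <- reef_data[reef_data$sites == "coral garden", ]

# Several values to match: `%in%` tests each row against the whole set
first_three <- reef_data[reef_data$quadrate %in% 1:3, ]

# The same idea with the set stored in its own object
qs_of_interest <- c(1, 3, 5)
selected <- reef_data[reef_data$quadrate %in% qs_of_interest, ]
```

With several values, `==` compares element by element and recycles the shorter vector, which is why `%in%` is the safer choice here.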

We will build on our skills of indexing by incorporating pipes %>% into our data filtering and summarising:

 ## -- using pipes -- ##
  #  * 6.1. Using the `[]` indexing for "coral garden", get the mean percent cover
  #           Copy your code & output from the R Console below:

  #  * 6.2. Use the function `round()` to round the percent cover values to 3 digits
  #           (Hint: the `%>%` operator can be used in the creation of the `percent_cover`
  #           object in exercise 4.2 above)
  #           Copy your code & output from the R Console below:
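
A minimal sketch of piping into `mean()` and `round()` is shown below. It assumes the `magrittr` package, which provides `%>%`, is installed; the data frame is an invented stand-in, not the course data.

```r
library(magrittr)  # provides the %>% operator (assumed to be installed)

# Small stand-in for `reef_data`; "site b" and the cover values are invented
reef_data <- data.frame(
  sites         = rep(c("coral garden", "site b"), each = 5),
  quadrate      = rep(1:5, times = 2),
  percent_cover = c(0.10, 0.20, 0.30, 0.40, 0.50,
                    0.15, 0.25, 0.35, 0.45, 0.55)
)

# Select the "coral garden" values with [] and pipe them into mean()
coral_mean <- reef_data[reef_data$sites == "coral garden", "percent_cover"] %>%
  mean()

# Pipe random values into round() while creating the object
set.seed(42)  # assumed seed, only so the values are repeatable
percent_cover <- runif(10) %>% round(digits = 3)
```

The pipe passes the value on its left as the first argument of the function on its right, so `x %>% round(digits = 3)` is the same as `round(x, digits = 3)`.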

Please don’t forget to copy output from your R console to the homework script so we can see what your results look like!

We can also use the pipe %>% operator for creating quick visuals:

 ## -- base visualisation -- ##
  #  * 7.1. Use the function `boxplot()` to examine the variation of `percent_cover`
  #           for each site
  #           (Hint: to obtain help on the use of `boxplot` type `?boxplot` in the R Console)
  #           Copy your code below to examine the graphical output:

Take some time to look at the documentation for the boxplot() function for additional examples.
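
As a starting point, one common way to call `boxplot()` is the formula interface, sketched here on invented stand-in data. This sketch uses plain base R without a pipe and is only one of several ways to approach the exercise.

```r
# Small stand-in for `reef_data`; "site b" and the cover values are invented
reef_data <- data.frame(
  sites         = rep(c("coral garden", "site b"), each = 5),
  quadrate      = rep(1:5, times = 2),
  percent_cover = c(0.10, 0.20, 0.30, 0.40, 0.50,
                    0.15, 0.25, 0.35, 0.45, 0.55)
)

# `percent_cover ~ sites` reads as "percent cover grouped by site";
# boxplot() draws the plot and invisibly returns the summary statistics
bp <- boxplot(percent_cover ~ sites, data = reef_data,
              xlab = "Site", ylab = "Percent cover")
```

The returned object `bp` holds the values behind each box in `bp$stats`, which can be useful for checking the plot against the data.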

Hint: The output from boxplot() should look something like this:

Example reef_data for coral genera

Now that we have mastered the basic workings of objects, indexing, and pipes %>%, we will create our own example coral reef data, this time with information on individual genera:

 ## -- create percent cover for multiple genera -- ##
  #  * 8.1. Copy lines #33-63 of `create_reef_data.R` below and modify it to include
  #           multiple genera. Use the relative percent cover values as a basis for
  #             Pocillopora =  50% of relative percent cover values for each site
  #             Pavona      =  30% of percent cover values
  #             Acropora    =  20% of percent cover values
  #         The general approach for this exercise is:
  #           i.   create individual `relative_cover` values for each genus
  #           ii.  add an additional column to the `data.frame()` called `genus`
  #           iii. adjust the `rep()` values to include the number of genera
  #           (Hint: check the `length()` of individual objects to make sure they match)
  #           Copy your code below or keep in your copy of `create_reef_data.R`:

  #  * 8.2. Visualise the percent cover by genera for each site
  #           Use `[]` indexing to select sites and `boxplot()` as in exercise 7.1

Follow the “general approach” to break this problem down into several steps. Once you have successfully created the object, use your base visualisation skills to visualise the distributions for individual genera.
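
The three steps of the general approach can be sketched as below. This is an illustration under assumptions rather than a modification of `create_reef_data.R`: the second site name, the number of quadrates, the seed, the object names `base_cover` and `reef_genus_data`, and the use of `runif()` are all placeholders.

```r
set.seed(1)  # assumed seed, only so the random values are repeatable

# Assumed inputs: "site b" and the number of quadrates are placeholders
site_list   <- c("coral garden", "site b")
n_quadrates <- 5
genera      <- c("Pocillopora", "Pavona", "Acropora")

# Cover per site and quadrate before it is split among genera
base_cover <- runif(length(site_list) * n_quadrates)
n_obs      <- length(base_cover)

# i. relative cover values for each genus (50%, 30%, 20%)
relative_cover <- c(base_cover * 0.5, base_cover * 0.3, base_cover * 0.2)

# ii. + iii. add a `genus` column and repeat sites/quadrates once per genus
reef_genus_data <- data.frame(
  sites         = rep(rep(site_list, each = n_quadrates), times = length(genera)),
  quadrate      = rep(rep(seq_len(n_quadrates), times = length(site_list)),
                      times = length(genera)),
  genus         = rep(genera, each = n_obs),
  percent_cover = relative_cover
)

# Select one site with [] and compare genera with boxplot()
coral_garden <- reef_genus_data[reef_genus_data$sites == "coral garden", ]
bp <- boxplot(percent_cover ~ genus, data = coral_garden,
              xlab = "Genus", ylab = "Percent cover")
```

Because `data.frame()` stops with an error when column lengths differ, checking `length()` on each object first, as the hint suggests, makes mismatches easier to trace.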

Hint: The output from plotting genus percent cover by individual sites should look something like this:

Next Steps

Participants should submit their completed Homework by Sunday, 20 June to allow us time to review the results before the revision session on Monday. The idea is to identify what worked well, what worked less well, and how to complement the training so that participants have the necessary skills for the next module.

Please use the commands below to add your results to the staging area, commit them and upload them to GitHub:

 ## -- submit homework for evaluation -- ##
  #  * 9.1. In Gitbash or Git interface with RStudio:
  #           git add -A
  #           git status  ## -- this verifies local changes in staging area -- ##
  #           git commit -m 'submitting homework'
  #           git pull    ## -- this ensures your local copy is up-to-date -- ##
  #           git push    ## -- this uploads your changes to github -- ##

For a discussion on the Homework results look here.