To give WIO data course participants more hands-on experience with R and git, we have provided a set of exercises as part of the Homework for the Data Standards and Reproducible Research teaching module.
These include key exercises in Version Control, R basics, object creation and basic visualisation.
CORDIO East Africa staff will be available for additional support on Saturday 19th of June if participants require further assistance.
Version Control: Using git
To practise our version control skills using git, we will start by copying the homework script to our personal folder in participants_code, committing the changes and pushing them to GitHub:
## -- create local copy of homework script -- ##
# Instructions:
# * 1.1. Copy the homework script to your `participants_code` folder:
# copy `exercise_code/homework_data_standards_reproducible_research.R` to
# the `exercise_code` folder in your `participants_code/` folder
# * 1.2. In Git Bash or the Git interface in RStudio:
# git add -A
# git status ## -- this verifies local changes in staging area -- ##
# git commit -m 'adding homework to exercise code'
# git pull ## -- this ensures your local copy is up-to-date -- ##
# git push ## -- this uploads your changes to github -- ##
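You can do step 1.1 in your file browser, or you can script the copy with base R's file.copy(). The minimal sketch below first builds a throwaway copy of the folder layout inside tempdir(), so it runs anywhere. The personal folder name your_name is a hypothetical placeholder. In your own repository, set repo_root to the repository folder and skip the set-up lines.

```r
# Set-up for the demo only: a throwaway copy of the repository layout in a
# temporary folder (in practice, repo_root is your local repository folder)
repo_root <- file.path(tempdir(), "demo_repo")
dir.create(file.path(repo_root, "exercise_code"), recursive = TRUE, showWarnings = FALSE)
homework <- "homework_data_standards_reproducible_research.R"
writeLines("# placeholder homework script",
           file.path(repo_root, "exercise_code", homework))

# Destination: `your_name` is a hypothetical personal folder in participants_code
personal_dir <- file.path(repo_root, "participants_code", "your_name", "exercise_code")
dir.create(personal_dir, recursive = TRUE, showWarnings = FALSE)

# Copy the script into the personal folder; TRUE means the copy succeeded
copied <- file.copy(from = file.path(repo_root, "exercise_code", homework),
                    to = personal_dir)
print(copied)
```

file.copy() leaves an existing destination file untouched unless overwrite = TRUE, so re-running it will not overwrite the notes you have added to your copy. In that case it simply returns FALSE.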
Once this is done, participants can use their copy of the homework script to work through the exercises, recording their results and adding notes for their reference.
We will then modify integrate.R to point to the participant's local copy of the repository. This ensures that, when the project is started, the working directory is set automatically so that the project's data and code can be found:
## -- modify `integrate.R` -- ##
# Instructions:
# * 2.1. Modify line #59 so that it points to your local copy of the repository
# From Git Bash, navigate to the project repository using the commands
# `pwd` ## -- this prints the 'present working directory' -- ##
# `cd` ## -- this 'changes directory'; users will need to type the path -- ##
# * 2.2. In Git Bash or the Git interface in RStudio:
# git add -A
# git status ## -- this verifies local changes in staging area -- ##
# git commit -m 'modifying working directory'
# git pull ## -- this ensures your local copy is up-to-date -- ##
# git push ## -- this uploads your changes to github -- ##
# Participants should copy the output from Git Bash into this script and
# "comment" the text. To do this, select the text and choose
# 'Comment/Uncomment Lines' from the Code menu in RStudio
# (or press Ctrl + Shift + C; Cmd + Shift + C on a Mac)
# (each commented line should then start with a `#` symbol)
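The pwd and cd steps above have direct counterparts in R: getwd() reports the working directory and setwd() changes it. The sketch below assumes that line #59 of integrate.R sets the working directory to the repository path; integrate.R itself is not reproduced here. repo_path is a placeholder, filled with tempdir() only so that the example runs anywhere. On Windows, Git Bash prints paths in the form /c/Users/..., whereas R expects C:/Users/....

```r
# Placeholder: in integrate.R this would be the path to your local repository
# (hypothetical example on Windows: "C:/Users/your_name/your_repository")
repo_path <- tempdir()

old_wd <- getwd()    # R's counterpart of `pwd`
setwd(repo_path)     # R's counterpart of `cd`
new_wd <- getwd()    # confirm where R is now working
print(new_wd)

setwd(old_wd)        # demo only: return to the original working directory
```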
Adding, committing and pushing your results to GitHub will help document your progress on the Homework exercises.
Similar to the above, we will need a local copy of the create_reef_data.R script to follow the exercises.
Again, we will copy the file, stage the changes, commit them and synchronise with GitHub:
## -- create local copy of reef data creation code -- ##
# Instructions:
# * 3.1. Copy the `create_reef_data.R` script to your `participants_code` folder:
# copy `creation_code/examples/standards/create_reef_data.R` to the
# `creation_code` folder in your `participants_code` folder
# * 3.2. In Git Bash or the Git interface in RStudio:
# git add -A
# git status ## -- this verifies local changes in staging area -- ##
# git commit -m 'adding local copy of reef data creation code'
# git pull ## -- this ensures your local copy is up-to-date -- ##
# git push ## -- this uploads your changes to github -- ##
Now that we have the necessary copies of the files for our homework, we can begin by re-creating the reef_data object from individual objects. The base code creates the sites, quadrates and percent_cover columns directly, including the code for generating the site_list sequence, the random percent cover values and so on.
For this exercise, we will create these as separate objects and then combine them with data.frame():
## -- re-create `reef_data` from individual objects -- ##
# Instructions:
# * 4.1. Copy lines #33-63 from `create_reef_data.R` and paste them below line #65
# We will modify this code for this exercise
# * 4.2. Create 3 separate objects:
# `sites` that contains the list of sites repeated for the number of quadrates
# `quadrate` that contains the repeated quadrate numbers
# `percent_cover` which contains the relative percent covers per site
# * 4.3. Create a data frame from the individual objects
# Re-create `reef_data` using the 3 data objects
# (the result should look similar to the original `reef_data` object)
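create_reef_data.R is not reproduced here, so the following is only a sketch of steps 4.2 and 4.3 under stated assumptions. Apart from "coral garden", the site names are hypothetical placeholders. There are assumed to be 10 quadrates per site, and percent cover is drawn uniformly at random with runif() rather than by the original script's method. The sketch shows how rep() with each and times produces three vectors of equal length, which data.frame() then binds as columns.

```r
# Placeholder inputs: only "coral garden" comes from the homework; the other
# site names and the number of quadrates are assumptions
set.seed(42)                                   # reproducible random values
site_list   <- c("coral garden", "reef flat", "lagoon")
n_quadrates <- 10

# 4.2: three separate objects, each with one entry per site-quadrate pair
sites         <- rep(site_list, each = n_quadrates)                   # each site repeated per quadrate
quadrate      <- rep(seq_len(n_quadrates), times = length(site_list)) # 1..n for every site
percent_cover <- runif(length(sites), min = 0, max = 100)             # assumed random cover

# 4.3: data.frame() binds the equal-length vectors as columns
reef_data <- data.frame(site = sites, quadrate = quadrate,
                        percent_cover = percent_cover)
str(reef_data)
```

If the three objects had different lengths, data.frame() would stop with an error. The exception is when a shorter length divides evenly into a longer one, in which case the shorter vector is silently recycled. Checking length() first is therefore worthwhile.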
Now that we are more familiar with creating data objects, we will practise accessing rows and columns and filtering the data object using bracket indexing (i.e. []):
## -- using indexing -- ##
# * 5.1. Use bracket `[]` indexing to select the quadrate data from "coral garden"
# Copy your code & output from the R Console below:
# * 5.2. Subset quadrates 1-3 from each site
# (Hint: to select using multiple entries one must use `%in%` instead of `==`)
# Copy your code & output from the R Console below:
# * 5.3. Create a similar subset using a vector object, e.g.:
# qs_of_interest <- c(1, 3, 5)
# Copy your code & output from the R Console below:
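Below is a sketch of exercises 5.1 to 5.3 using bracket indexing. It builds a small stand-in for reef_data so that it runs on its own. Apart from "coral garden", the site names, the 10 quadrates per site and the random cover values are assumptions, not the contents of create_reef_data.R. A condition in the row position of [ , ] selects rows, and leaving the column position empty keeps every column.

```r
# Hypothetical stand-in for reef_data
set.seed(42)
reef_data <- data.frame(
  site          = rep(c("coral garden", "reef flat", "lagoon"), each = 10),
  quadrate      = rep(1:10, times = 3),
  percent_cover = runif(30, min = 0, max = 100)
)

# 5.1: logical test in the row position, empty column position = all columns
coral_garden <- reef_data[reef_data$site == "coral garden", ]

# 5.2: %in% asks "is this value one of 1, 2, 3?" for every row
q1_3 <- reef_data[reef_data$quadrate %in% 1:3, ]
# By contrast, reef_data$quadrate == 1:3 compares element by element,
# recycling 1:3 along the column, which is a different question

# 5.3: the same kind of subset, with the set of quadrates stored in a vector
qs_of_interest <- c(1, 3, 5)
qs_subset <- reef_data[reef_data$quadrate %in% qs_of_interest, ]
qs_subset
```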
We will build on our indexing skills by incorporating pipes (%>%) into our data filtering and summarising:
## -- using pipes -- ##
# * 6.1. Using the `[]` indexing for "coral garden", get the mean percent cover
# Copy your code & output from the R Console below:
# * 6.2. Use the function `round()` to round the percent cover values to 3 decimal places
# (Hint: the `%>%` operator can be used in the creation of the `percent_cover`
# object in exercise 4.2 above)
# Copy your code & output from the R Console below:
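Below is a sketch of 6.1 and 6.2 on a small stand-in for reef_data, with hypothetical site names apart from "coral garden", 10 quadrates per site and random cover values. To avoid a package dependency it uses base R's native pipe |> (R 4.1 or later), which behaves like %>% for simple calls such as these. After library(magrittr) or library(dplyr) you can write %>% instead.

```r
# Hypothetical stand-in for reef_data
set.seed(42)
reef_data <- data.frame(
  site          = rep(c("coral garden", "reef flat", "lagoon"), each = 10),
  quadrate      = rep(1:10, times = 3),
  percent_cover = runif(30, min = 0, max = 100)
)

# 6.1: x |> f() means f(x), so the selected cover values go into mean()
cg_mean <- reef_data[reef_data$site == "coral garden", "percent_cover"] |> mean()
cg_mean

# 6.2: round to 3 decimal places while creating the percent_cover object
percent_cover <- runif(30, min = 0, max = 100) |> round(3)
head(percent_cover)
```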
Please don't forget to copy the output from your R console into the homework script so we can see what your results look like!
We can also use the pipe (%>%) operator to create quick visuals:
## -- base visualisation -- ##
# * 7.1. Use the function `boxplot()` to examine the variation of `percent_cover`
# for each site
# (Hint: to obtain help on the use of `boxplot` type `?boxplot` in the R Console)
# Copy your code below to examine the graphical output:
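For exercise 7.1, boxplot() accepts a formula, so percent_cover ~ site draws one box per site. The sketch uses a small stand-in for reef_data whose site names (apart from "coral garden"), quadrate count and random cover values are assumptions. The second call is a pipe-friendly alternative: split() turns the values into a list with one element per site, and boxplot() draws one box per list element. It uses base R's |> (R 4.1 or later), which works like %>% here.

```r
# Hypothetical stand-in for reef_data
set.seed(42)
reef_data <- data.frame(
  site          = rep(c("coral garden", "reef flat", "lagoon"), each = 10),
  quadrate      = rep(1:10, times = 3),
  percent_cover = runif(30, min = 0, max = 100)
)

# Formula interface: percent_cover grouped by site; the summary is returned invisibly
bp <- boxplot(percent_cover ~ site, data = reef_data,
              xlab = "Site", ylab = "Percent cover")

# Pipe-friendly alternative: a list of cover values per site, one box per element
reef_data$percent_cover |> split(reef_data$site) |> boxplot()
```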
Take some time to look at the documentation for the boxplot() function for additional examples.
Hint: the output from boxplot() should show one box per site, summarising the spread of percent_cover values at that site.
reef_data for coral genera
Now that we have mastered the basics of objects, indexing and pipes (%>%), we will create our own example coral reef data, this time with information on individual genera:
## -- create percent cover for multiple genera -- ##
# * 8.1. Copy lines #33-63 of `create_reef_data.R` below and modify them to include
# multiple genera. Use the relative percent cover values as a basis, where:
# Pocillopora = 50% of relative percent cover values for each site
# Pavona = 30% of percent cover values
# Acropora = 20% of percent cover values
# The general approach for this exercise is:
# i. create individual `relative_cover` values for each genus
# ii. add an additional column to the `data.frame()` called `genus`
# iii. adjust the `rep()` values to include the number of genera
# (Hint: check the `length()` of individual objects to make sure they match)
# Copy your code below or keep in your copy of `create_reef_data.R`:
# * 8.2. Visualise the percent cover by genus for each site
# Use `[]` indexing to select sites and `boxplot()` as in exercise 7.1
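One way to follow the general approach in 8.1 is sketched below. It is an illustration under assumptions, not the reference solution. The sites other than "coral garden", the 10 quadrates per site and the random total cover are placeholders. Each genus's cover is taken as its share (50%, 30%, 20%) of the quadrate's total, and reef_genera is a hypothetical name for the result. The rows are stacked genus by genus, so each rep() call has to be stretched by the number of genera.

```r
# Placeholder inputs: site names (other than "coral garden") and the number
# of quadrates are assumptions, not values from create_reef_data.R
set.seed(42)
site_list   <- c("coral garden", "reef flat", "lagoon")
n_quadrates <- 10
genus_share <- c(Pocillopora = 0.5, Pavona = 0.3, Acropora = 0.2)
genera      <- names(genus_share)

n_rows        <- length(site_list) * n_quadrates
percent_cover <- round(runif(n_rows, min = 0, max = 100), 3)  # total cover per quadrate

# i. one relative_cover block per genus, stacked in the order of `genera`
relative_cover <- unlist(lapply(genus_share, function(p) percent_cover * p),
                         use.names = FALSE)

# iii. every rep() now also repeats across the genera
site     <- rep(rep(site_list, each = n_quadrates), times = length(genera))
quadrate <- rep(seq_len(n_quadrates), times = length(site_list) * length(genera))
# ii. the new genus column: each genus labels one full block of rows
genus    <- rep(genera, each = n_rows)

# Hint from 8.1: all objects must have the same length()
stopifnot(length(site) == length(relative_cover),
          length(quadrate) == length(relative_cover),
          length(genus) == length(relative_cover))

reef_genera <- data.frame(site, quadrate, genus, percent_cover = relative_cover)
head(reef_genera)
```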
Follow the “general approach” to break this problem down into several steps. Once you have successfully created the object, use your base visualisation skills to visualise the distributions for individual genera.
Hint: plotting genus percent cover for an individual site should produce one box per genus.
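For 8.2, the sketch below draws one panel per site with par(mfrow = ...) and, within each panel, one box per genus, using [] indexing to select each site's rows. It rebuilds a hypothetical genus-level data set so that it runs on its own. The site names other than "coral garden", the 10 quadrates per site, the random total cover and the 50/30/20 split between genera are all assumptions.

```r
# Compact, hypothetical genus-level data
set.seed(42)
site_list   <- c("coral garden", "reef flat", "lagoon")
genera      <- c("Pocillopora", "Pavona", "Acropora")
total_cover <- runif(30, min = 0, max = 100)
reef_genera <- data.frame(
  site          = rep(rep(site_list, each = 10), times = 3),
  quadrate      = rep(1:10, times = 9),
  genus         = rep(genera, each = 30),
  percent_cover = c(total_cover * 0.5, total_cover * 0.3, total_cover * 0.2)
)

# One plotting panel per site, side by side; par() returns the old settings
old_par <- par(mfrow = c(1, length(site_list)))
site_plots <- list()
for (s in site_list) {
  # [] indexing keeps only the rows for site s; boxes are drawn per genus
  site_plots[[s]] <- boxplot(percent_cover ~ genus,
                             data = reef_genera[reef_genera$site == s, ],
                             main = s, ylab = "Percent cover")
}
par(old_par)  # restore the previous plotting layout
```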
Participants should send their completed Homework by Sunday, 20 June, to give us time to review the results before the revision session on Monday. The idea is to identify what worked well and what did not, and how to complement the training so that participants have the necessary skills for the next module.
Please use the commands below to add your results to the staging area, commit them and upload them to GitHub:
## -- submit homework for evaluation -- ##
# * 9.1. In Git Bash or the Git interface in RStudio:
# git add -A
# git status ## -- this verifies local changes in staging area -- ##
# git commit -m 'submitting homework'
# git pull ## -- this ensures your local copy is up-to-date -- ##
# git push ## -- this uploads your changes to github -- ##