Before starting on Data Formatting & Standardisation, we need to become more familiar with working with R, git and the GCRMN standards. Since you should already have created a local copy of the gcrmn_wio_data_course repository, we will now get some practice working with R and synchronising with git.
This wiki page sets out the core exercises for working with R and git, so that all course participants have these skills before importing, cleaning and formatting data for analysis and visualisation.
Inside the participants_code folder, each participant should find a folder named with a short 6-letter abbreviation of their name. This folder contains a copy of the integrate.R script.
To illustrate how to use git and synchronise with the GitHub repository, participants should make a small change to this script (e.g. changing the Date on line 15).
Participants should then open GitBash.exe (Windows) or Terminal.app (macOS) and navigate to the project folder using the cd command. Once in the gcrmn_wio_data_course folder, participants can stage their changes with:
# stage all new, modified and deleted files
git add -A
To check the status of the repository:
git status
Participants should see that integrate.R is now in the staging area (listed under "Changes to be committed").
We will now commit the change to our local copy of the repository and push it to GitHub. Note that each commit includes a message (given with -m) describing the change it makes.
# commit the change
git commit -m 'changing the date in integrate'
# then push to github
git push
During the session, we will work with variations of this routine so that participants become familiar with the add, commit, push and pull sequence.
To get used to working with R, we will create a simple data frame of coral percent cover data from a number of sites. Normally we would import monitoring data from a spreadsheet, but knowing how to create a data object directly is also useful for building concordance tables and similar tasks.
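As a minimal sketch of the concordance-table use case, the block below builds a small lookup table by hand and joins it onto other records with base R's merge(). The object names (site_concordance, survey, survey_named) are hypothetical, and the three-letter site codes simply reuse the abbreviations from the cvr_* variables used later on this page.

```r
# hypothetical concordance table: full site names and short site codes
site_concordance <- data.frame(
  site_name = c("coral garden", "kanamai", "kasa", "likoni",
                "nyali", "ras iwatine", "shark point", "shelly"),
  site_code = c("cor", "kan", "kas", "lik", "nya", "ras", "sha", "she"),
  stringsAsFactors = FALSE
)

# hypothetical records that only carry the short site code
survey <- data.frame(
  site_code  = c("kas", "lik", "cor"),
  mean_cover = c(0.45, 0.78, 0.60),
  stringsAsFactors = FALSE
)

# add the full site name to each record via the shared site_code column;
# merge() sorts the result by site_code by default
survey_named <- merge(survey, site_concordance, by = "site_code")
survey_named
```

The same pattern scales to real concordance tables, for example mapping local species names or category labels onto the GCRMN standard ones.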
For the purposes of this exercise, we will use it to illustrate how R objects work and the standards for documenting code so that it is understandable to other users.
We will start by creating a vector of site names:
# create a character vector of site names
site_list <-
  c("coral garden",
    "kanamai",
    "kasa",
    "likoni",
    "nyali",
    "ras iwatine",
    "shark point",
    "shelly")
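Because site_list is built with c(), it is an atomic character vector rather than an R list (those are created with list()). The sketch below checks this and practises basic indexing; site_info is a hypothetical object added only to show what a true list looks like.

```r
site_list <- c("coral garden", "kanamai", "kasa", "likoni",
               "nyali", "ras iwatine", "shark point", "shelly")

class(site_list)     # "character": an atomic vector, not a list
length(site_list)    # 8 sites
site_list[2]         # second element: "kanamai"
site_list[c(1, 8)]   # first and last: "coral garden" "shelly"

# by contrast, list() can hold elements of different types
site_info <- list(name = "kasa", n_quads = 5)
class(site_info)     # "list"
```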
Next, we will set the number of replicate quadrats and an average cover for each of our monitoring sites (expressed as a proportion between 0 and 1, so 0.60 means 60% cover):
## -- create reef data -- ##
# set number of replicates
n_quads <- 5
# set relative cover per site
cvr_cor <- 0.60
cvr_kan <- 0.65
cvr_kas <- 0.45
cvr_lik <- 0.78
cvr_nya <- 0.73
cvr_ras <- 0.68
cvr_sha <- 0.76
cvr_she <- 0.58
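Eight separate cvr_* objects work, but each one has to be created and edited by hand, and adding a site means adding another variable. A possible alternative, sketched here as one option rather than the course's required style, is a single named vector keyed by site name; site_cover is a hypothetical name and the values are the same as above.

```r
# one named vector instead of eight separate cvr_* objects
site_cover <- c("coral garden" = 0.60, "kanamai"     = 0.65,
                "kasa"         = 0.45, "likoni"      = 0.78,
                "nyali"        = 0.73, "ras iwatine" = 0.68,
                "shark point"  = 0.76, "shelly"      = 0.58)

site_cover["kasa"]        # subset by name (keeps the name)
site_cover[["likoni"]]    # extract the bare value: 0.78
names(site_cover)         # site names, in the same order as site_list
mean(site_cover)          # overall mean of the site averages

# adding a site later is a one-line change (hypothetical site name):
# site_cover["new site"] <- 0.50
```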
The strategy for creating the data is to draw normally distributed random values around each site's average cover with rnorm() (using a standard deviation equal to the site mean divided by a site-specific constant), concatenate (i.e. “bind”) the site data together, and add columns to identify sites and replicates:
# set seed for reproducibility
set.seed(3)
# generate data: n_quads random cover values per site, in site_list order
reef_data <-
  data.frame(
    sites = rep(site_list, each = n_quads),
    quadrate = rep(seq_len(n_quads), times = length(site_list)),
    percent_cover = c(rnorm(n_quads, cvr_cor, (cvr_cor / 3.0)),
                      rnorm(n_quads, cvr_kan, (cvr_kan / 5.2)),
                      rnorm(n_quads, cvr_kas, (cvr_kas / 3.2)),
                      rnorm(n_quads, cvr_lik, (cvr_lik / 6.8)),
                      rnorm(n_quads, cvr_nya, (cvr_nya / 4.2)),
                      rnorm(n_quads, cvr_ras, (cvr_ras / 4.4)),
                      rnorm(n_quads, cvr_sha, (cvr_sha / 4.1)),
                      rnorm(n_quads, cvr_she, (cvr_she / 5.4)))
  )
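Note the set.seed() function, which ensures that rnorm() generates the same random numbers every time the script is run. The resulting reef_data object has 40 rows (8 sites with 5 quadrats each) and three columns: sites, quadrate and percent_cover. You can inspect it with head(reef_data) or View(reef_data).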
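To see what set.seed() does in practice, the sketch below draws the coral garden values twice with the same seed, then once more without resetting it. It assumes R's default random number generator settings; the first draw is the same call that produces the coral garden rows of reef_data.

```r
# draw five values with seed 3 (same call as the coral garden rows of reef_data)
set.seed(3)
first_draw <- rnorm(5, mean = 0.60, sd = 0.60 / 3.0)

# resetting the seed reproduces exactly the same numbers
set.seed(3)
second_draw <- rnorm(5, mean = 0.60, sd = 0.60 / 3.0)
identical(first_draw, second_draw)   # TRUE

# without resetting, the generator continues and gives new numbers
third_draw <- rnorm(5, mean = 0.60, sd = 0.60 / 3.0)
identical(first_draw, third_draw)    # FALSE
```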
During the course, we will spend some time working with objects and indexing, and on setting up our code to be “modular”, “extendable” and well documented.
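As one possible way of making the data-generation step modular and extendable, the sketch below wraps it in a function driven by named vectors. simulate_cover(), site_cover and sd_divisor are hypothetical names; the divisors are the ones hard-coded in the rnorm() calls above. Because the sites are drawn in the same order with the same seed, it should reproduce the same percent_cover values as the original code.

```r
# hypothetical helper: simulate n_quads cover values around each site's mean,
# with standard deviation = site mean / site-specific divisor
simulate_cover <- function(site_cover, sd_divisor, n_quads = 5) {
  # both inputs must describe the same sites in the same order
  stopifnot(identical(names(site_cover), names(sd_divisor)))
  sites <- names(site_cover)
  # draw each site's replicates in turn, in the order the sites are listed
  cover <- unlist(lapply(sites, function(site) {
    rnorm(n_quads,
          mean = site_cover[[site]],
          sd   = site_cover[[site]] / sd_divisor[[site]])
  }))
  data.frame(
    sites         = rep(sites, each = n_quads),
    quadrate      = rep(seq_len(n_quads), times = length(sites)),
    percent_cover = cover,
    stringsAsFactors = FALSE
  )
}

site_cover <- c("coral garden" = 0.60, "kanamai" = 0.65, "kasa" = 0.45,
                "likoni" = 0.78, "nyali" = 0.73, "ras iwatine" = 0.68,
                "shark point" = 0.76, "shelly" = 0.58)
sd_divisor <- c("coral garden" = 3.0, "kanamai" = 5.2, "kasa" = 3.2,
                "likoni" = 6.8, "nyali" = 4.2, "ras iwatine" = 4.4,
                "shark point" = 4.1, "shelly" = 5.4)

set.seed(3)
reef_data <- simulate_cover(site_cover, sd_divisor, n_quads = 5)
head(reef_data)
```

With this structure, adding a site only means adding one entry to each named vector, and changing the number of quadrats is a single argument.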
For example, we can use indexing to select the third row of reef_data:
reef_data[ 3, ]
sites quadrate percent_cover
3 coral garden 3 0.6517576
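A few more indexing patterns are sketched below. To keep the block self-contained it uses reef_demo, a hypothetical ten-row stand-in for reef_data holding the coral garden and kanamai values rounded to three decimals (the first ten values printed in the rounding example on this page).

```r
# hypothetical stand-in for the first ten rows of reef_data (rounded values)
reef_demo <- data.frame(
  sites         = rep(c("coral garden", "kanamai"), each = 5),
  quadrate      = rep(1:5, times = 2),
  percent_cover = c(0.408, 0.541, 0.652, 0.370, 0.639,
                    0.654, 0.661, 0.790, 0.498, 0.808),
  stringsAsFactors = FALSE
)

reef_demo[3, ]                     # row 3, all columns
reef_demo[, "percent_cover"]       # one column by name (returns a vector)
reef_demo$percent_cover[1:3]       # first three values of a column
reef_demo[reef_demo$sites == "kanamai", ]   # all rows for one site
# rows with cover above 0.6, keeping only two columns
reef_demo[reef_demo$percent_cover > 0.6, c("sites", "quadrate")]
```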
In addition, we can experiment with using pipes (i.e. %>%, provided by the magrittr package). For example, to change the number of decimal places of the percent_cover data, all we need to do is:
# load magrittr for the %>% pipe (dplyr also provides it)
library(magrittr)
# round the percent cover data
reef_data$percent_cover %>% round(3)
[1] 0.408 0.541 0.652 0.370 0.639 0.654 0.661 0.790 0.498 0.808 0.345 0.291 0.349 0.486 0.471 0.745 0.671 0.706 0.920
[20] 0.803 0.629 0.566 0.695 0.440 0.646 0.565 0.859 0.836 0.669 0.504 0.927 0.918 0.895 0.897 0.695 0.656 0.720 0.584
[39] 0.475 0.665
# to save it to the object
reef_data$percent_cover <- reef_data$percent_cover %>% round(3)
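If you are on R 4.1 or later, base R's native pipe |> works without loading any package. The sketch below uses it to compute a mean cover per site; as before, reef_demo is a hypothetical ten-row stand-in for reef_data with the first ten rounded values, so the block runs on its own.

```r
# hypothetical stand-in for the first ten rows of reef_data (rounded values)
reef_demo <- data.frame(
  sites         = rep(c("coral garden", "kanamai"), each = 5),
  percent_cover = c(0.408, 0.541, 0.652, 0.370, 0.639,
                    0.654, 0.661, 0.790, 0.498, 0.808),
  stringsAsFactors = FALSE
)

# split cover values by site, average each group, then round (R >= 4.1)
site_means <- split(reef_demo$percent_cover, reef_demo$sites) |>
  sapply(mean) |>
  round(3)
site_means
```

Each step passes its result as the first argument of the next function, which is the same idea as %>%.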
Of course, there is a lot more we can do with this simple object and we will explore some of these during the training session.
Now that we have acquired some skills working with R and git, we will develop some homework tasks to get further practice and get ready for the Data Formatting & Standardisation module!