As mentioned, part of the approach for this training course is to focus on developing a core set of skills for data manipulation and visualisation, along with tools for documenting and reporting on the status and trends of coral reef ecosystems.
The R language has extensive capabilities and can be
used across many disciplines and applications. This course will be using
a small subset of packages and functions to provide examples of data
cleaning and visualisation.
However, course participants should be aware that there are often
several ways to achieve a similar result in R. The
examples we provide in this course are by no means the
definitive way of doing things, but rather a means to share a
similar coding style and promote interoperability across GCRMN regional
networks.
If you are new to R, one of the challenges is thinking
about and working with objects. In R, an object can be a data
table, a model, a series of names, a number, a coastline, or many other
things.
This is part of the genius of R, in that it provides a
way to be scalable. For example, a function that works on a
single object that is a list of names works whether the list is 10 names long
or 10,000 names long!
A common way to create an object is with the assignment operator <-
(“assign”):
 # create a list of sites
   site_list <-
     c("coral garden",
       "kanamai",
       "kasa",
       "likoni",
       "nyali",
       "ras iwatine",
       "shark point",
       "shelly")
As an example of how we can apply a function to our entire list (as a single object):
  # review the sites
    site_list
[1] "coral garden" "kanamai"      "kasa"         "likoni"       "nyali"        "ras iwatine"
[7] "shark point"  "shelly"
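   # set capital letters (str_to_title() comes from the stringr package,
   # which needs to be loaded first, e.g. with library(stringr))
     str_to_title(site_list)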
[1] "Coral Garden" "Kanamai"      "Kasa"         "Likoni"       "Nyali"        "Ras Iwatine"
[7] "Shark Point"  "Shelly"
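To illustrate the point about scale, here is a minimal sketch using only base R. The object names short_list and long_list are hypothetical, and the 10,000 site names are made up for the illustration. The same function call works unchanged on both objects:

```r
# a short vector of site names (three of the sites used above)
short_list <- c("coral garden", "kanamai", "kasa")

# a hypothetical long vector: 10,000 made-up site names
long_list <- paste("site", seq_len(10000))

# toupper() is a base R function; the same call handles both objects
short_upper <- toupper(short_list)
long_upper  <- toupper(long_list)

# check how many names were converted in each case
length(short_upper)
length(long_upper)

# look at the first few converted names
head(long_upper, 3)
```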
During the course, we will get plenty of practice working with objects. In the meantime, here are some quick ways to figure out the type of an object or its structure:
> class(site_list)
[1] "character"
> str(site_list)
 chr [1:8] "coral garden" "kanamai" "kasa" "likoni" "nyali" "ras iwatine" "shark point" "shelly"
You can also get a list of the objects in your current workspace by
typing ls() in the R Console.
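As a small sketch of how these inspection functions behave on different kinds of objects, the example below creates a number, a character vector and a data table. The names n_sites and survey, and the values in them, are made up for illustration:

```r
# a single number
n_sites <- 8

# a character vector of site names
site_list <- c("kanamai", "kasa")

# a small, made-up data table (data.frame)
survey <- data.frame(
  sites = site_list,
  percent_cover = c(0.42, 0.57),
  stringsAsFactors = FALSE
)

# class() reports the type of each object
class(n_sites)
class(site_list)
class(survey)

# str() shows the structure, including the type of each column
str(survey)

# ls() lists the objects created so far
ls()
```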
A lot of R functionality comes from its
base package. There are special functions for mathematics
(e.g. mean()), control functions
(e.g. ifelse()), data summary (e.g. table())
and generic plotting functions (e.g. plot()). In this
training course, we will be using a number of functions that help with
formatting column types (e.g. as.numeric()) and exercises
from individual modules will draw on other functions from
base.
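The short sketch below strings several of these base functions together. The values in cover_text are hypothetical and stand in for a column that was read in as text:

```r
# hypothetical percent cover values stored as text, with one missing value
cover_text <- c("0.10", "0.55", "0.80", NA, "0.35")

# as.numeric() converts the values from character to numeric
cover <- as.numeric(cover_text)

# mean() with na.rm = TRUE ignores the missing value
mean_cover <- mean(cover, na.rm = TRUE)
mean_cover

# ifelse() labels each value according to a condition
cover_class <- ifelse(cover >= 0.50, "high", "low")

# table() counts how many values fall in each class
# (useNA = "ifany" also counts the missing value)
table(cover_class, useNA = "ifany")
```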
In the cleaning and standardising of data objects, it is useful to
know how to inspect the contents of a given column. This can be done by
putting a $ between the object name and the name of the
column:
  # get list of sites
    unique(reef_data$sites)
[1] "coral garden" "kanamai"      "kasa"         "likoni"       "nyali"        "ras iwatine"
[7] "shark point"  "shelly"
Or we might want to check on the range of values of a column:
  # get range of values
    range(reef_data$percent_cover, na.rm = TRUE)
An equivalent in Excel would be to put formulas in two cells to
summarise a column (e.g. =MIN(L2:L238) and
=MAX(L2:L238)). Note that range() returns the minimum and the maximum
themselves, not the difference between them.
In some instances, a quick summary of the object can be helpful:
  # get summary of values
    summary(reef_data)
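To try these commands without the course data, a small stand-in for reef_data can be built by hand. The values below are made up for illustration and are not taken from any survey:

```r
# a small, made-up stand-in for reef_data (illustrative values only)
reef_data <- data.frame(
  sites = c("kanamai", "kanamai", "kasa", "kasa", "nyali"),
  percent_cover = c(0.35, 0.60, NA, 0.48, 0.72),
  stringsAsFactors = FALSE
)

# inspect one column with $
site_names <- unique(reef_data$sites)
site_names

# minimum and maximum of a column, ignoring missing values
cover_range <- range(reef_data$percent_cover, na.rm = TRUE)
cover_range

# the difference between the maximum and the minimum, if that is needed
diff(cover_range)

# quick summary of every column
summary(reef_data)
```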
Elements of an object can be accessed using brackets []
or double brackets [[]]. For example, to access the second
element in our list of sites:
site_list[ 2 ]
[1] "kanamai"
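The difference between single and double brackets is easiest to see on a list. In the sketch below, site_info is a hypothetical object with made-up depths: single brackets return a smaller list, while double brackets return the element itself.

```r
# a hypothetical list describing one site (depths are made up)
site_info <- list(name = "kanamai", depths = c(3, 8, 12))

# single brackets keep the container: the result is still a list
as_list <- site_info["depths"]
class(as_list)

# double brackets extract the element itself: here a numeric vector
as_vector <- site_info[["depths"]]
class(as_vector)

# the two notations can be combined: second depth of this site
second_depth <- site_info[["depths"]][2]
second_depth
```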
If the data is in matrix or data.frame
form, values can be accessed by identifying the row and column within
the brackets. For example:
 reef_data[ row, col ]
To obtain all columns in the 3rd row of our
reef_data:
reef_data[ 3, ]
And, to obtain all rows in the 3rd column:
reef_data[ , 3]
It is also possible to filter a data object using the bracket notations, for example:
reef_data[ reef_data$sites == "kanamai", ]
or:
reef_data[ reef_data$percent_cover >= 0.50 , ]
This is similar to making a database query.
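One detail to watch with bracket filtering is missing values: where the condition evaluates to NA, R returns a row filled with NA rather than dropping it. The sketch below uses a made-up stand-in for reef_data to show this, along with two common ways around it.

```r
# a small, made-up stand-in for reef_data (illustrative values only)
reef_data <- data.frame(
  sites = c("kanamai", "kanamai", "kasa", "kasa", "nyali"),
  percent_cover = c(0.35, 0.60, NA, 0.48, 0.72),
  stringsAsFactors = FALSE
)

# a plain logical filter returns an extra all-NA row for the missing value
with_na_row <- reef_data[reef_data$percent_cover >= 0.50, ]
with_na_row

# which() keeps only the positions where the condition is TRUE
high_cover <- reef_data[which(reef_data$percent_cover >= 0.50), ]
high_cover

# conditions can be combined with & (and); !is.na() excludes missing values
kanamai_high <- reef_data[reef_data$sites == "kanamai" &
                            !is.na(reef_data$percent_cover) &
                            reef_data$percent_cover >= 0.50, ]
kanamai_high
```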
For this course, we will be using a set of packages that make the
filtering and handling of data objects a bit more intuitive and more
clearly documented. However, knowing how to access parts of a data
object or list is an essential skill for R.
Sometimes, in complex projects that require functionality from a
number of different R packages, there can be a duplication
of function names. Examples include: extract(),
filter(), select(), among others.
The way to specify both the package and the function name is to use the
namespace, which is denoted by two successive colons
::. In a line of code it would look something like:
reef_data %>%
  dplyr::select(percent_cover)
This basically means that we want to use the select()
function from the dplyr package.
Using the namespace can also be helpful when you want to use a
function from a package without needing to load the entire package. For
example, to transform community data using a function from
the vegan community analysis package:
reef_data %>%
  spread(quadrate, percent_cover) %>%
  dplyr::select(-sites) %>%
  vegan::decostand(method = "hellinger")
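The same idea can be tried with a package that ships with R, so nothing extra needs to be installed. This sketch assumes a default R session; it calls toTitleCase() from the tools package through the namespace, without calling library(tools):

```r
# two of the site names used earlier
site_list <- c("coral garden", "ras iwatine")

# call the function through its namespace, without library(tools)
titled <- tools::toTitleCase(site_list)
titled

# search() lists attached packages; calling a function with ::
# loads the package's namespace but does not attach the package
"package:tools" %in% search()
```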
If you are wondering how to use a particular function, a good way to
get help is to simply type a ? in front of the
function:
?str
This takes you to the R Documentation page and provides
a Description of the function, the package where it
comes from and any options the function provides. Many of the
R Documentation pages have Examples which
you can use to learn how the function works and/or how to modify it for
your particular task.
If you know the name of the package you want to know more about:
help(package = "tidyr")
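When the exact function name is not known, a couple of other base tools can help. This is a small sketch rather than a complete list; both functions come from the packages that are attached in a default R session:

```r
# apropos() lists objects on the search path whose names match a pattern,
# here every name that starts with "str"
matches <- apropos("^str")
head(matches)

# args() shows the arguments of a function without opening its help page
args(range)

# help() with a quoted name is equivalent to typing ?str
# help("str")
```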
Another learning resource to be aware of is the
package “vignettes”. In some cases, these are a compilation of the
information found when searching with ? or help().
But, in other cases, they set out examples and mini-tutorials of how a
package works. These can be viewed on the package page at CRAN.
Here is a good example.
Something a bit more advanced, but which shows the amazing potential of
R, can be found here.
Vignettes can also be accessed from the R console by
using
vignette(all = FALSE) to list those of attached packages, or
browseVignettes() to browse those of all installed packages.
As a number of the coding standards for the GCRMN come from Hadley
Wickham’s tidyverse principles, we would like to provide a
brief introduction to them here.