As mentioned, part of the approach for this training course is to focus on developing a core set of skills for data manipulation and visualisation, together with tools for documenting and reporting on the status and trends of coral reef ecosystems.
The R language has extensive capabilities and can be used across many disciplines and applications. This course will be using a small subset of packages and functions to provide examples of data cleaning and visualisation.
However, course participants should be aware that there are often several ways to achieve a similar result in R. The examples we are providing as part of this course are by no means the definitive way of doing things, but rather a means to establish a similar coding style and promote interoperability across GCRMN regional networks.
If you are new to R, one of the challenges is thinking about and working with objects. In R an object can be a data table, a model, a series of names, a number, a coastline, or many other things.
This is part of the genius of R, in that it provides a way to be scalable. For example, a function that works on a single object containing a list of names works whether the list is 10 names long or 10,000 names long!
A common way to create an object is to use the <- (“assign”) operator:
# create a list of sites
site_list <-
  c("coral garden",
    "kanamai",
    "kasa",
    "likoni",
    "nyali",
    "ras iwatine",
    "shark point",
    "shelly")
As an example of how we can apply a function to our entire list (as a single object):
# review the sites
site_list
[1] "coral garden" "kanamai" "kasa" "likoni" "nyali" "ras iwatine"
[7] "shark point" "shelly"
# set capital letters (str_to_title() comes from the stringr package)
library(stringr)
str_to_title(site_list)
[1] "Coral Garden" "Kanamai" "Kasa" "Likoni" "Nyali" "Ras Iwatine"
[7] "Shark Point" "Shelly"
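As a minimal sketch of the earlier point about scale, the same call works unchanged on a much longer vector. Here many_sites is a hypothetical object built by repeating site_list, and toupper() from base R stands in for any function that works element by element.

```r
# site names from the course example
site_list <- c("coral garden", "kanamai", "kasa", "likoni",
               "nyali", "ras iwatine", "shark point", "shelly")

# hypothetical longer vector: the same 8 names repeated 1,250 times
many_sites <- rep(site_list, times = 1250)
length(many_sites)

# the same function call works on both objects
short_upper <- toupper(site_list)
long_upper  <- toupper(many_sites)

length(short_upper)
length(long_upper)
```

Nothing in the call changes with the size of the object; only the object itself differs.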
During the course, we will get plenty of practice working with objects. In the meantime, here are some quick ways to check the type of an object or its structure:
> class(site_list)
[1] "character"
> str(site_list)
chr [1:8] "coral garden" "kanamai" "kasa" "likoni" "nyali" "ras iwatine" "shark point" "shelly"
You can also get a list of the objects in your current workspace by typing ls() in the R Console.
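To make the idea of different kinds of objects more concrete, the following sketch creates a few and inspects them with class(). All object names and values here are made up for illustration and are not part of the course data.

```r
# a few hypothetical objects of different kinds
n_transects <- 12                                # a single number
site_names  <- c("kanamai", "nyali", "shelly")   # a series of names
cover_table <- data.frame(                       # a data table
  sites         = site_names,
  percent_cover = c(0.42, 0.31, 0.55)            # made-up values
)
cover_model <- lm(percent_cover ~ 1, data = cover_table)  # a (trivial) model

# class() reports what kind of object each one is
class(n_transects)
class(site_names)
class(cover_table)
class(cover_model)

# ls() lists the objects created so far in the workspace
ls()
```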
A lot of R functionality comes from its base package. There are special functions for mathematics (e.g. mean()), control functions (e.g. ifelse()), data summary (e.g. table()) and generic plotting functions (e.g. plot()). In this training course, we will be using a number of functions that help with formatting column types (e.g. as.numeric()), and exercises from individual modules will draw on other functions from base.
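A short sketch of several of these base functions working together is given below. The values and the 0.30 threshold are invented for illustration only; plot() is left out because it produces a figure rather than printed output.

```r
# made-up percent cover values, read in as text with one missing entry
cover_raw <- c("0.42", "0.31", NA, "0.55", "0.18")

# as.numeric() converts the text values to numbers
cover <- as.numeric(cover_raw)

# mean() needs na.rm = TRUE to skip the missing value
mean_cover <- mean(cover, na.rm = TRUE)
mean_cover

# ifelse() applies a condition to every element
# (the 0.30 threshold is an arbitrary choice for this example)
cover_class <- ifelse(cover >= 0.30, "high", "low")

# table() counts how many values fall in each class
class_counts <- table(cover_class, useNA = "ifany")
class_counts
```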
In the cleaning and standardising of data objects, it is useful to know how to inspect the contents of a given column. This can be done by putting a $ between the object name and the name of the column:
# get list of sites
unique(reef_data$sites)
[1] "coral garden" "kanamai" "kasa" "likoni" "nyali" "ras iwatine"
[7] "shark point" "shelly"
Or we might want to check on the range of values of a column:
# get range of values
range(reef_data$percent_cover, na.rm = TRUE)
An equivalent in Excel would be to put formulas in two cells to summarise a column (e.g. =MIN(L2:L238) and =MAX(L2:L238)). Note that range() returns both of these values, the minimum and the maximum, rather than the difference between them.
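The sketch below shows what range() returns and how it relates to min(), max() and the spread between them. The percent_cover values are made up for illustration.

```r
# made-up percent cover values with one missing entry
percent_cover <- c(0.42, 0.31, NA, 0.55, 0.18)

# range() returns the minimum and maximum as a vector of length two
cover_range <- range(percent_cover, na.rm = TRUE)
cover_range

# the same two values obtained separately
c(min(percent_cover, na.rm = TRUE), max(percent_cover, na.rm = TRUE))

# diff() applied to the range gives the spread (maximum minus minimum)
cover_spread <- diff(cover_range)
cover_spread
```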
In some instances, a quick summary of the object can be helpful:
# get summary of values
summary(reef_data)
Elements of an object can be accessed using brackets [] or double brackets [[]]. For example, to access the second element in our list of sites:
site_list[ 2 ]
[1] "kanamai"
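The difference between single and double brackets is easiest to see with a list. The following is a minimal sketch using a hypothetical list called survey: single brackets return a smaller list, while double brackets return the element itself.

```r
# a hypothetical list holding two different kinds of object
survey <- list(sites = c("kanamai", "nyali", "shelly"),
               year  = 2019)

# single brackets keep the container: the result is still a list
survey["sites"]
class(survey["sites"])

# double brackets extract the element itself: here a character vector
survey[["sites"]]
class(survey[["sites"]])

# the two can be combined: second site name from the extracted vector
second_site <- survey[["sites"]][2]
second_site
```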
If the data is in matrix or data.frame form, values can be accessed by identifying the row and column within the brackets. For example:
reef_data[ row, col ]
To obtain all columns in the 3rd row of our reef_data:
reef_data[ 3, ]
And, to obtain all rows in the 3rd column:
reef_data[ , 3]
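To try these patterns without the course data, a small made-up stand-in for reef_data can be used. The column names and values below are assumptions for illustration only, and the behaviour shown is that of a base data.frame.

```r
# a small made-up stand-in for reef_data
reef_data <- data.frame(
  sites         = c("kanamai", "kanamai", "nyali", "shelly"),
  transect      = c(1, 2, 1, 1),
  percent_cover = c(0.42, 0.61, 0.31, 0.55)
)

# one cell: row 3, column 3
reef_data[3, 3]

# all columns in the 3rd row (returns a one-row data.frame)
third_row <- reef_data[3, ]
third_row

# all rows in the 3rd column (returns a plain vector)
third_col <- reef_data[, 3]
third_col

# columns can also be selected by name instead of position
reef_data[, "percent_cover"]
```

Selecting by name is usually safer than selecting by position, since it keeps working if the column order changes.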
It is also possible to filter a data object using bracket notation, for example:
reef_data[ reef_data$sites == "kanamai", ]
or:
reef_data[ reef_data$percent_cover >= 0.50 , ]
This is similar to making a database query.
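One thing to watch for with bracket filtering is missing values: a comparison on an NA returns NA, and the bracket then returns a row filled with NA. The sketch below, using a made-up reef_data, shows one common way to avoid this by wrapping the condition in which().

```r
# made-up data with one missing percent_cover value
reef_data <- data.frame(
  sites         = c("kanamai", "kanamai", "nyali", "shelly"),
  percent_cover = c(0.42, 0.61, NA, 0.55)
)

# the comparison returns NA where percent_cover is missing
reef_data$percent_cover >= 0.50

# filtering on it directly keeps an all-NA row for the missing value
with_na <- reef_data[reef_data$percent_cover >= 0.50, ]
nrow(with_na)

# which() keeps only the positions where the test is TRUE
high_cover <- reef_data[which(reef_data$percent_cover >= 0.50), ]
high_cover

# conditions can be combined with & (and) or | (or)
reef_data[which(reef_data$sites == "kanamai" &
                reef_data$percent_cover >= 0.50), ]
```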
For this course, we will be using a set of packages that make the filtering and handling of data objects a bit more intuitive and more clearly documented. However, knowing how to access parts of a data object or list is an essential skill for R.
Sometimes in complex projects that require functionality from a number of different R packages, the same function name can exist in more than one package. Examples include extract(), filter() and select(), among others.
The way to specify both the package and the function name is to use a namespace, which is denoted by two successive colons (::). In a line of code it would look something like:
reef_data %>%
  dplyr::select(percent_cover)
This basically means that we want to use the select() function from the dplyr package.
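The effect of a name clash can be reproduced with base R alone. In this sketch, a hypothetical user-defined function deliberately reuses the name sd, which hides stats::sd() until the namespace is given explicitly. Clashes between packages behave in a similar way.

```r
values <- c(1, 2, 3)

# a hypothetical user-defined function that reuses the name of stats::sd()
sd <- function(x) "my own sd()"

# the unqualified name now finds the user-defined version first
masked_result <- sd(values)

# the package::function form always reaches the version in that package
stats_result <- stats::sd(values)

masked_result
stats_result

# remove the user-defined version so that sd() refers to stats::sd() again
rm(sd)
```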
Using a namespace can also be helpful when you want to use a function from a package without needing to load the entire package. For example, to transform community data using a function from the vegan community analysis package:
reef_data %>%
  spread(quadrate, percent_cover) %>%
  dplyr::select(-sites) %>%
  vegan::decostand(method = "hellinger")
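The same pattern can be tried with a package that ships with R. The sketch below calls file_ext() from the tools package without a library(tools) call; the file name is a hypothetical example.

```r
# file_ext() lives in the tools package, which ships with R
# but is not attached by default; no library(tools) call is needed
extension <- tools::file_ext("reef_data.csv")   # hypothetical file name
extension

# requireNamespace() checks that a package is installed without attaching it,
# which is useful before calling package::function on an optional package
has_tools <- requireNamespace("tools", quietly = TRUE)
has_tools
```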
If you are wondering how to use a particular function, a good way to get help is to simply type a ? in front of the function:
?str
This takes you to the R Documentation page, which provides a Description of the function, the package where it comes from and any options the function provides. Many of the R Documentation pages have Examples which you can use to learn how the function works and/or how to modify it for your particular task.
If you know the name of the package you want to know more about:
help(package = "tidyr")
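A few other base R helpers can complement ? and help(). The sketch below is only a suggestion of what to try; the interactive calls are wrapped in interactive() because they open help pages or print a lot of output.

```r
# args() prints the arguments a function accepts
args(round)

# apropos() lists objects whose names match a pattern,
# useful when you only remember part of a function name
str_functions <- apropos("^str")
head(str_functions)

# example() runs the Examples section of a help page, and
# help.search() (also written ??) searches help pages by topic
if (interactive()) {
  example(mean)
  help.search("standard deviation")
}
```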
The other learning resource that people should be aware of is the package “vignettes”. In some cases, these are a compilation of the information found when searching with ? or help(). But in other cases, they set out examples and mini-tutorials of how a package works. These can be viewed on the package page at CRAN. Here is a good example.
Something a bit more advanced, which shows the amazing potential of R, can be found here.
Vignettes can also be accessed from the R console by using vignette(all = FALSE) for attached packages or browseVignettes().
As a number of the coding standards for the GCRMN come from Hadley Wickham’s tidyverse principles, we would like to provide a brief introduction to them here.