Context

As mentioned, the approach for this training course is to focus on developing a core set of skills for data manipulation and visualisation, along with tools for documenting and reporting on the status and trends of coral reef ecosystems.

The R language has extensive capabilities and can be used across many disciplines and applications. This course will be using a small subset of packages and functions to provide examples of data cleaning and visualisation.

However, course participants should be aware that there are often several ways to achieve a similar result in R. The examples provided in this course are by no means the definitive way of doing things; rather, they are a means to encourage a similar coding style and promote interoperability across GCRMN regional networks.

Object-based programming

If you are new to R, one of the challenges is thinking about and working with objects. In R, an object can be a data table, a model, a series of names, a number, a coastline, or many other things.

This is part of the genius of R: working with objects makes code scalable. For example, a function that works on a single object holding a list of names works whether the list is 10 names long or 10,000 names long!

A common way to create an object is with the assignment operator <- (“assign”):

  # create a list of sites
    site_list <-
      c("coral garden",
        "kanamai",
        "kasa",
        "likoni",
        "nyali",
        "ras iwatine",
        "shark point",
        "shelly")

As an example of how we can apply a function to our entire list (as a single object):

  # review the sites
    site_list
[1] "coral garden" "kanamai"      "kasa"         "likoni"       "nyali"        "ras iwatine"
[7] "shark point"  "shelly"

  # set capital letters (str_to_title() comes from the stringr package)
    library(stringr)
    str_to_title(site_list)
[1] "Coral Garden" "Kanamai"      "Kasa"         "Likoni"       "Nyali"        "Ras Iwatine"
[7] "Shark Point"  "Shelly"
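To illustrate the point about scale, here is a minimal sketch using only base R. The vectors few_sites and many_sites are hypothetical and simply stand in for short and long lists of names; toupper() is used in place of str_to_title() so that no extra package is needed.

```r
# Hypothetical vectors: 10 names and 10,000 names
few_sites  <- paste("site", 1:10)
many_sites <- paste("site", 1:10000)

# The same function call handles both objects
few_upper  <- toupper(few_sites)
many_upper <- toupper(many_sites)

length(few_upper)    # 10
length(many_upper)   # 10000
few_upper[1]         # "SITE 1"
```

The call itself does not change with the size of the object, which is what makes this style of working scalable.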

During the course, we will get plenty of practice working with objects. In the meantime, here are some quick ways to check the type of an object or its structure:

> class(site_list)
[1] "character"
> str(site_list)
 chr [1:8] "coral garden" "kanamai" "kasa" "likoni" "nyali" "ras iwatine" "shark point" "shelly"

You can also get a list of the objects in your current workspace by typing ls() in the R Console.

Base R

A lot of R functionality comes from its base package. There are special functions for mathematics (e.g. mean()), control functions (e.g. ifelse()), data summary (e.g. table()) and generic plotting functions (e.g. plot()). In this training course, we will be using a number of functions that help with formatting column types (e.g. as.numeric()) and exercises from individual modules will draw on other functions from base.
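As a rough sketch of how a few of these base functions fit together, the example below uses a small, made-up vector (cover_text is hypothetical and not part of the course data).

```r
# Hypothetical percent cover values read in as text, with one bad entry
cover_text <- c("12.5", "40", "not recorded", "60")

# as.numeric() converts the type; text that is not a number becomes NA
# (R gives a warning here, which we silence for the example)
cover <- suppressWarnings(as.numeric(cover_text))

# mean() of the non-missing values
mean(cover, na.rm = TRUE)    # 37.5

# ifelse() labels each value; a missing value stays missing
cover_class <- ifelse(cover >= 50, "high", "low")

# table() counts each label (NA is left out by default)
table(cover_class)
```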

Working with objects

In the cleaning and standardising of data objects, it is useful to know how to inspect the contents of a given column. This can be done by putting a $ between the object name and the name of the column:

  # get list of sites
    unique(reef_data$sites)
[1] "coral garden" "kanamai"      "kasa"         "likoni"       "nyali"        "ras iwatine"
[7] "shark point"  "shelly"

Or we might want to check on the range of values of a column:

  # get range of values
    range(reef_data$percent_cover, na.rm = TRUE)

An equivalent in Excel would be to put formulas in cells to summarise a column (e.g. =MIN(L2:L238) and =MAX(L2:L238)). Note that range() returns the minimum and maximum values themselves; it does not take their difference.

In some instances, a quick summary of the object can be helpful:

  # get summary of values
    summary(reef_data)
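The snippets in this section assume that a reef_data object already exists. To try them without the course data, a minimal stand-in can be built by hand. The column names follow the examples in this section, but the values are invented for illustration.

```r
# Hypothetical reef_data; the real course data will have more rows and columns
reef_data <- data.frame(
  sites            = c("kanamai", "kanamai", "nyali", "shelly"),
  percent_cover    = c(0.35, 0.60, NA, 0.10),
  stringsAsFactors = FALSE
)

# Inspect one column with $
unique(reef_data$sites)                         # "kanamai" "nyali" "shelly"

# range() gives the minimum and maximum as a vector of two values
cover_range <- range(reef_data$percent_cover, na.rm = TRUE)
cover_range                                     # 0.1 0.6

# diff() of the range gives the spread between them
diff(cover_range)                               # 0.5

# Quick overview of every column
summary(reef_data)
```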

Indexing

Elements of an object can be accessed using brackets [] or double brackets [[]]. For example, to access the second element in our list of sites:

site_list[ 2 ]
[1] "kanamai"
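On a simple vector such as site_list, single and double brackets return the same single value. The difference shows up with a list that holds different kinds of objects. The survey object below is hypothetical and only meant as a sketch of that difference.

```r
# Hypothetical list holding two different kinds of object
survey <- list(
  sites = c("kanamai", "nyali"),
  year  = 2019
)

# Single brackets keep the container: the result is still a list
class(survey["sites"])      # "list"

# Double brackets extract the element itself
class(survey[["sites"]])    # "character"
survey[["sites"]][2]        # "nyali"
```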

If the data is in matrix or data.frame form, values can be accessed by identifying the row and column within the brackets. For example:

 reef_data[ row, col ]

To obtain all columns in the 3rd row of our reef_data:

reef_data[ 3, ]

And, to obtain all rows in the 3rd column:

reef_data[ , 3]

It is also possible to filter a data object using the bracket notations, for example:

reef_data[ reef_data$sites == "kanamai", ]

or:

reef_data[ reef_data$percent_cover >= 0.50 , ]

This is similar to making a database query.
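The bracket examples above can be tried on a small, invented data frame. One thing to watch for, sketched below, is that a missing value in the filtering condition returns a row filled with NA; wrapping the condition in which() is one way to avoid this. The values in reef_data here are hypothetical.

```r
# Hypothetical reef_data with one missing percent_cover value
reef_data <- data.frame(
  sites            = c("kanamai", "kanamai", "nyali", "shelly"),
  percent_cover    = c(0.35, 0.60, NA, 0.10),
  stringsAsFactors = FALSE
)

reef_data[3, ]    # all columns of the 3rd row
reef_data[, 2]    # all rows of the 2nd column, returned as a vector

# Filter on a text column
kanamai <- reef_data[reef_data$sites == "kanamai", ]
nrow(kanamai)     # 2

# A missing value in the condition produces an extra row of NAs
high <- reef_data[reef_data$percent_cover >= 0.50, ]
nrow(high)        # 2 (one real row plus one NA row)

# which() keeps only the rows where the condition is TRUE
high_only <- reef_data[which(reef_data$percent_cover >= 0.50), ]
nrow(high_only)   # 1
```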

For this course, we will be using a set of packages that make the filtering and handling of data objects a bit more intuitive and more clearly documented. However, knowing how to access parts of a data object or list is an essential skill for R.

Understanding Namespace

Sometimes, in complex projects that require functionality from a number of different R packages, function names are duplicated across packages. Examples include extract(), filter(), and select(), among others.

The way to specify both the package and the function name is to use the namespace operator, denoted by two successive colons (::). In a line of code it would look something like:

reef_data %>%
  dplyr::select(percent_cover)

This basically means that we want to use the select() function from the dplyr package.
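To see why this matters, the sketch below imitates a name clash using only base R. Instead of loading two packages with the same function name, it deliberately defines a function called mean() on top of the one in base. This is an illustration only, not something to do in real code.

```r
# A user-defined function that deliberately reuses the name of a base function
mean <- function(x) "my own mean"

# Without a namespace, the most recent definition is used
own <- mean(c(1, 2, 3))
own         # "my own mean"

# With a namespace, the package is stated explicitly
explicit <- base::mean(c(1, 2, 3))
explicit    # 2

# Remove the duplicate so that mean() refers to base again
rm(mean)
mean(c(1, 2, 3))    # 2
```

The same thing happens when two loaded packages share a function name: the package loaded last takes priority unless the namespace is given.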

Using namespace can also be helpful when you want to use a function from a package without loading the entire package. For example, to transform community data using a function from the vegan community analysis package:

reef_data %>%
  spread(quadrate, percent_cover) %>%
  dplyr::select(-sites) %>%
  vegan::decostand(method = "hellinger")
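The same idea can be tried without installing anything, using the tools package that ships with R but is not attached by default. The file name below is hypothetical.

```r
# tools is installed with R but not attached, so file_ext() is called
# through its namespace without a library(tools) call
extension <- tools::file_ext("reef_data.csv")
extension    # "csv"
```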

Getting Help

If you are wondering how to use a particular function, a good way to get help is to simply type a ? in front of the function:

?str

This takes you to the R Documentation page and provides a Description of the function, the package where it comes from and any options the function provides. Many of the R Documentation pages have Examples which you can use to learn how the function works and/or how to modify it for your particular task.

If you know the name of the package you want to know more about:

help(package = "tidyr")

Another learning resource to be aware of is the package “vignettes”. In some cases, these are a compilation of the information found when using ? or help(). In other cases, they set out examples and mini-tutorials of how a package works. These can be viewed on the package page at CRAN.

Here is a good example.

Something a bit more advanced, which shows the amazing potential of R, can be found here.

Vignettes can also be accessed from the R console by using vignette(all = FALSE) for attached packages, or browseVignettes() for all installed packages.

Next steps

As a number of the coding standards for the GCRMN come from Hadley Wickham’s tidyverse principles, we would like to provide a brief introduction to them here.