Context

Before data can be formatted and standardised, they need to be imported into R. The source data will most likely be in a spreadsheet format (e.g. *.xlsx) and may include several sheets holding different types of data.

In this lesson, we will go over key tools for importing data and begin the data cleaning & standardisation process. At the end of this lesson, you should be able to import individual sheets from a spreadsheet (*.xlsx) and data from a *.csv file into R.

This first exercise comes from the Eastern Tropical Pacific region with data provided by Juan José Alvarado & Jorge Cortés of the Universidad de Costa Rica. The data illustrate some common issues encountered when importing spreadsheet data.

[The code for this example can be found here: creation_code/exercises/formatting/create_percent_cover_acosa.R]

Step 1: Importing raw data

The creation code should start with identifying the data locale (i.e. where the raw data can be found) and the data file, like this:

##
## 1. Set up
##
 ## -- import percent cover data -- ##
  # point to data locale
    data_locale <- "data_raw/examples/formatting/"

  # point to data file
    data_file <- "acosa.xlsx"

The logic for setting it up this way is that if the folder structure of the project or the name of the file changes, only these first lines of code need to be modified and the rest of the script should continue to work unchanged. This is particularly helpful when dealing with long scripts.
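As a minimal sketch of why this helps, the lines below build the full path from the two objects and report early if the file cannot be found. The object name data_path and the use of file.exists() and message() (both base R) are assumptions for illustration and are not part of the original creation code.

```r
# point to data locale and data file (same values as in the set-up above)
data_locale <- "data_raw/examples/formatting/"
data_file   <- "acosa.xlsx"

# build the full path; paste0() concatenates the two strings without a separator
data_path <- paste0(data_locale, data_file)

# hypothetical check: report a missing file before any import is attempted
if (!file.exists(data_path)) {
  message("Data file not found: ", data_path)
}
```

If the folder or file is later renamed, only the first two assignments need to change.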

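The next step builds the full file path by pasting data_locale and data_file together with paste0(), then pipes the result (using %>%) to the read_excel() function from the readxl package. This assumes that readxl and a package providing the pipe (e.g. magrittr or dplyr) have already been loaded: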

  # call to sessiles data
    percent_cover_acosa <-
      paste0(data_locale, data_file) %>%
      read_excel(sheet = "Benthos")

The percent cover data are in the sheet named “Benthos”, which is specified with the sheet argument of read_excel().
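For readers new to the pipe, the following sketch shows that piping a value into a function is equivalent to passing it as the first argument. Here basename() (base R) is used only as a stand-in for read_excel() so the example runs without the data file; the object names piped and nested are hypothetical.

```r
library(magrittr)  # provides the pipe operator %>%

data_locale <- "data_raw/examples/formatting/"
data_file   <- "acosa.xlsx"

# piped form: the pasted path becomes the first argument of basename()
piped <- paste0(data_locale, data_file) %>% basename()

# nested form: the same call written without the pipe
nested <- basename(paste0(data_locale, data_file))

# both forms give the same result
identical(piped, nested)
```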

For this exercise, we also need to import the species names to link with the taxa codes. To do this, we simply change the name of the sheet to import:

 ## -- import species data -- ##
  # call to taxa descriptions
    taxa_descriptions <-
      paste0(data_locale, data_file) %>%
      read_excel(sheet = "spp")
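If the sheet names in a workbook are not known in advance, they can be listed before importing. The sketch below wraps excel_sheets() from the readxl package in a small helper; the helper name list_sheets and the choice to return an empty vector when the file is missing are assumptions for illustration only.

```r
# hypothetical helper: list the sheet names of a workbook,
# returning an empty character vector if the file is not found
list_sheets <- function(path) {
  if (!file.exists(path)) {
    return(character(0))
  }
  readxl::excel_sheets(path)
}

# same locale and file as in the set-up above
data_locale <- "data_raw/examples/formatting/"
data_file   <- "acosa.xlsx"

list_sheets(paste0(data_locale, data_file))
```

If the file is present, the result should include the “Benthos” and “spp” sheets imported above.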

Other data formats

Another common format for importing into R is *.csv. The method is similar to the one above, except that we use the read_csv() function from the readr package:

 ## -- import csv data -- ##
  # point to data locale
    data_locale <- "data_raw/examples/formatting/"

  # point to data file
    data_file <- "mombasa.csv"

  # import data
    mombasa <-
      paste0(data_locale, data_file) %>%
      read_csv()
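By default, read_csv() guesses the type of each column from the data. As a hedged sketch of how the types could instead be stated explicitly, the example below writes a tiny, made-up csv to a temporary file and imports it with the col_types argument. The column names, values, and the object names tmp_file and example_data are invented for illustration and do not come from mombasa.csv.

```r
library(readr)

# write a tiny, made-up csv to a temporary file (illustrative data only)
tmp_file <- tempfile(fileext = ".csv")
writeLines(
  c("site,date,cover",
    "A,2019-03-01,12.5",
    "B,2019-03-02,30.0"),
  tmp_file
)

# import, stating the column types explicitly rather than relying on guessing
example_data <- read_csv(
  tmp_file,
  col_types = cols(
    site  = col_character(),
    date  = col_date(format = "%Y-%m-%d"),
    cover = col_double()
  )
)

# inspect the resulting column types
str(example_data)
```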

Next steps

Having successfully imported some raw data, we now need to clean up some of the dates and other column formats. This is covered in the next set of exercises.