To begin data formatting & standardisation, one first needs to import data into R. The source data will most likely be in some spreadsheet form (e.g. *.xlsx) and may include several sheets of different data types.
In this lesson, we will go over key tools for importing data and begin the data cleaning & standardisation process. At the end of this lesson, you should be able to import data from spreadsheet (e.g. *.xlsx) and *.csv files.
This first exercise comes from the Eastern Tropical Pacific region, with data provided by Juan José Alvarado & Jorge Cortés of the Universidad de Costa Rica. The data illustrate some common issues encountered when importing spreadsheet data.
[The code for this example can be found here: creation_code/exercises/formatting/create_percent_cover_acosa.R]
The creation code should start with identifying the data locale (i.e. where the raw data can be found) and the data file, like this:
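##
## 1. Set up
##
# load packages used below (assuming they are not already attached elsewhere)
library(magrittr) # provides the %>% pipe
library(readxl)   # provides read_excel()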
## -- import percent cover data -- ##
# point to data locale
data_locale <- "data_raw/examples/formatting/"
# point to data file
data_file <- "acosa.xlsx"
The logic for setting it up this way is that if the folder structure of the project or the file name changes, these can easily be modified in the first lines of code and the rest of the script should continue to work unmodified. This is particularly helpful when dealing with large amounts of code.
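As a minimal sketch of this idea (not part of the original creation code), the full path can be built once and checked before any import is attempted. The object name data_path is our own assumption, introduced only for illustration:

```r
# point to data locale and data file, as in the lesson
data_locale <- "data_raw/examples/formatting/"
data_file   <- "acosa.xlsx"

# build the full path once so it can be reused below
# (data_path is a hypothetical name, not from the original script)
data_path <- paste0(data_locale, data_file)

# report early if the file is not where we expect it
if (!file.exists(data_path)) {
  message("File not found, check data_locale and data_file: ", data_path)
}
```

If the project is reorganised, only the first two assignments need to change.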
The next step creates an object by pasting together data_locale and data_file and piping the result (i.e. with %>%) to the read_excel() function from the readxl package:
# call to sessiles data
percent_cover_acosa <-
  paste0(data_locale, data_file) %>%
  read_excel(sheet = "Benthos")
The percent cover data are in the sheet named “Benthos”, which is specified with the sheet argument of the read_excel() function.
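When the sheet names of a workbook are not known in advance, they can be listed before importing. The sketch below uses excel_sheets() from readxl on an example workbook that ships with the package; this file is only a stand-in for acosa.xlsx, and the object names are our own:

```r
library(readxl)

# stand-in workbook shipped with readxl (not the lesson's acosa.xlsx)
example_path <- readxl_example("datasets.xlsx")

# list the sheet names available in the workbook
sheet_names <- excel_sheets(example_path)
print(sheet_names)

# import one sheet by name, as done with "Benthos" above
iris_sheet <- read_excel(example_path, sheet = "iris")

# quick look at what was imported
head(iris_sheet)
```

With the lesson's own file, the same call on paste0(data_locale, data_file) should list “Benthos” and “spp” among the sheets.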
For this exercise, we also need to import the species names to link with the taxa codes. To do this, we simply change the name of the sheet to import:
## -- import species data -- ##
# call to taxa descriptions
taxa_descriptions <-
  paste0(data_locale, data_file) %>%
  read_excel(sheet = "spp")
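If a workbook holds many sheets, repeating the import call for each one can become tedious. One possible alternative, sketched here under the assumption that every sheet can be read with default settings, is to import all sheets into a named list. The example again uses a workbook bundled with readxl rather than acosa.xlsx, and all object names are hypothetical:

```r
library(readxl)

# stand-in workbook shipped with readxl (not the lesson's acosa.xlsx)
example_path <- readxl_example("datasets.xlsx")

# sheet names, used both to loop over and to name the list elements
sheet_names <- excel_sheets(example_path)

# import every sheet into one named list of data frames
all_sheets <- lapply(
  setNames(sheet_names, sheet_names),
  function(one_sheet) read_excel(example_path, sheet = one_sheet)
)

# access a single sheet by its name
head(all_sheets[["mtcars"]])
```

Importing sheets one at a time, as in the lesson, keeps each object explicit; the list approach trades that clarity for less repetition.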
One other common format for importing into R is *.csv. The method for importing is similar to the above, with the exception that we use the read_csv() function from the readr package:
## -- import csv data -- ##
# point to data locale
data_locale <- "data_raw/examples/formatting/"
# point to data file
data_file <- "mombasa.csv"
# import data
mombasa <-
  paste0(data_locale, data_file) %>%
  read_csv()
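By default, read_csv() guesses the type of each column. Where the expected types are known, they can be stated explicitly with the col_types argument. The sketch below writes a tiny, made-up file to a temporary location so that it can be run on its own; the column names and values are invented for illustration and are not taken from mombasa.csv:

```r
library(readr)

# write a tiny, made-up csv to a temporary file (illustration only)
tmp_file <- tempfile(fileext = ".csv")
writeLines(
  c("site,date,cover",
    "A,2019-03-01,12.5",
    "B,2019-03-02,30"),
  tmp_file
)

# import, stating the expected type of each column
example_csv <- read_csv(
  tmp_file,
  col_types = cols(
    site  = col_character(),
    date  = col_date(),
    cover = col_double()
  )
)

# inspect the imported column types
str(example_csv)
```

Stating the column types up front can surface parsing problems at import time rather than later in the cleaning process.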
Having successfully imported some raw data, we now need to clean up some of the dates and other column formats. For this, we continue with the next set of exercises.