If you would like to return to information from the previous session, please click here.
As one of the key objectives of the GCRMN is to be able to report the trends of coral reef status through time.
As we have successfully imported some percent cover data here, we will need to verify that the date information is correct and in an “unambiguous” format.
In this lesson, we will go through some exercises for cleaning up dates and formatting columns. At the end of this lesson, you should be able to:
Information from the first exercise comes from the Eastern Tropical Pacific region with data provided by Juan José Alvarado & Jorge Cortés of the Universidad de Costa Rica. The data illustrate some common issues encountered when dealing with dates and other column formats.
[The code for this example can be found here:
creation_code/exercises/formatting/create_percent_cover_acosa.R
]
The first thing we should notice when inspecting the data object of percent cover data is that the dates (i.e. “Fecha”) looks like a number, but is actually recognised as a character:
> percent_cover_acosa
# A tibble: 2,871 x 25
`# sitio` `Conservation Ar… Locality Site Diver Transect Date Keypuncher
<dbl> <chr> <chr> <chr> <chr> <dbl> <chr> <chr>
1 1 ACOSA Dominical El Ar… Carolina … 1 42780 Carolina Sher…
2 1 ACOSA Dominical El Ar… Carolina … 1 42780 Carolina Sher…
3 1 ACOSA Dominical El Ar… Carolina … 1 42780 Carolina Sher…
4 1 ACOSA Dominical El Ar… Carolina … 1 42780 Carolina Sher…
5 1 ACOSA Dominical El Ar… Carolina … 2 42780 Carolina Sher…
6 1 ACOSA Dominical El Ar… Carolina … 1 42780 Carolina Sher…
7 1 ACOSA Dominical El Ar… Carolina … 1 42780 Carolina Sher…
8 1 ACOSA Dominical El Ar… Carolina … 1 42780 Carolina Sher…
9 1 ACOSA Dominical El Ar… Carolina … 1 42780 Carolina Sher…
10 1 ACOSA Dominical El Ar… Carolina … 1 42780 Carolina Sher…
# … with 2,861 more rows, and 17 more variables: Keypunch date <chr>, Depth <dbl>,
# Depth category <chr>, Code <chr>, 1 <dbl>, 2 <dbl>, 3 <dbl>, 4 <dbl>, 5 <dbl>,
# 6 <dbl>, 7 <dbl>, 8 <dbl>, 9 <dbl>, 10 <dbl>, Total <dbl>, Average <dbl>,
# Percentage <dbl>
This is because Excel has its own way of treating dates and often
defaults to a ambiguous format (e.g. 06/07/11
, where it is
unclear what is the year, month, date). These need to be corrected in R
to provide a standard unambiguous date format
(i.e. yyyy-mm-dd
).
This is done by using mutate()
to first transform the
column “Fecha” to as.numeric()
and then to a date using
as.Date()
. In converting to a date, R needs to know the
origin for calculating the date. For Windows version of Excel, this date
is 1899-12-30
.
# set fecha to date
percent_cover_acosa %<>%
mutate(Date = Date %>% as.numeric() %>% as.Date(origin = "1899-12-30"))
We can now see that “Date” is now in the correct date format:
percent_cover_acosa
# A tibble: 2,871 x 25
`# sitio` `Conservation Ar… Locality Site Diver Transect Date Keypuncher
<dbl> <chr> <chr> <chr> <chr> <dbl> <date> <chr>
1 1 ACOSA Dominical El A… Carolin… 1 2017-02-14 Carolina Sh…
2 1 ACOSA Dominical El A… Carolin… 1 2017-02-14 Carolina Sh…
3 1 ACOSA Dominical El A… Carolin… 1 2017-02-14 Carolina Sh…
4 1 ACOSA Dominical El A… Carolin… 1 2017-02-14 Carolina Sh…
5 1 ACOSA Dominical El A… Carolin… 2 2017-02-14 Carolina Sh…
6 1 ACOSA Dominical El A… Carolin… 1 2017-02-14 Carolina Sh…
7 1 ACOSA Dominical El A… Carolin… 1 2017-02-14 Carolina Sh…
8 1 ACOSA Dominical El A… Carolin… 1 2017-02-14 Carolina Sh…
9 1 ACOSA Dominical El A… Carolin… 1 2017-02-14 Carolina Sh…
10 1 ACOSA Dominical El A… Carolin… 1 2017-02-14 Carolina Sh…
# … with 2,861 more rows, and 17 more variables: Keypunch date <chr>, Depth <dbl>,
# Depth category <chr>, Code <chr>, 1 <dbl>, 2 <dbl>, 3 <dbl>, 4 <dbl>, 5 <dbl>,
# 6 <dbl>, 7 <dbl>, 8 <dbl>, 9 <dbl>, 10 <dbl>, Total <dbl>, Average <dbl>,
# Percentage <dbl>
You may have also noticed that there was a warning message produced
when using as.Date()
:
# set fecha to date
percent_cover_acosa %<>%
mutate(Date = Date %>% as.numeric() %>% as.Date(origin = "1899-12-30"))
Warning message:
Problem with `mutate()` column `Date`.
ℹ `Date = Date %>% as.numeric() %>% as.Date(origin = "1899-12-30")`.
ℹ NAs introduced by coercion
We will come back to this as part of the Homework for this module.
Having successfully converted “Fecha” to date, we now need to join the taxonomic information to the percent cover information. For this, we continue on the next page.