Previous steps

If you would like to return to information from the previous session, please click here.

Context

As one of the key objectives of the GCRMN is to be able to report the trends of coral reef status through time.

As we have successfully imported some percent cover data here, we will need to verify that the date information is correct and in an “unambiguous” format.

In this lesson, we will go through some exercises for cleaning up dates and formatting columns. At the end of this lesson, you should be able to:

Information from the first exercise comes from the Eastern Tropical Pacific region with data provided by Juan José Alvarado & Jorge Cortés of the Universidad de Costa Rica. The data illustrate some common issues encountered when dealing with dates and other column formats.

[The code for this example can be found here: creation_code/exercises/formatting/create_percent_cover_acosa.R]

Cleaning up dates

The first thing we should notice when inspecting the data object of percent cover data is that the dates (i.e. “Fecha”) looks like a number, but is actually recognised as a character:

> percent_cover_acosa
# A tibble: 2,871 x 25
   `# sitio` `Conservation Ar… Locality  Site   Diver      Transect Date  Keypuncher
       <dbl> <chr>             <chr>     <chr>  <chr>         <dbl> <chr> <chr>
 1         1 ACOSA             Dominical El Ar… Carolina …        1 42780 Carolina Sher…
 2         1 ACOSA             Dominical El Ar… Carolina …        1 42780 Carolina Sher…
 3         1 ACOSA             Dominical El Ar… Carolina …        1 42780 Carolina Sher…
 4         1 ACOSA             Dominical El Ar… Carolina …        1 42780 Carolina Sher…
 5         1 ACOSA             Dominical El Ar… Carolina …        2 42780 Carolina Sher…
 6         1 ACOSA             Dominical El Ar… Carolina …        1 42780 Carolina Sher…
 7         1 ACOSA             Dominical El Ar… Carolina …        1 42780 Carolina Sher…
 8         1 ACOSA             Dominical El Ar… Carolina …        1 42780 Carolina Sher…
 9         1 ACOSA             Dominical El Ar… Carolina …        1 42780 Carolina Sher…
10         1 ACOSA             Dominical El Ar… Carolina …        1 42780 Carolina Sher…
# … with 2,861 more rows, and 17 more variables: Keypunch date <chr>, Depth <dbl>,
#   Depth category <chr>, Code <chr>, 1 <dbl>, 2 <dbl>, 3 <dbl>, 4 <dbl>, 5 <dbl>,
#   6 <dbl>, 7 <dbl>, 8 <dbl>, 9 <dbl>, 10 <dbl>, Total <dbl>, Average <dbl>,
#   Percentage <dbl>

This is because Excel has its own way of treating dates and often defaults to a ambiguous format (e.g. 06/07/11, where it is unclear what is the year, month, date). These need to be corrected in R to provide a standard unambiguous date format (i.e. yyyy-mm-dd).

This is done by using mutate() to first transform the column “Fecha” to as.numeric() and then to a date using as.Date(). In converting to a date, R needs to know the origin for calculating the date. For Windows version of Excel, this date is 1899-12-30.

  # set fecha to date
    percent_cover_acosa %<>%
      mutate(Date = Date %>% as.numeric() %>% as.Date(origin = "1899-12-30"))

We can now see that “Date” is now in the correct date format:

  percent_cover_acosa
# A tibble: 2,871 x 25
   `# sitio` `Conservation Ar… Locality  Site  Diver    Transect Date       Keypuncher
       <dbl> <chr>             <chr>     <chr> <chr>       <dbl> <date>     <chr>
 1         1 ACOSA             Dominical El A… Carolin…        1 2017-02-14 Carolina Sh…
 2         1 ACOSA             Dominical El A… Carolin…        1 2017-02-14 Carolina Sh…
 3         1 ACOSA             Dominical El A… Carolin…        1 2017-02-14 Carolina Sh…
 4         1 ACOSA             Dominical El A… Carolin…        1 2017-02-14 Carolina Sh…
 5         1 ACOSA             Dominical El A… Carolin…        2 2017-02-14 Carolina Sh…
 6         1 ACOSA             Dominical El A… Carolin…        1 2017-02-14 Carolina Sh…
 7         1 ACOSA             Dominical El A… Carolin…        1 2017-02-14 Carolina Sh…
 8         1 ACOSA             Dominical El A… Carolin…        1 2017-02-14 Carolina Sh…
 9         1 ACOSA             Dominical El A… Carolin…        1 2017-02-14 Carolina Sh…
10         1 ACOSA             Dominical El A… Carolin…        1 2017-02-14 Carolina Sh…
# … with 2,861 more rows, and 17 more variables: Keypunch date <chr>, Depth <dbl>,
#   Depth category <chr>, Code <chr>, 1 <dbl>, 2 <dbl>, 3 <dbl>, 4 <dbl>, 5 <dbl>,
#   6 <dbl>, 7 <dbl>, 8 <dbl>, 9 <dbl>, 10 <dbl>, Total <dbl>, Average <dbl>,
#   Percentage <dbl>

You may have also noticed that there was a warning message produced when using as.Date():

   # set fecha to date
     percent_cover_acosa %<>%
       mutate(Date = Date %>% as.numeric() %>% as.Date(origin = "1899-12-30"))
Warning message:
Problem with `mutate()` column `Date`.
ℹ `Date = Date %>% as.numeric() %>% as.Date(origin = "1899-12-30")`.
ℹ NAs introduced by coercion

We will come back to this as part of the Homework for this module.

Next Steps

Having successfully converted “Fecha” to date, we now need to join the taxonomic information to the percent cover information. For this, we continue on the next page.