Taxonomic standardisation with taxize

Context

Compiling regional data on the status and trends of coral reefs often means combining data recorded at different taxonomic levels, with different groupings and taxonomic standards. One way to standardise the taxonomy of benthic classes (e.g. corals, macroalgae, sessile invertebrates) and fishes is to use taxonomic databases.

These databases, such as the Catalogue of Life and the World Register of Marine Species (WoRMS), provide higher-level taxonomic information and taxonomic authorities, and can be used to track changes in taxonomy over time.

This wiki page provides an overview of how to query these taxonomic databases from R using the taxize package.

Installing taxize

The taxize package is developed by rOpenSci, and the development version can be installed from GitHub using devtools::install_github():

  # install devtools, then taxize and its SOAP extension from GitHub
    install.packages("devtools")
    devtools::install_github("ropensci/taxize")
  # XMLSchema is needed by taxizesoap, so install it first
    devtools::install_github("cran/XMLSchema")
    devtools::install_github("ropensci/taxizesoap")

The taxizesoap package extends taxize to taxonomic data sources that are accessed through SOAP web services, and XMLSchema is one of its dependencies.
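Before running the code below, it can help to check which of the required packages are already available. This is a minimal sketch; the package list is an assumption based on the functions used later on this page (classification() from taxize, mutate() and bind_rows() from dplyr, spread() from tidyr, and the %<>% pipe from magrittr).

```r
# packages assumed to be needed for the examples on this page
pkgs <- c("taxize", "dplyr", "tidyr", "magrittr")

# requireNamespace() returns FALSE, rather than throwing an error,
# when a package is not installed
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# report anything that still needs installing
missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
}
```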

Retrieving a list of taxa

Most often, users will want to obtain taxonomic information for a list of taxa (e.g. the taxa in a data object of monitoring data). This helps in summarising data at family, order or another higher taxonomic level when the data were recorded at different taxonomic levels (e.g. species, genera).
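When records mix species and genus names, one simple option is to reduce every name to its genus before looking up the higher taxonomy, so that a single genus-level query covers both. A minimal base R sketch, using hypothetical monitoring records:

```r
# hypothetical monitoring records, some identified to species, some to genus
records <- c("Acropora cytherea", "Acropora", "Porites lutea", "Porites")

# keep only the first word of each name (the genus); names without a space
# are returned unchanged
genus <- sub(" .*$", "", records)
# genus is c("Acropora", "Acropora", "Porites", "Porites")

# unique genera to query in the taxonomic database
genera_to_query <- unique(genus)
```

This assumes binomial names that start with the genus; open-nomenclature entries such as "Acropora sp." also reduce to "Acropora", but names recorded at family level or above would need separate handling.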

To provide an example, we first create a list of taxa of interest:

  # create taxa of interest
    taxa_to_get <-
      c("Acanthastrea",
        "Acropora",
        "Astreopora",
        "Cespitularia",
        "Coscinaraea",
        "Cyphastrea",
        "Dendronephthya",
        "Diploastrea",
        "Dipsastraea",
        "Echinopora",
        "Favia",
        "Favites",
        "Fungia",
        "Galaxea",
        "Goniastrea",
        "Goniopora")

We then create an empty object to hold the results and loop through the list to extract the taxonomy:

  # load the packages used below
    library(taxize)    # classification()
    library(dplyr)     # tibble(), mutate(), bind_rows(), select(), filter()
    library(tidyr)     # spread()
    library(magrittr)  # %<>% assignment pipe

  # create empty object to hold results
    wio_benthic_taxa_eol <- tibble()

  # loop through the taxa and get their classification from WoRMS
    for (i in seq_along(taxa_to_get)) {

      # get classification (rows = 1 takes the first matching record)
        dat <-
          classification(sci_id = taxa_to_get[i],
                         db     = "worms",
                         rows   = 1)

      # skip names with no match (classification() returns NA for these)
        if (!is.data.frame(dat[[1]])) next

      # add identifier
        dat <-
          dat[[1]] %>%
          mutate(benthic_name = taxa_to_get[i])

      # harvest results
        wio_benthic_taxa_eol %<>%
          bind_rows(dat)
    }

Note that in this example we use the WoRMS database (db = "worms") instead of the Catalogue of Life. The rows = 1 argument selects the first record when a name search returns several candidate matches (e.g. a search on a genus name such as Pocillopora can also return records for its individual species), which also avoids an interactive prompt. It is worth checking that the first match is the intended taxon.
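The loop builds one data frame per name and tags it with benthic_name. The same binding step, plus a check for names that returned no classification, can be sketched offline with a hypothetical mock list. The mock follows the shape of a classification() result (a data frame with name, rank and id columns for a matched name, NA for an unmatched one), but its values are illustrative rather than real WoRMS output, and "Notagenus" is a made-up name.

```r
library(dplyr)

# hypothetical stand-in for classification() results, named by the searched name
mock_classification <- list(
  Acropora = data.frame(
    name = c("Animalia", "Cnidaria", "Anthozoa", "Scleractinia", "Acroporidae", "Acropora"),
    rank = c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus"),
    id   = NA_character_
  ),
  Notagenus = NA  # no match found
)

# drop unmatched (NA) entries, then tag each data frame with its searched name
matched <- Filter(is.data.frame, mock_classification)
taxa_long <- bind_rows(lapply(names(matched), function(nm) {
  mutate(matched[[nm]], benthic_name = nm)
}))

# names that returned no classification and need checking by hand
unmatched <- setdiff(names(mock_classification), unique(taxa_long$benthic_name))
# unmatched is "Notagenus"
```

Applied to the real results, the equivalent check is setdiff(taxa_to_get, unique(wio_benthic_taxa_eol$benthic_name)).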

Cleaning up

To tidy the information extracted from the taxonomic databases, we keep only the taxonomic ranks of interest and reshape the data to wide format (i.e. one column per taxonomic rank):

  # set ranks of interest
    ranks_of_interest <-
      c("Kingdom",
        "Phylum",
        "Class",
        "Subclass",
        "Order",
        "Suborder",
        "Family",
        "Genus")

  # filter ranks of interest
    wio_benthic_taxa_eol %<>%
      dplyr::filter(rank %in% ranks_of_interest)

  # set to wide
    wio_benthic_taxa_eol %<>%
      dplyr::select(name,
                    rank,
                    benthic_name) %>%
      spread(rank, name)

  # put columns in taxonomic order; any_of() skips ranks that no taxon returned
    wio_benthic_taxa_eol %<>%
      dplyr::select(dplyr::any_of(ranks_of_interest),
                    benthic_name) %>%
      dplyr::distinct()
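spread() still works but is superseded in tidyr, which now recommends pivot_wider(). A minimal sketch of the same filter, reshape and reorder steps with pivot_wider(), on a small hypothetical long table; the Dendronephthya rows deliberately omit Order to show that a missing rank becomes NA, and that a rank returned for no taxon (here Subclass) is skipped by any_of().

```r
library(dplyr)
library(tidyr)

# hypothetical long-format classification rows (not real database output)
taxa_long <- tibble(
  benthic_name = c(rep("Acropora", 3), rep("Dendronephthya", 2)),
  rank = c("Class", "Order", "Genus", "Class", "Genus"),
  name = c("Anthozoa", "Scleractinia", "Acropora", "Anthozoa", "Dendronephthya")
)

ranks_of_interest <- c("Class", "Subclass", "Order", "Genus")

taxa_wide <- taxa_long %>%
  filter(rank %in% ranks_of_interest) %>%                  # keep ranks of interest
  pivot_wider(names_from = rank, values_from = name) %>%   # one column per rank
  select(any_of(ranks_of_interest), benthic_name)          # taxonomic order
```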

Next steps

After saving the intermediate data object as an *.rda file, it can be joined to the main data object in a separate script (e.g. by keeping the benthic_name column as the shared key). This keeps the extraction of taxonomic information separate from other data grooming tasks.
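A minimal sketch of that save, reload and join workflow, assuming a hypothetical taxonomy lookup and hypothetical monitoring data (the cover values are made up, and tempdir() stands in for a project data folder):

```r
library(dplyr)

# hypothetical taxonomy lookup, one row per benthic_name
taxa_lookup <- tibble(
  benthic_name = c("Acropora", "Porites"),
  Family       = c("Acroporidae", "Poritidae")
)

# hypothetical monitoring data recorded by benthic_name
monitoring <- tibble(
  site         = c("A", "A", "B"),
  benthic_name = c("Acropora", "Porites", "Acropora"),
  cover        = c(12, 5, 20)
)

# script 1: save the intermediate taxonomy object
rda_file <- file.path(tempdir(), "taxa_lookup.rda")
save(taxa_lookup, file = rda_file)

# script 2: reload it and join on the shared benthic_name column
rm(taxa_lookup)
load(rda_file)
monitoring_taxa <- left_join(monitoring, taxa_lookup, by = "benthic_name")

# summarise at family level
family_cover <- monitoring_taxa %>%
  group_by(site, Family) %>%
  summarise(cover = sum(cover), .groups = "drop")
```

A left join keeps every monitoring record even if its benthic_name is missing from the lookup (its Family is then NA), which makes unmatched names easy to spot.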

Next, we will look at extracting additional information from external databases such as FishBase and the IUCN Red List.