Previous steps

If you would like to return to information from the previous section, please click here

Context

When collating coordinate information from multiple sites across years, there is often a wide variety of how the coordinates were recorded. For example, the data could be in decimal degrees; degrees, decimal minutes; degrees, minutes, seconds; UTM or other coordinate system. The way the hemispheres are noted (e.g. 178° W or -178) or can be on a 360° notation.

Bringing these different formats can be further complicated by different symbols used in different operating systems or versions of operating systems. For example, the degree symbol can look like º or ° and quotation marks can also vary: or ".

This wiki page provides an overview of ways to “harmonise” these different formats and export clean coordinates for mapping.

Code for this example can be found here: creation_code/examples/mapping/clean_site_coordinates.R

Strategies for harmonising

After importing some coordinate data and verifying that there are a number of different formats:

>       paste0(data_locale, data_file) %>%
+       read_excel()
# A tibble: 440 x 4
   Sitio        Isla     Latitud   Longitud
   <chr>        <chr>    <chr>     <chr>
 1 El Trompo    Española 01°24.481 089°39.372
 2 El Trompo    Española 01°24.481 089°39.372
 3 El Trompo    Española 01°24.481 089°39.372
 4 El Trompo    Española 01°24.597 089°38.410
 5 El Trompo    Española 01°24.597 089°38.410
 6 El Trompo    Española 01°24.597 089°38.410
 7 La Herradura Española 01°24.813 089°39.703
 8 La Herradura Española 01°24.813 089°39.703
 9 La Herradura Española 01°24.813 089°39.703
10 La Herradura Española 01°23.817 089°37.434
# … with 430 more rows

To begin, we can see that Latitud and Longitud are recognised as ‘characters’. We can convert Latitud and Longitud to numeric values, which creates NA values for character values and filter() these values. This creates a new object with coordinates that have decimal degrees:

  # filter numerics
    coordinates_decimaldeg <-
      site_coordinates %>%
        mutate(Latitud  = Latitud  %>% as.numeric(),
               Longitud = Longitud %>% as.numeric()) %>%
        dplyr::filter(!Latitud %>% is.na())
# # A tibble: 204 x 5
   # Sitio        Isla       Latitud Longitud    id
   # <chr>        <chr>        <dbl>    <dbl> <int>
 # 1 Cabo Douglas Fernandina  -0.305    -91.7    23
 # 2 Cabo Douglas Fernandina  -0.305    -91.7    24
 # 3 Cabo Douglas Fernandina  -0.305    -91.7    25
 # 4 Cabo Douglas Fernandina  -0.305    -91.7    26
 # 5 Cabo Douglas Fernandina  -0.299    -91.6    31
 # 6 Cabo Douglas Fernandina  -0.299    -91.6    32
 # 7 Cabo Douglas Fernandina  -0.299    -91.6    33
 # 8 Cabo Douglas Fernandina  -0.299    -91.6    34
 # 9 Cabo Douglas Fernandina  -0.299    -91.6    35
# 10 Cabo Douglas Fernandina  -0.299    -91.6    36
# # … with 194 more rows

We will keep this object to bind with other formats, once they are harmonised.

Using the id values of the decimal degrees, we will filter the rows with degree symbols (i.e. character values) and then filter() again for the “wide” degree symbol º:

 ## -- clean up decimal degrees -- ##
  # get sequence of decimal degrees
    ids_to_exclude <-
      coordinates_decimaldeg$id %>% unique()

  # separate formats
    coordinates_decimalmin <-
      site_coordinates %>%
        dplyr::filter(!id %in% ids_to_exclude)

  # detect degree symbols
    coordinates_minseconds <-
      coordinates_decimalmin %>%
        dplyr::filter(Latitud %>% str_detect("º"))
# A tibble: 7 x 5
  Sitio             Isla     Latitud        Longitud          id
  <chr>             <chr>    <chr>          <chr>          <int>
1 Piedras Amarillas Floreana "01º18'50,1\"" "90º22'47,2\""   107
2 Piedras Amarillas Floreana "01º18'50,1\"" "90º22'47,2\""   108
3 Piedras Amarillas Floreana "01º18'50,1\"" "90º22'47,2\""   109
4 Piedras Amarillas Floreana "01º18'50,1\"" "90º22'47,2\""   110
5 Piedras Amarillas Floreana "01º18'45,3\"" "90º22'39,6\""   111
6 Piedras Amarillas Floreana "01º18'45,3\"" "90º22'39,6\""   112
7 Piedras Amarillas Floreana "01º18'45,3\"" "90º22'39,6\""   113

We now have an object which has the degree, minute, second format. As some of these coordinates come from a Latin language setting, we can see that the decimal marking is a , instead of a .. Although the majority of the WIO will be working in an English locale, we will use this example to learn cleaning techiques that can be applied to other data grooming problems.

The next step is to use the separate() function to separate the degrees from the minutes seconds and then separate the minutes from the seconds. We will do this for both Latitud and Longitud:

  # clean up minutes seconds
    coordinates_minseconds %<>%
      separate(Latitud,
               into = c("lat_deg", "lat_minsec"),
               sep  = "º") %>%
      separate(Longitud,
               into = c("lon_deg", "lon_minsec"),
               sep  = "º") %>%
      separate(lat_minsec,
               into = c("lat_min", "lat_sec"),
               sep  = "'") %>%
      separate(lon_minsec,
               into = c("lon_min", "lon_sec"),
               sep  = "'")

The separate() function allows users to define the separator (i.e. sep = "º") and name the columns for the separation (i.e. into = c("lat_deg", "lat_minsec")).

We now need to convert the , to . and remove the quotation symbols:

    coordinates_minseconds %<>%
      mutate(lat_sec = lat_sec %>% str_replace('"', ""),
             lon_sec = lon_sec %>% str_replace('"', ""),
             lat_sec = lat_sec %>% str_replace(",", "."),
             lon_sec = lon_sec %>% str_replace(",", "."))

As these columns are still as character values, we need to convert them to numeric:

  # set to numeric
    coordinates_minseconds %<>%
      mutate(lat_deg = lat_deg %>% as.numeric(),
             lon_deg = lon_deg %>% as.numeric(),
             lat_min = lat_min %>% as.numeric(),
             lon_min = lon_min %>% as.numeric(),
             lat_sec = lat_sec %>% as.numeric(),
             lon_sec = lon_sec %>% as.numeric())

We are almost there. Our object has degrees, minutes, and seconds in separate columns for Latitude and Longitude:

> coordinates_minseconds
# A tibble: 7 x 9
  Sitio             Isla     lat_deg lat_min lat_sec lon_deg lon_min lon_sec    id
  <chr>             <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <int>
1 Piedras Amarillas Floreana       1      18    50.1      90      22    47.2   107
2 Piedras Amarillas Floreana       1      18    50.1      90      22    47.2   108
3 Piedras Amarillas Floreana       1      18    50.1      90      22    47.2   109
4 Piedras Amarillas Floreana       1      18    50.1      90      22    47.2   110
5 Piedras Amarillas Floreana       1      18    45.3      90      22    39.6   111
6 Piedras Amarillas Floreana       1      18    45.3      90      22    39.6   112
7 Piedras Amarillas Floreana       1      18    45.3      90      22    39.6   113

We will now use a function from the biogeo package to put everything into decimal degrees. But, before doing this, we must assign the hemisphere to the coordinates:

  # set hemisphere
    coordinates_minseconds %<>%
      mutate(lat_hem = ifelse(lat_deg < 1, "S", "N"),
             lon_hem = ifelse(lon_deg < 1, "W", "E"))

  # convert to decimal degrees
    coordinates_minseconds %<>%
      mutate(Latitud  = biogeo::dms2dd(lat_deg, lat_min, lat_sec, lat_hem),
             Longitud = biogeo::dms2dd(lon_deg, lon_min, lon_sec, lon_hem)) %>%
      dplyr::select(Sitio,
                    Isla,
                    Latitud,
                    Longitud,
                    id)
  # Sitio             Isla     Latitud Longitud    id
  # <chr>             <chr>      <dbl>    <dbl> <int>
# 1 Piedras Amarillas Floreana    1.31     90.4   107
# 2 Piedras Amarillas Floreana    1.31     90.4   108
# 3 Piedras Amarillas Floreana    1.31     90.4   109
# 4 Piedras Amarillas Floreana    1.31     90.4   110
# 5 Piedras Amarillas Floreana    1.31     90.4   111
# 6 Piedras Amarillas Floreana    1.31     90.4   112
# 7 Piedras Amarillas Floreana    1.31     90.4   113

Binding different formats & exporting

After harmonising the coordinates with decimal minutes, we can now combine the objects.

  # stack up
    site_coordinates <-
      coordinates_decimaldeg %>%
        bind_rows(coordinates_decimalmin) %>%
        bind_rows(coordinates_minseconds) # %>%
      # .$id %>% unique() %>% sort()  ## -- missing coordinates id 306 - 313 (6)

Note that for this example, there are a number of sites without coordinates (i.e. 6), and we can verify this by piping the bind_rows() result and obtain the number of rows.

Lastly, we can export the data as a shapefile (i.e. *.shp), Excel (i.e. *.xlsx) or *.rda, depending on how we intend to use the data:

    # set to sf
      site_coordinates %>%
        dplyr::filter(!Latitud %>% is.na()) %>%
        st_as_sf(coords = c("Longitud", "Latitud"),
                 crs = 4326) %>%
        # st_transform(32715) %>%
        as("Spatial") %>%
        shapefile("data/examples/mapping/site_coordinates.shp")

   # coordinadas
     site_coordinates %>%
       write.xlsx(file = "data/examples/mapping/site_coordinates.xlsx")

Note that we can also take the opportunity to transform the coordinates to UTM using the st_transform() function. We will cover these different formats with examples later in this module.

Next steps

Now that we have a command of the importing & cleaning of coordinates from field data, we can now go into some detail of importing and linking with other spatial data.