When collating coordinate information from multiple sites across
years, the coordinates have often been recorded in a wide variety of
formats. For example, the data could be in decimal degrees; degrees,
decimal minutes; degrees, minutes, seconds; UTM; or another coordinate
system. The way the hemispheres are noted can also vary (e.g. 178° W or
-178), or longitudes can be recorded on a 360° notation (e.g. 182).
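As a minimal sketch of how these notations relate, the base R snippet below brings a 360° longitude and a hemisphere letter onto the signed -180 to 180 scale. The helper names `lon_360_to_180()` and `apply_hemisphere()` are hypothetical and are not part of the example script.

```r
# Hypothetical helper: longitudes on a 0-360 scale -> signed -180..180 scale
lon_360_to_180 <- function(lon) {
  ifelse(lon > 180, lon - 360, lon)
}

# Hypothetical helper: apply a hemisphere letter to an unsigned value
# (S and W become negative, N and E stay positive)
apply_hemisphere <- function(value, hemisphere) {
  sign <- ifelse(toupper(hemisphere) %in% c("S", "W"), -1, 1)
  abs(value) * sign
}

# 182 on the 360 notation and 178 W describe the same meridian
lon_360_to_180(182)
apply_hemisphere(178, "W")
```

Both calls should give -178, which is the signed form used in the rest of this page.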
Bringing these different formats together can be further complicated by
the different symbols used by different operating systems or versions of
operating systems. For example, the degree symbol can look like
º or ° and quotation marks can also vary:
” or ".
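One way to deal with look-alike symbols is to map them onto a single form before any parsing. The sketch below is an assumption about how this could be done in base R; `normalise_symbols()` is a hypothetical helper, and the list of symbols is illustrative rather than exhaustive. The characters are written as Unicode escapes so the code does not depend on the editor's encoding.

```r
# Hypothetical helper: replace look-alike symbols with one canonical form
normalise_symbols <- function(x) {
  from <- c("\u00ba",   # masculine ordinal indicator, often typed as a degree
            "\u2019",   # right single quotation mark
            "\u2032",   # prime
            "\u201d",   # right double quotation mark
            "\u2033")   # double prime
  to   <- c("\u00b0",   # degree sign
            "'", "'",
            "\"", "\"")
  for (i in seq_along(from)) {
    x <- gsub(from[i], to[i], x, fixed = TRUE)
  }
  x
}

# A value typed with the ordinal indicator and a curly closing quote
normalise_symbols("01\u00ba18'50,1\u201d")
```

After this step, a single pattern for the degree sign and the straight quotes is enough for the later filtering and splitting.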
This wiki page provides an overview of ways to “harmonise” these different formats and export clean coordinates for mapping.
Code for this example can be found here:
creation_code/examples/mapping/clean_site_coordinates.R
We first import some coordinate data and verify that it contains a number of different formats:
>       paste0(data_locale, data_file) %>%
+       read_excel()
# A tibble: 440 x 4
   Sitio        Isla     Latitud   Longitud
   <chr>        <chr>    <chr>     <chr>
 1 El Trompo    Española 01°24.481 089°39.372
 2 El Trompo    Española 01°24.481 089°39.372
 3 El Trompo    Española 01°24.481 089°39.372
 4 El Trompo    Española 01°24.597 089°38.410
 5 El Trompo    Española 01°24.597 089°38.410
 6 El Trompo    Española 01°24.597 089°38.410
 7 La Herradura Española 01°24.813 089°39.703
 8 La Herradura Española 01°24.813 089°39.703
 9 La Herradura Española 01°24.813 089°39.703
10 La Herradura Española 01°23.817 089°37.434
# … with 430 more rows
To begin, we can see that Latitud and
Longitud are recognised as ‘characters’. We can convert
Latitud and Longitud to numeric values, which
creates NA values wherever the text is not a plain number, and
then filter() out these NA values. This creates a new object containing
only the coordinates that are already in decimal degrees:
  # filter numerics
    coordinates_decimaldeg <-
      site_coordinates %>%
        mutate(Latitud  = Latitud  %>% as.numeric(),
               Longitud = Longitud %>% as.numeric()) %>%
        dplyr::filter(!Latitud %>% is.na())
# # A tibble: 204 x 5
   # Sitio        Isla       Latitud Longitud    id
   # <chr>        <chr>        <dbl>    <dbl> <int>
 # 1 Cabo Douglas Fernandina  -0.305    -91.7    23
 # 2 Cabo Douglas Fernandina  -0.305    -91.7    24
 # 3 Cabo Douglas Fernandina  -0.305    -91.7    25
 # 4 Cabo Douglas Fernandina  -0.305    -91.7    26
 # 5 Cabo Douglas Fernandina  -0.299    -91.6    31
 # 6 Cabo Douglas Fernandina  -0.299    -91.6    32
 # 7 Cabo Douglas Fernandina  -0.299    -91.6    33
 # 8 Cabo Douglas Fernandina  -0.299    -91.6    34
 # 9 Cabo Douglas Fernandina  -0.299    -91.6    35
# 10 Cabo Douglas Fernandina  -0.299    -91.6    36
# # … with 194 more rows
We will keep this object to bind with other formats, once they are harmonised.
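The behaviour this filter relies on can be seen on a few values in base R. The sketch below uses illustrative strings modelled on the formats shown on this page. It also assumes that `id` is simply a row number, since the page does not show how that column was created.

```r
# Illustrative values only: decimal degrees, decimal minutes,
# degrees-minutes-seconds, and a missing value
latitud <- c("-0.305", "01\u00b024.481", "01\u00ba18'50,1\"", NA)

# Assumption: id is a plain row number
id <- seq_along(latitud)

# as.numeric() returns NA (with a warning) for anything that is not a plain number
latitud_num <- suppressWarnings(as.numeric(latitud))

# Distinguish values that were missing from values in another format
format_class <- ifelse(is.na(latitud), "missing",
                ifelse(is.na(latitud_num), "other format", "decimal degrees"))

data.frame(id, latitud, latitud_num, format_class)
```

Note that filtering on `is.na()` after the conversion cannot tell a missing coordinate from one in another format, so the original text column is worth keeping until all formats are handled.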
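Using the id values of the decimal-degree rows, we
exclude those rows to keep only the rows with degree symbols (i.e. character values) and
then filter() again for the “wide” degree symbol
º (strictly the masculine ordinal indicator, U+00BA, rather than the degree sign °, U+00B0):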
 ## -- clean up decimal degrees -- ##
  # get sequence of decimal degrees
    ids_to_exclude <-
      coordinates_decimaldeg$id %>% unique()
  # separate formats
    coordinates_decimalmin <-
      site_coordinates %>%
        dplyr::filter(!id %in% ids_to_exclude)
  # detect degree symbols
    coordinates_minseconds <-
      coordinates_decimalmin %>%
        dplyr::filter(Latitud %>% str_detect("º"))
# A tibble: 7 x 5
  Sitio             Isla     Latitud        Longitud          id
  <chr>             <chr>    <chr>          <chr>          <int>
1 Piedras Amarillas Floreana "01º18'50,1\"" "90º22'47,2\""   107
2 Piedras Amarillas Floreana "01º18'50,1\"" "90º22'47,2\""   108
3 Piedras Amarillas Floreana "01º18'50,1\"" "90º22'47,2\""   109
4 Piedras Amarillas Floreana "01º18'50,1\"" "90º22'47,2\""   110
5 Piedras Amarillas Floreana "01º18'45,3\"" "90º22'39,6\""   111
6 Piedras Amarillas Floreana "01º18'45,3\"" "90º22'39,6\""   112
7 Piedras Amarillas Floreana "01º18'45,3\"" "90º22'39,6\""   113
We now have an object which has the degree, minute, second
format. As some of these coordinates come from a Latin-language setting,
we can see that the decimal mark is a comma (,) instead of a
point (.). Although the majority of the WIO will be working in an
English locale, we will use this example to learn cleaning techniques
that can be applied to other data grooming problems.
The next step is to use the separate() function to
separate the degrees from the minutes and seconds, and then
separate the minutes from the seconds. We will do this
for both Latitud and Longitud:
  # clean up minutes seconds
    coordinates_minseconds %<>%
      separate(Latitud,
               into = c("lat_deg", "lat_minsec"),
               sep  = "º") %>%
      separate(Longitud,
               into = c("lon_deg", "lon_minsec"),
               sep  = "º") %>%
      separate(lat_minsec,
               into = c("lat_min", "lat_sec"),
               sep  = "'") %>%
      separate(lon_minsec,
               into = c("lon_min", "lon_sec"),
               sep  = "'")
The separate() function allows users to define the
separator (i.e. sep = "º") and name the columns for the
separation (i.e. into = c("lat_deg", "lat_minsec")).
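For comparison, the same two-stage split can be sketched in base R without tidyr. This is an illustration under stated assumptions, not the method used in the example script. `split_dms()` is a hypothetical helper, and it first maps º onto ° so that either symbol works as the separator.

```r
# Hypothetical helper: split "DD(degree)MM'SS" strings into three text columns
split_dms <- function(x) {
  # treat the ordinal indicator and the degree sign as the same separator
  x <- gsub("\u00ba", "\u00b0", x, fixed = TRUE)

  # first split: degrees | minutes and seconds
  deg_rest <- strsplit(x, "\u00b0", fixed = TRUE)
  deg      <- vapply(deg_rest, `[`, character(1), 1)
  minsec   <- vapply(deg_rest, `[`, character(1), 2)

  # second split: minutes | seconds
  min_sec <- strsplit(minsec, "'", fixed = TRUE)
  data.frame(deg = deg,
             min = vapply(min_sec, `[`, character(1), 1),
             sec = vapply(min_sec, `[`, character(1), 2),
             stringsAsFactors = FALSE)
}

split_dms(c("01\u00ba18'50,1\"", "90\u00b022'47,2\""))
```

As with separate(), the resulting columns are still text, and the seconds still carry the decimal comma and the trailing quotation mark.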
We now need to convert the , to . and
remove the quotation symbols:
    coordinates_minseconds %<>%
      mutate(lat_sec = lat_sec %>% str_replace('"', ""),
             lon_sec = lon_sec %>% str_replace('"', ""),
             lat_sec = lat_sec %>% str_replace(",", "."),
             lon_sec = lon_sec %>% str_replace(",", "."))
As these columns are still character values, we need to convert them to numeric:
  # set to numeric
    coordinates_minseconds %<>%
      mutate(lat_deg = lat_deg %>% as.numeric(),
             lon_deg = lon_deg %>% as.numeric(),
             lat_min = lat_min %>% as.numeric(),
             lon_min = lon_min %>% as.numeric(),
             lat_sec = lat_sec %>% as.numeric(),
             lon_sec = lon_sec %>% as.numeric())
We are almost there. Our object has degrees, minutes, and seconds in separate columns for Latitude and Longitude:
> coordinates_minseconds
# A tibble: 7 x 9
  Sitio             Isla     lat_deg lat_min lat_sec lon_deg lon_min lon_sec    id
  <chr>             <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <int>
1 Piedras Amarillas Floreana       1      18    50.1      90      22    47.2   107
2 Piedras Amarillas Floreana       1      18    50.1      90      22    47.2   108
3 Piedras Amarillas Floreana       1      18    50.1      90      22    47.2   109
4 Piedras Amarillas Floreana       1      18    50.1      90      22    47.2   110
5 Piedras Amarillas Floreana       1      18    45.3      90      22    39.6   111
6 Piedras Amarillas Floreana       1      18    45.3      90      22    39.6   112
7 Piedras Amarillas Floreana       1      18    45.3      90      22    39.6   113
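The conversion from these three columns to decimal degrees is simple arithmetic: degrees plus minutes divided by 60 plus seconds divided by 3600. A minimal base R sketch of that formula follows. `dms_to_dd()` is a hypothetical helper that returns the unsigned magnitude only, so the sign still depends on the hemisphere.

```r
# Hypothetical helper: unsigned decimal degrees from degrees, minutes, seconds
dms_to_dd <- function(deg, min, sec) {
  deg + min / 60 + sec / 3600
}

# First row of the table above
lat <- dms_to_dd(1, 18, 50.1)
lon <- dms_to_dd(90, 22, 47.2)

round(c(lat = lat, lon = lon), 4)
```

For the first row this gives roughly 1.3139 and 90.3798, before any hemisphere is applied.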
We will now use a function from the biogeo package to put everything into decimal degrees. But, before doing this, we must assign the hemisphere to the coordinates:
  # set hemisphere
  # (the source strings carry no sign or hemisphere letter, so a test on
  #  lat_deg / lon_deg cannot recover it; these Floreana sites lie south of
  #  the equator and west of Greenwich)
    coordinates_minseconds %<>%
      mutate(lat_hem = "S",
             lon_hem = "W")
  # convert to decimal degrees
    coordinates_minseconds %<>%
      mutate(Latitud  = biogeo::dms2dd(lat_deg, lat_min, lat_sec, lat_hem),
             Longitud = biogeo::dms2dd(lon_deg, lon_min, lon_sec, lon_hem)) %>%
      dplyr::select(Sitio,
                    Isla,
                    Latitud,
                    Longitud,
                    id)
  # Sitio             Isla     Latitud Longitud    id
  # <chr>             <chr>      <dbl>    <dbl> <int>
# 1 Piedras Amarillas Floreana   -1.31    -90.4   107
# 2 Piedras Amarillas Floreana   -1.31    -90.4   108
# 3 Piedras Amarillas Floreana   -1.31    -90.4   109
# 4 Piedras Amarillas Floreana   -1.31    -90.4   110
# 5 Piedras Amarillas Floreana   -1.31    -90.4   111
# 6 Piedras Amarillas Floreana   -1.31    -90.4   112
# 7 Piedras Amarillas Floreana   -1.31    -90.4   113
After harmonising the coordinates with decimal minutes, we can now combine the objects.
  # stack up
    site_coordinates <-
      coordinates_decimaldeg %>%
        bind_rows(coordinates_decimalmin) %>%
        bind_rows(coordinates_minseconds) # %>%
      # .$id %>% unique() %>% sort()  ## -- missing coordinates id 306 - 313 (6)
Note that for this example, there are a number of sites without
coordinates (i.e. 6), and we can verify this by piping the
bind_rows() result into the commented-out line to list the id values that remain, or by obtaining the number of rows.
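The decimal-minute strings seen in the first tibble (e.g. 01°24.481) also have to become numeric decimal degrees before the objects are stacked, because bind_rows() will not combine a character column with a numeric one. That step is not shown on this page, so the base R sketch below is only one possible approach. `ddm_to_dd()` is a hypothetical helper, and because the strings carry no hemisphere it has to be supplied by the user.

```r
# Hypothetical helper: "DD(degree)MM.mmm" (degrees, decimal minutes) -> decimal degrees
ddm_to_dd <- function(x, hemisphere) {
  # treat the ordinal indicator and the degree sign as the same separator
  x     <- gsub("\u00ba", "\u00b0", x, fixed = TRUE)
  parts <- strsplit(x, "\u00b0", fixed = TRUE)

  deg <- as.numeric(vapply(parts, `[`, character(1), 1))
  min <- as.numeric(vapply(parts, `[`, character(1), 2))

  # S and W are negative; the hemisphere is an assumption supplied by the caller
  sign <- ifelse(toupper(hemisphere) %in% c("S", "W"), -1, 1)
  sign * (deg + min / 60)
}

# First row of the imported data, assuming southern and western hemispheres
lat <- ddm_to_dd("01\u00b024.481", "S")
lon <- ddm_to_dd("089\u00b039.372", "W")

round(c(lat = lat, lon = lon), 4)
```

With those hemispheres the first row comes out at roughly -1.408 and -89.656, on the same signed scale as the decimal-degree rows.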
Lastly, we can export the data as a shapefile
(i.e. *.shp), Excel (i.e. *.xlsx) or
*.rda, depending on how we intend to use the data:
    # set to sf
      site_coordinates %>%
        dplyr::filter(!Latitud %>% is.na()) %>%
        st_as_sf(coords = c("Longitud", "Latitud"),
                 crs = 4326) %>%
        # st_transform(32715) %>%
        as("Spatial") %>%
        shapefile("data/examples/mapping/site_coordinates.shp")
   # coordinates
     site_coordinates %>%
       write.xlsx(file = "data/examples/mapping/site_coordinates.xlsx")
Note that we can also take the opportunity to transform the
coordinates to UTM using the st_transform() function. We
will cover these different formats with examples later in this
module.
Now that we have a command of importing and cleaning coordinates from field data, we can go into some detail on importing and linking with other spatial data.