If you would like to return to information from the previous section, please click here
When collating coordinate information from multiple sites across
years, there is often a wide variety of how the coordinates were
recorded. For example, the data could be in decimal degrees; degrees,
decimal minutes; degrees, minutes, seconds; UTM or other coordinate
system. The way the hemispheres are noted (e.g. 178° W
or
-178
) or can be on a 360° notation.
Bringing these different formats can be further complicated by
different symbols used in different operating systems or versions of
operating systems. For example, the degree symbol can look like
º
or °
and quotation marks can also vary:
”
or "
.
This wiki page provides an overview of ways to “harmonise” these different formats and export clean coordinates for mapping.
Code for this example can be found here:
creation_code/examples/mapping/clean_site_coordinates.R
After importing some coordinate data and verifying that there are a number of different formats:
> paste0(data_locale, data_file) %>%
+ read_excel()
# A tibble: 440 x 4
Sitio Isla Latitud Longitud
<chr> <chr> <chr> <chr>
1 El Trompo Española 01°24.481 089°39.372
2 El Trompo Española 01°24.481 089°39.372
3 El Trompo Española 01°24.481 089°39.372
4 El Trompo Española 01°24.597 089°38.410
5 El Trompo Española 01°24.597 089°38.410
6 El Trompo Española 01°24.597 089°38.410
7 La Herradura Española 01°24.813 089°39.703
8 La Herradura Española 01°24.813 089°39.703
9 La Herradura Española 01°24.813 089°39.703
10 La Herradura Española 01°23.817 089°37.434
# … with 430 more rows
To begin, we can see that Latitud
and
Longitud
are recognised as ‘characters’. We can convert
Latitud
and Longitud
to numeric values, which
creates NA
values for character values and
filter()
these values. This creates a new object with
coordinates that have decimal degrees:
# filter numerics
coordinates_decimaldeg <-
site_coordinates %>%
mutate(Latitud = Latitud %>% as.numeric(),
Longitud = Longitud %>% as.numeric()) %>%
dplyr::filter(!Latitud %>% is.na())
# # A tibble: 204 x 5
# Sitio Isla Latitud Longitud id
# <chr> <chr> <dbl> <dbl> <int>
# 1 Cabo Douglas Fernandina -0.305 -91.7 23
# 2 Cabo Douglas Fernandina -0.305 -91.7 24
# 3 Cabo Douglas Fernandina -0.305 -91.7 25
# 4 Cabo Douglas Fernandina -0.305 -91.7 26
# 5 Cabo Douglas Fernandina -0.299 -91.6 31
# 6 Cabo Douglas Fernandina -0.299 -91.6 32
# 7 Cabo Douglas Fernandina -0.299 -91.6 33
# 8 Cabo Douglas Fernandina -0.299 -91.6 34
# 9 Cabo Douglas Fernandina -0.299 -91.6 35
# 10 Cabo Douglas Fernandina -0.299 -91.6 36
# # … with 194 more rows
We will keep this object to bind with other formats, once they are harmonised.
Using the id
values of the decimal degrees, we
will filter the rows with degree symbols (i.e. character values) and
then filter()
again for the “wide” degree symbol
º
:
## -- clean up decimal degrees -- ##
# get sequence of decimal degrees
ids_to_exclude <-
coordinates_decimaldeg$id %>% unique()
# separate formats
coordinates_decimalmin <-
site_coordinates %>%
dplyr::filter(!id %in% ids_to_exclude)
# detect degree symbols
coordinates_minseconds <-
coordinates_decimalmin %>%
dplyr::filter(Latitud %>% str_detect("º"))
# A tibble: 7 x 5
Sitio Isla Latitud Longitud id
<chr> <chr> <chr> <chr> <int>
1 Piedras Amarillas Floreana "01º18'50,1\"" "90º22'47,2\"" 107
2 Piedras Amarillas Floreana "01º18'50,1\"" "90º22'47,2\"" 108
3 Piedras Amarillas Floreana "01º18'50,1\"" "90º22'47,2\"" 109
4 Piedras Amarillas Floreana "01º18'50,1\"" "90º22'47,2\"" 110
5 Piedras Amarillas Floreana "01º18'45,3\"" "90º22'39,6\"" 111
6 Piedras Amarillas Floreana "01º18'45,3\"" "90º22'39,6\"" 112
7 Piedras Amarillas Floreana "01º18'45,3\"" "90º22'39,6\"" 113
We now have an object which has the degree, minute, second
format. As some of these coordinates come from a Latin language setting,
we can see that the decimal marking is a ,
instead of a
.
. Although the majority of the WIO will be working in an
English locale, we will use this example to learn cleaning techiques
that can be applied to other data grooming problems.
The next step is to use the separate()
function to
separate the degrees from the minutes seconds and then
separate the minutes from the seconds. We will do this
for both Latitud
and Longitud
:
# clean up minutes seconds
coordinates_minseconds %<>%
separate(Latitud,
into = c("lat_deg", "lat_minsec"),
sep = "º") %>%
separate(Longitud,
into = c("lon_deg", "lon_minsec"),
sep = "º") %>%
separate(lat_minsec,
into = c("lat_min", "lat_sec"),
sep = "'") %>%
separate(lon_minsec,
into = c("lon_min", "lon_sec"),
sep = "'")
The separate()
function allows users to define the
separator (i.e. sep = "º"
) and name the columns for the
separation (i.e. into = c("lat_deg", "lat_minsec")
).
We now need to convert the ,
to .
and
remove the quotation symbols:
coordinates_minseconds %<>%
mutate(lat_sec = lat_sec %>% str_replace('"', ""),
lon_sec = lon_sec %>% str_replace('"', ""),
lat_sec = lat_sec %>% str_replace(",", "."),
lon_sec = lon_sec %>% str_replace(",", "."))
As these columns are still as character values, we need to convert them to numeric:
# set to numeric
coordinates_minseconds %<>%
mutate(lat_deg = lat_deg %>% as.numeric(),
lon_deg = lon_deg %>% as.numeric(),
lat_min = lat_min %>% as.numeric(),
lon_min = lon_min %>% as.numeric(),
lat_sec = lat_sec %>% as.numeric(),
lon_sec = lon_sec %>% as.numeric())
We are almost there. Our object has degrees, minutes, and seconds in separate columns for Latitude and Longitude:
> coordinates_minseconds
# A tibble: 7 x 9
Sitio Isla lat_deg lat_min lat_sec lon_deg lon_min lon_sec id
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
1 Piedras Amarillas Floreana 1 18 50.1 90 22 47.2 107
2 Piedras Amarillas Floreana 1 18 50.1 90 22 47.2 108
3 Piedras Amarillas Floreana 1 18 50.1 90 22 47.2 109
4 Piedras Amarillas Floreana 1 18 50.1 90 22 47.2 110
5 Piedras Amarillas Floreana 1 18 45.3 90 22 39.6 111
6 Piedras Amarillas Floreana 1 18 45.3 90 22 39.6 112
7 Piedras Amarillas Floreana 1 18 45.3 90 22 39.6 113
We will now use a function from the biogeo package to put everything into decimal degrees. But, before doing this, we must assign the hemisphere to the coordinates:
# set hemisphere
coordinates_minseconds %<>%
mutate(lat_hem = ifelse(lat_deg < 1, "S", "N"),
lon_hem = ifelse(lon_deg < 1, "W", "E"))
# convert to decimal degrees
coordinates_minseconds %<>%
mutate(Latitud = biogeo::dms2dd(lat_deg, lat_min, lat_sec, lat_hem),
Longitud = biogeo::dms2dd(lon_deg, lon_min, lon_sec, lon_hem)) %>%
dplyr::select(Sitio,
Isla,
Latitud,
Longitud,
id)
# Sitio Isla Latitud Longitud id
# <chr> <chr> <dbl> <dbl> <int>
# 1 Piedras Amarillas Floreana 1.31 90.4 107
# 2 Piedras Amarillas Floreana 1.31 90.4 108
# 3 Piedras Amarillas Floreana 1.31 90.4 109
# 4 Piedras Amarillas Floreana 1.31 90.4 110
# 5 Piedras Amarillas Floreana 1.31 90.4 111
# 6 Piedras Amarillas Floreana 1.31 90.4 112
# 7 Piedras Amarillas Floreana 1.31 90.4 113
After harmonising the coordinates with decimal minutes, we can now combine the objects.
# stack up
site_coordinates <-
coordinates_decimaldeg %>%
bind_rows(coordinates_decimalmin) %>%
bind_rows(coordinates_minseconds) # %>%
# .$id %>% unique() %>% sort() ## -- missing coordinates id 306 - 313 (6)
Note that for this example, there are a number of sites without
coordinates (i.e. 6
), and we can verify this by piping the
bind_rows()
result and obtain the number of rows.
Lastly, we can export the data as a shapefile
(i.e. *.shp
), Excel (i.e. *.xlsx
) or
*.rda
, depending on how we intend to use the data:
# set to sf
site_coordinates %>%
dplyr::filter(!Latitud %>% is.na()) %>%
st_as_sf(coords = c("Longitud", "Latitud"),
crs = 4326) %>%
# st_transform(32715) %>%
as("Spatial") %>%
shapefile("data/examples/mapping/site_coordinates.shp")
# coordinadas
site_coordinates %>%
write.xlsx(file = "data/examples/mapping/site_coordinates.xlsx")
Note that we can also take the opportunity to transform the
coordinates to UTM using the st_transform()
function. We
will cover these different formats with examples later in this
module.
Now that we have a command of the importing & cleaning of coordinates from field data, we can now go into some detail of importing and linking with other spatial data.