In this article you will learn how to prepare the data to train models using SDMtune. We will use the virtualSp dataset included in the package and environmental predictors from the WorldClim dataset.
For the analysis we use climate data from WorldClim version 1.4 (Hijmans et al. 2005) and the terrestrial ecoregions from WWF (Olson et al. 2001), both included in the dismo package:
```r
files <- list.files(path = file.path(system.file(package = "dismo"), "ex"),
                    pattern = "grd", full.names = TRUE)
```

We then convert the files into a raster stack object that will be used later in the analysis.
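A minimal sketch of this conversion, assuming the stack() function from the raster package (a dependency of dismo) and the object name predictors used by the plotting code later on. It re-lists the files so that it runs on its own.

```r
library(raster)  # stack() and nlayers(); dismo depends on raster

# List the .grd raster files shipped in the "ex" folder of dismo
files <- list.files(path = file.path(system.file(package = "dismo"), "ex"),
                    pattern = "grd", full.names = TRUE)

# Combine the single-layer files into one multi-layer RasterStack
predictors <- stack(files)

nlayers(predictors)  # number of environmental variables
names(predictors)    # layer names, taken from the file names
```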
There are nine environmental variables, eight continuous (bio1, bio12, bio16, bio17, bio5, bio6, bio7 and bio8) and one categorical (biome).
We can plot bio1 using the gplot function from the rasterVis package:
```r
library(rasterVis)  # gplot() for Raster* objects
library(ggplot2)    # geoms, scales and themes used below

gplot(predictors$bio1) +
  geom_tile(aes(fill = value)) +
  coord_equal() +
  scale_fill_gradientn(colours = c("#2c7bb6", "#abd9e9", "#ffffbf", "#fdae61", "#d7191c"),
                       na.value = "transparent",
                       name = "°C x 10") +
  labs(title = "Annual Mean Temperature",
       x = "longitude",
       y = "latitude") +
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0)) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5),
        axis.ticks.x = element_blank(),
        axis.ticks.y = element_blank())
```
Let’s load the SDMtune package:
```r
library(SDMtune)
#>
#> _____ ____ __ ___ __
#> / ___/ / __ \ / |/ // /_ __ __ ____ ___
#> \__ \ / / / // /|_/ // __// / / // __ \ / _ \
#> ___/ // /_/ // / / // /_ / /_/ // / / // __/
#> /____//_____//_/ /_/ \__/ \__,_//_/ /_/ \___/ version 0.2.0
#>
#> To cite this package in publications type: citation("SDMtune").
```

To demonstrate how to use SDMtune we use virtualSp, a randomly generated virtual species dataset included in the package. The dataset contains 400 presence and 5000 background locations.
We select the first two columns, which contain the coordinates of the locations, and store the presence coordinates in p_coords and the background coordinates in bg_coords.
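The exact layout of virtualSp is not shown here, so the following is only a toy sketch of selecting the first two columns of a table. The table, its values and the names toy_locs and toy_coords are hypothetical; the column names x and y are assumed because the plotting code uses aes(x = x, y = y).

```r
# Hypothetical toy table: coordinates in the first two columns, plus an
# extra column that is not needed for plotting or extraction.
toy_locs <- data.frame(x = c(-67.75, -86.75, -96.25),
                       y = c(-21.75, 33.75, 35.75),
                       extra = c("a", "b", "c"))

# Keep only the first two columns (the coordinates); with two columns
# selected the result is still a data frame
toy_coords <- toy_locs[, 1:2]
toy_coords
```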
Plot the study area together with the presence locations:
```r
library(ggplot2)
library(maps)  # provides the "world" map data used by map_data()

ggplot(map_data("world"), aes(long, lat)) +
  geom_polygon(aes(group = group), fill = "grey95", color = "gray40", size = 0.2) +
  geom_jitter(data = p_coords, aes(x = x, y = y), color = "red",
              alpha = 0.4, size = 1) +
  labs(x = "longitude", y = "latitude") +
  theme_minimal() +
  theme(legend.position = "none") +
  coord_fixed() +
  scale_x_continuous(limits = c(-125, -32)) +
  scale_y_continuous(limits = c(-56, 40))
```
To plot the background locations run the following code:
```r
ggplot(map_data("world"), aes(long, lat)) +
  geom_polygon(aes(group = group), fill = "grey95", color = "gray40", size = 0.2) +
  geom_jitter(data = as.data.frame(bg_coords), aes(x = x, y = y),
              color = "blue", alpha = 0.4, size = 0.5) +
  labs(x = "longitude", y = "latitude") +
  theme_minimal() +
  theme(legend.position = "none") +
  coord_fixed() +
  scale_x_continuous(limits = c(-125, -32)) +
  scale_y_continuous(limits = c(-56, 40))
```
Before training a model we have to prepare the data in the correct format. The prepareSWD() function creates an SWD() object that stores the species name, the coordinates of the presence and absence/background locations, and the values of the environmental variables at those locations. The argument categorical indicates which environmental variables are categorical. In our example biome is categorical (we can pass a vector if we have more than one categorical environmental variable). The function extracts the values of the environmental variables at each location and excludes the locations that have an NA value for at least one environmental variable.
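To make the described steps concrete, here is a toy sketch in base R with made-up values. It is not SDMtune's implementation; it only mirrors the logic described above: stack presence and absence/background rows, record 1/0 in a pa vector, drop every location with at least one NA, and store the categorical variable as a factor.

```r
# Toy illustration with made-up values; not SDMtune's implementation.
# Environmental values already "extracted" at 3 presence and 3 background
# locations; NA marks a location where a layer has no value.
p_env <- data.frame(bio1 = c(32, 161, NA), biome = c(10, 4, 8))
a_env <- data.frame(bio1 = c(156, 119, 112), biome = c(8, NA, 8))

# Stack presence and background rows and record 1/0 in a pa vector
env <- rbind(p_env, a_env)
pa <- c(rep(1, nrow(p_env)), rep(0, nrow(a_env)))

# Exclude every location with an NA in at least one variable
keep <- complete.cases(env)
env <- env[keep, , drop = FALSE]
pa <- pa[keep]

# Treat the categorical variable as a factor rather than a number
env$biome <- factor(env$biome)

table(pa)  # two presence and two background locations remain
```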
Let’s have a look at the created SWD() object:
```r
data
#> Object of class SWD
#>
#> Species: Virtual species
#> Presence locations: 400
#> Absence locations: 5000
#>
#> Variables:
#> ---------
#> Continuous: bio1 bio12 bio16 bio17 bio5 bio6 bio7 bio8
#> Categorical: biome
```

Printing an SWD() object shows the species name, the number of presence and absence/background locations, and which environmental variables are continuous and which are categorical.
The object contains four slots: @species, @coords, @data and @pa. The @pa slot contains a vector with 1 for presence and 0 for absence/background locations. To inspect the data we run:
```r
head(data@data)
#>   bio1 bio12 bio16 bio17 bio5 bio6 bio7 bio8 biome
#> 1   32    59    55     0  152 -118  270   66    10
#> 2  161  1427   426   275  323  -10  333   78     4
#> 3  156   977   341   137  342  -41  384  203     8
#> 4  119   975   305   177  313  -77  390  190     8
#> 5  112   633   268    42  333 -106  440  218     8
#> 6   92   196    64    37  254  -37  291   26     8
```

We can inspect the coordinates with:
```r
head(data@coords)
#>        X      Y
#> 1 -67.75 -21.75
#> 2 -86.75  33.75
#> 3 -96.25  35.75
#> 4 -89.25  39.25
#> 5 -98.75  39.75
#> 6 -68.75 -41.25
```

The name of the species is stored in the @species slot (data@species).
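To illustrate how slots are accessed with @, here is a toy sketch that defines a hypothetical class, ToySWD, with the same four slot names. It is not SDMtune's SWD class definition; the slot types and values are simplified assumptions.

```r
library(methods)  # setClass(), new() and the @ slot operator

# Hypothetical stand-in for the SWD class: same four slot names, simplified types
setClass("ToySWD", slots = c(species = "character", coords = "data.frame",
                             data = "data.frame", pa = "numeric"))

# Made-up object with two presence and one absence/background location
toy <- new("ToySWD",
           species = "Virtual species",
           coords = data.frame(X = c(-67.75, -86.75, -96.25),
                               Y = c(-21.75, 33.75, 35.75)),
           data = data.frame(bio1 = c(32, 161, 156),
                             biome = factor(c(10, 4, 8))),
           pa = c(1, 1, 0))

toy@species                # species name
sum(toy@pa == 1)           # number of presence locations (2)
toy@coords[toy@pa == 0, ]  # coordinates of the absence/background location
```

Because pa is aligned row by row with coords and data, a logical index built from it selects matching rows in both tables.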
We can save the SWD() object in a .csv file using the function swd2csv() (the function saves the file in the working directory). There are two possibilities:
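Assuming (check ?swd2csv) that the two possibilities are a single file with all locations, or separate files for presence and for absence/background locations, the two layouts can be illustrated in base R with write.csv(). This is not swd2csv() itself: the table toy_locs, its values and the file names are hypothetical, and the files go to a temporary folder rather than the working directory.

```r
# Base-R illustration of two CSV layouts; not swd2csv() itself.
out_dir <- tempdir()  # temporary folder instead of the working directory

# Hypothetical toy table: coordinates, one variable and the pa column
toy_locs <- data.frame(X = c(-67.75, -86.75, -96.25, -89.25),
                       Y = c(-21.75, 33.75, 35.75, 39.25),
                       bio1 = c(32, 161, 156, 119),
                       pa = c(1, 1, 0, 0))

# Layout 1: one file with all locations, pa distinguishing them
write.csv(toy_locs, file.path(out_dir, "train_data.csv"), row.names = FALSE)

# Layout 2: one file for presence and one for absence/background locations
write.csv(toy_locs[toy_locs$pa == 1, ], file.path(out_dir, "presence.csv"),
          row.names = FALSE)
write.csv(toy_locs[toy_locs$pa == 0, ], file.path(out_dir, "background.csv"),
          row.names = FALSE)
```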
In this article you have learned:
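- how to load environmental variables in a raster stack object;
- how to plot a continuous environmental variable using the gplot function included in the rasterVis package;
- how to plot the study area together with the locations using the ggplot2 and maps packages;
- how to prepare the data for the analysis by creating SWD() objects;
- how to explore the data stored in an SWD() object;
- how to save an SWD() object in a .csv file.

Move on to the second article and learn how to train models using SDMtune.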
Hijmans, Robert J., Susan E. Cameron, Juan L. Parra, Peter G. Jones, and Andy Jarvis. 2005. “Very high resolution interpolated climate surfaces for global land areas.” International Journal of Climatology 25 (15). Wiley-Blackwell: 1965–78. https://doi.org/10.1002/joc.1276.
Olson, David M., Eric Dinerstein, Eric D. Wikramanayake, Neil D. Burgess, George V. N. Powell, Emma C. Underwood, Jennifer A. D’amico, et al. 2001. “Terrestrial Ecoregions of the World: A New Map of Life on Earth.” BioScience 51 (11). Oxford University Press: 933. https://doi.org/10.1641/0006-3568(2001)051[0933:TEOTWA]2.0.CO;2.