Train a model using one of the following methods: Artificial Neural Networks, Boosted Regression Trees, Maxent, Maxnet or Random Forest.
train(method, data, folds = NULL, verbose = TRUE, ...)
Argument | Description |
---|---|
method | character or character vector. Method used to train the model. Possible values are "ANN", "BRT", "Maxent", "Maxnet" or "RF"; see details. |
data | SWD object with presence and absence/background locations. |
folds | list. Output of the function randomFolds, or a folds object created with other packages; see details. Default is NULL. |
verbose | logical. Default is TRUE. |
... | Arguments passed to the selected method; see details. |
An SDMmodel or SDMmodelCV object, or a named list of model objects when several methods are given.
For the ANN method possible arguments are (for more details see nnet):
size: integer. Number of units in the hidden layer.
decay: numeric. Weight decay, default is 0.
rang: numeric. Initial random weights, default is 0.7.
maxit: integer. Maximum number of iterations, default is 100.
For the BRT method possible arguments are (for more details see gbm):
distribution: character. Name of the distribution to use, default is "bernoulli".
n.trees: integer. Maximum number of trees to grow, default is 100.
interaction.depth: integer. Maximum depth of each tree, default is 1.
shrinkage: numeric. The shrinkage parameter, default is 0.1.
bag.fraction: numeric. Random fraction of data used in the tree expansion, default is 0.5.
For the RF method the model is trained as classification. Possible arguments are (for more details see randomForest):
mtry: integer. Number of variables randomly sampled at each split, default is floor(sqrt(number of variables)).
ntree: integer. Number of trees to grow, default is 500.
nodesize: integer. Minimum size of terminal nodes, default is 1.
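As a quick check of the `mtry` default described above, the arithmetic can be reproduced in base R. The variable names `n_vars` and `default_mtry` are illustrative only; five predictors is the number used in the presence/absence example on this page, where the trained RF model reports `mtry: 2`.

```r
# Illustrative only: reproduce the documented default for mtry,
# floor(sqrt(number of variables)), for a dataset with 5 predictors.
n_vars <- 5
default_mtry <- floor(sqrt(n_vars))
print(default_mtry)  # prints [1] 2
```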
Maxent models are trained using the arguments "removeduplicates=false" and "addsamplestobackground=false". Use the function thinData to remove duplicates and the function addSamplesToBg to add presence locations to background locations. For the Maxent method, possible arguments are:
reg: numeric. The value of the regularization multiplier, default is 1.
fc: character. The value of the feature classes, possible values are combinations of "l", "q", "p", "h" and "t", default is "lqph".
iter: numeric. Number of iterations used by the MaxEnt algorithm, default is 500.
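The `fc` argument is a string built from the letters listed above. A minimal sketch of how such a string could be checked before training is shown below. The helper `is_valid_fc` is hypothetical and not part of SDMtune; it only checks that every letter is one of the documented feature classes and makes no claim about which combinations the underlying software accepts.

```r
# Hypothetical helper (not part of SDMtune): check that an fc string
# contains only the documented feature class letters.
is_valid_fc <- function(fc) {
  allowed <- c("l", "q", "p", "h", "t")
  # Split the string into single characters
  letters_used <- strsplit(fc, "")[[1]]
  length(letters_used) > 0 && all(letters_used %in% allowed)
}

is_valid_fc("lqph")  # TRUE, the documented default
is_valid_fc("lq")    # TRUE
is_valid_fc("lx")    # FALSE, "x" is not a documented feature class
```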
Maxnet models are trained using the argument "addsamplestobackground = FALSE". Use the function addSamplesToBg to add presence locations to background locations. For the Maxnet method, possible arguments are (for more details see maxnet):
reg: numeric. The value of the regularization intensity, default is 1.
fc: character. The value of the feature classes, possible values are combinations of "l", "q", "p", "h" and "t", default is "lqph".
The folds argument also accepts objects created with other packages: ENMeval or blockCV. In this case the function internally converts the folds into a format valid for SDMtune.
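The target format of that conversion is not spelled out here. As a rough illustration, and assuming it mirrors the output of randomFolds (a list with two logical matrices, `train` and `test`, with one row per location and one column per fold), a vector of fold identifiers such as those produced by spatial blocking tools could be converted along these lines. The helper `ids_to_folds` is hypothetical and is not the conversion code used by SDMtune.

```r
# Hypothetical sketch: turn a vector of fold ids (1..k) into a list of two
# logical matrices. The train/test layout is an assumption based on the
# output of randomFolds, not a description of SDMtune internals.
ids_to_folds <- function(fold_ids) {
  k <- max(fold_ids)
  n <- length(fold_ids)
  test <- matrix(FALSE, nrow = n, ncol = k)
  for (i in seq_len(k)) {
    # A location is a test location for fold i if it carries id i
    test[, i] <- fold_ids == i
  }
  # Every location not used for testing in a fold is used for training
  list(train = !test, test = test)
}

fold_ids <- c(1, 2, 3, 1, 2, 3)
folds <- ids_to_folds(fold_ids)
folds$test[, 1]  # TRUE FALSE FALSE TRUE FALSE FALSE
```

Each location is a test location in exactly one fold, which is the property a k-fold partition needs.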
When multiple methods are given as the method argument, the function returns a named list of model objects, with names corresponding to the methods used; see examples.
Venables, W. N. & Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth Edition. Springer, New York. ISBN 0-387-95457-0.
Brandon Greenwell, Bradley Boehmke, Jay Cunningham and GBM Developers (2019). gbm: Generalized Boosted Regression Models. https://CRAN.R-project.org/package=gbm.
A. Liaw and M. Wiener (2002). Classification and Regression by randomForest. R News 2(3), 18-22.
Hijmans, Robert J., Steven Phillips, John Leathwick, and Jane Elith. 2017. dismo: Species Distribution Modeling. https://cran.r-project.org/package=dismo.
Steven Phillips (2017). maxnet: Fitting 'Maxent' Species Distribution Models with 'glmnet'. https://CRAN.R-project.org/package=maxnet.
Muscarella, R., Galante, P.J., Soley-Guardia, M., Boria, R.A., Kass, J., Uriarte, M. and R.P. Anderson (2014). ENMeval: An R package for conducting spatially independent evaluations and estimating optimal model complexity for ecological niche models. Methods in Ecology and Evolution.
Roozbeh Valavi, Jane Elith, José Lahoz-Monfort and Gurutzeta Guillera-Arroita (2018). blockCV: Spatial and environmental blocking for k-fold cross-validation. https://github.com/rvalavi/blockCV.
Sergio Vignali
# \donttest{
# Acquire environmental variables
files <- list.files(path = file.path(system.file(package = "dismo"), "ex"),
                    pattern = "grd", full.names = TRUE)
predictors <- raster::stack(files)

# Prepare presence and background locations
p_coords <- virtualSp$presence
bg_coords <- virtualSp$background

# Create SWD object
data <- prepareSWD(species = "Virtual species", p = p_coords, a = bg_coords,
                   env = predictors, categorical = "biome")

## Train a Maxent model
# The next line checks if Maxent is correctly configured but you don't need
# to run it in your script
if (dismo::maxent(silent = TRUE)) {
  model <- train(method = "Maxent", data = data, fc = "l", reg = 1.5,
                 iter = 700)

  # Add samples to background. This should be done preparing the data before
  # training the model without using
  data <- addSamplesToBg(data)
  model <- train("Maxent", data = data)
}
#> This is MaxEnt version 3.4.1
#> This is MaxEnt version 3.4.1

## Train a Maxnet model
model <- train(method = "Maxnet", data = data, fc = "lq", reg = 1.5)

## Cross Validation
# Create 4 random folds splitting only the presence data
folds <- randomFolds(data, k = 4, only_presence = TRUE)
model <- train(method = "Maxnet", data = data, fc = "l", reg = 0.8,
               folds = folds)

if (FALSE) {
  # Run only if you have the package ENMeval installed
  ## Block partition using the ENMeval package
  require(ENMeval)
  block_folds <- get.block(occ = data@coords[data@pa == 1, ],
                           bg.coords = data@coords[data@pa == 0, ])
  model <- train(method = "Maxnet", data = data, fc = "l", reg = 0.8,
                 folds = block_folds)

  ## Checkerboard1 partition using the ENMeval package
  cb_folds <- get.checkerboard1(occ = data@coords[data@pa == 1, ],
                                env = predictors,
                                bg.coords = data@coords[data@pa == 0, ],
                                aggregation.factor = 4)
  model <- train(method = "Maxnet", data = data, fc = "l", reg = 0.8,
                 folds = cb_folds)

  ## Environmental block using the blockCV package
  # Run only if you have the package blockCV
  require(blockCV)
  # Create spatial points data frame
  library(raster)
  sp_df <- SpatialPointsDataFrame(data@coords, data = as.data.frame(data@pa),
                                  proj4string = crs(predictors))
  e_folds <- envBlock(rasterLayer = predictors, speciesData = sp_df,
                      species = "data@pa", k = 4,
                      standardization = "standard", rasterBlock = FALSE)
  model <- train(method = "Maxnet", data = data, fc = "l", reg = 0.8,
                 folds = e_folds)
}

## Train presence absence models
# Prepare presence and absence locations
p_coords <- virtualSp$presence
a_coords <- virtualSp$absence

# Create SWD object
data <- prepareSWD(species = "Virtual species", p = p_coords, a = a_coords,
                   env = predictors[[1:5]])

## Train an Artificial Neural Network model
model <- train("ANN", data = data, size = 10)

## Train a Random Forest model
model <- train("RF", data = data, ntree = 300)

## Train a Boosted Regression Tree model
model <- train("BRT", data = data, n.trees = 300, shrinkage = 0.001)

## Multiple methods trained together with default arguments
output <- train(method = c("ANN", "BRT", "RF"), data = data, size = 10)
output$ANN
#> Object of class SDMmodel
#> Method: ANN
#>
#> Species: Virtual species
#> Presence locations: 400
#> Absence locations: 300
#>
#> Model configurations:
#> --------------------
#> size: 10
#> decay: 0
#> rang: 0.7
#> maxit: 100
#>
#> Variables:
#> ---------
#> Continuous: bio1 bio12 bio16 bio17 bio5
#> Categorical: NA
output$BRT
#> Object of class SDMmodel
#> Method: BRT
#>
#> Species: Virtual species
#> Presence locations: 400
#> Absence locations: 300
#>
#> Model configurations:
#> --------------------
#> distribution: bernoulli
#> n.trees: 100
#> interaction.depth: 1
#> shrinkage: 0.1
#> bag.fraction: 0.5
#>
#> Variables:
#> ---------
#> Continuous: bio1 bio12 bio16 bio17 bio5
#> Categorical: NA
output$RF
#> Object of class SDMmodel
#> Method: RF
#>
#> Species: Virtual species
#> Presence locations: 400
#> Absence locations: 300
#>
#> Model configurations:
#> --------------------
#> mtry: 2
#> ntree: 500
#> nodesize: 1
#>
#> Variables:
#> ---------
#> Continuous: bio1 bio12 bio16 bio17 bio5
#> Categorical: NA

## Multiple methods trained together passing extra arguments
output <- train(method = c("ANN", "BRT", "RF"), data = data, size = 10,
                ntree = 300, n.trees = 300, shrinkage = 0.001)
output
#> $ANN
#> Object of class SDMmodel
#> Method: ANN
#>
#> Species: Virtual species
#> Presence locations: 400
#> Absence locations: 300
#>
#> Model configurations:
#> --------------------
#> size: 10
#> decay: 0
#> rang: 0.7
#> maxit: 100
#>
#> Variables:
#> ---------
#> Continuous: bio1 bio12 bio16 bio17 bio5
#> Categorical: NA
#> $BRT
#> Object of class SDMmodel
#> Method: BRT
#>
#> Species: Virtual species
#> Presence locations: 400
#> Absence locations: 300
#>
#> Model configurations:
#> --------------------
#> distribution: bernoulli
#> n.trees: 300
#> interaction.depth: 1
#> shrinkage: 0.001
#> bag.fraction: 0.5
#>
#> Variables:
#> ---------
#> Continuous: bio1 bio12 bio16 bio17 bio5
#> Categorical: NA
#> $RF
#> Object of class SDMmodel
#> Method: RF
#>
#> Species: Virtual species
#> Presence locations: 400
#> Absence locations: 300
#>
#> Model configurations:
#> --------------------
#> mtry: 2
#> ntree: 300
#> nodesize: 1
#>
#> Variables:
#> ---------
#> Continuous: bio1 bio12 bio16 bio17 bio5
#> Categorical: NA
# }