Train a model using one of the following methods: Artificial Neural Networks, Boosted Regression Trees, Maxent, Maxnet or Random Forest.

train(method, data, folds = NULL, verbose = TRUE, ...)

Arguments

method

character or character vector. Method used to train the model. Possible values are "ANN", "BRT", "Maxent", "Maxnet" or "RF"; see details.

data

SWD object with presence and absence/background locations.

folds

list. Output of the function randomFolds, or a folds object created with another package (see details). Default is NULL.

verbose

logical. If TRUE, shows a progress bar during cross validation. Default is TRUE.

...

Arguments passed to the respective method, see details.

Value

An SDMmodel or SDMmodelCV object, or a named list of model objects when multiple methods are given.

Details

  • For the ANN method possible arguments are (for more details see nnet):

    • size: integer. Number of units in the hidden layer.

    • decay: numeric. Weight decay, default is 0.

    • rang: numeric. Initial random weights are drawn from [-rang, rang], default is 0.7.

    • maxit: integer. Maximum number of iterations, default is 100.

  • For the BRT method possible arguments are (for more details see gbm):

    • distribution: character. Name of the distribution to use, default is "bernoulli".

    • n.trees: integer. Maximum number of trees to grow, default is 100.

    • interaction.depth: integer. Maximum depth of each tree, default is 1.

    • shrinkage: numeric. The shrinkage parameter, default is 0.1.

    • bag.fraction: numeric. Random fraction of data used in the tree expansion, default is 0.5.

  • For the RF method the model is trained as classification. Possible arguments are (for more details see randomForest):

    • mtry: integer. Number of variables randomly sampled as candidates at each split, default is floor(sqrt(number of variables)).

    • ntree: integer. Number of trees to grow, default is 500.

    • nodesize: integer. Minimum size of terminal nodes, default is 1.

  • Maxent models are trained using the arguments "removeduplicates=false" and "addsamplestobackground=false". Use the function thinData to remove duplicates and the function addSamplesToBg to add presence locations to background locations. For the Maxent method, possible arguments are:

    • reg: numeric. The value of the regularization multiplier, default is 1.

    • fc: character. The value of the feature classes, possible values are combinations of "l", "q", "p", "h" and "t", default is "lqph".

    • iter: numeric. Number of iterations used by the MaxEnt algorithm, default is 500.

  • Maxnet models are trained using the argument "addsamplestobackground = FALSE". Use the function addSamplesToBg to add presence locations to background locations. For the Maxnet method, possible arguments are (for more details see maxnet):

    • reg: numeric. The value of the regularization intensity, default is 1.

    • fc: character. The value of the feature classes, possible values are combinations of "l", "q", "p", "h" and "t", default is "lqph".
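
As a quick check of the RF default for mtry listed above, the value can be computed directly in base R. This minimal sketch assumes five predictor variables, as in the presence/absence example in the Examples section, where the printed RF model reports mtry: 2. The variable names are illustrative only.

```r
# Default mtry for the RF method: floor(sqrt(number of variables)).
# Assumption: five predictors (bio1, bio12, bio16, bio17, bio5).
n_variables <- 5
default_mtry <- floor(sqrt(n_variables))
default_mtry
```

With a different number of predictors the default changes accordingly, so it may be worth setting mtry explicitly when comparing models trained on different variable sets.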

The folds argument also accepts objects created with other packages: ENMeval or blockCV. In this case the function internally converts the folds into a format valid for SDMtune.
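
To make the shape of a folds object more concrete, here is a minimal base R sketch. It assumes, based on the randomFolds documentation, that folds are stored as a list of two logical matrices named train and test, with one row per location and one column per fold. The variable names and the random assignment are illustrative only and are not SDMtune's implementation.

```r
# Illustrative only: build a folds-like list by hand in base R.
# Assumption: folds are a list of two logical matrices, `train` and `test`,
# with one row per location and one column per fold.
set.seed(25)
n_locations <- 12
k <- 4

# Assign each location to exactly one of the k folds, in random order
fold_id <- sample(rep(seq_len(k), length.out = n_locations))

# Column i of `test` is TRUE for the locations held out in fold i
test <- sapply(seq_len(k), function(i) fold_id == i)

# Every location not held out in a fold is used for training in that fold
train <- !test

folds <- list(train = train, test = test)
dim(folds$test)
```

In practice the folds would normally come from randomFolds or from one of the supported packages rather than being built by hand; the sketch only shows the layout that each location is tested in exactly one fold and used for training in the others.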

When multiple methods are given as the method argument, the function returns a named list of model objects, each named after the method used to train it, see examples.

References

Venables, W. N. & Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth Edition. Springer, New York. ISBN 0-387-95457-0.

Brandon Greenwell, Bradley Boehmke, Jay Cunningham and GBM Developers (2019). gbm: Generalized Boosted Regression Models. https://CRAN.R-project.org/package=gbm.

A. Liaw and M. Wiener (2002). Classification and Regression by randomForest. R News 2(3), 18-22.

Hijmans, Robert J., Steven Phillips, John Leathwick, and Jane Elith. 2017. dismo: Species Distribution Modeling. https://cran.r-project.org/package=dismo.

Steven Phillips (2017). maxnet: Fitting 'Maxent' Species Distribution Models with 'glmnet'. https://CRAN.R-project.org/package=maxnet.

Muscarella, R., Galante, P.J., Soley-Guardia, M., Boria, R.A., Kass, J., Uriarte, M. and R.P. Anderson (2014). ENMeval: An R package for conducting spatially independent evaluations and estimating optimal model complexity for ecological niche models. Methods in Ecology and Evolution.

Roozbeh Valavi, Jane Elith, José Lahoz-Monfort and Gurutzeta Guillera-Arroita (2018). blockCV: Spatial and environmental blocking for k-fold cross-validation. https://github.com/rvalavi/blockCV.

Author

Sergio Vignali

Examples

# \donttest{
# Acquire environmental variables
files <- list.files(path = file.path(system.file(package = "dismo"), "ex"),
                    pattern = "grd", full.names = TRUE)
predictors <- raster::stack(files)

# Prepare presence and background locations
p_coords <- virtualSp$presence
bg_coords <- virtualSp$background

# Create SWD object
data <- prepareSWD(species = "Virtual species", p = p_coords, a = bg_coords,
                   env = predictors, categorical = "biome")
#> Extracting predictor information for presence locations...
#> Extracting predictor information for absence/background locations...
## Train a Maxent model
# The next line checks if Maxent is correctly configured but you don't need
# to run it in your script
if (dismo::maxent(silent = TRUE)) {
  model <- train(method = "Maxent", data = data, fc = "l", reg = 1.5,
                 iter = 700)

  # Add samples to background. This should be done preparing the data before
  # training the model without using
  data <- addSamplesToBg(data)
  model <- train("Maxent", data = data)
}
#> This is MaxEnt version 3.4.1
#> This is MaxEnt version 3.4.1
## Train a Maxnet model
model <- train(method = "Maxnet", data = data, fc = "lq", reg = 1.5)

## Cross Validation
# Create 4 random folds splitting only the presence data
folds <- randomFolds(data, k = 4, only_presence = TRUE)
model <- train(method = "Maxnet", data = data, fc = "l", reg = 0.8,
               folds = folds)

if (FALSE) {
  # Run only if you have the package ENMeval installed
  ## Block partition using the ENMeval package
  require(ENMeval)
  block_folds <- get.block(occ = data@coords[data@pa == 1, ],
                           bg.coords = data@coords[data@pa == 0, ])
  model <- train(method = "Maxnet", data = data, fc = "l", reg = 0.8,
                 folds = block_folds)

  ## Checkerboard1 partition using the ENMeval package
  cb_folds <- get.checkerboard1(occ = data@coords[data@pa == 1, ],
                                env = predictors,
                                bg.coords = data@coords[data@pa == 0, ],
                                aggregation.factor = 4)
  model <- train(method = "Maxnet", data = data, fc = "l", reg = 0.8,
                 folds = cb_folds)

  ## Environmental block using the blockCV package
  # Run only if you have the package blockCV
  require(blockCV)
  # Create spatial points data frame
  library(raster)
  sp_df <- SpatialPointsDataFrame(data@coords, data = as.data.frame(data@pa),
                                  proj4string = crs(predictors))
  e_folds <- envBlock(rasterLayer = predictors, speciesData = sp_df,
                      species = "data@pa", k = 4, standardization = "standard",
                      rasterBlock = FALSE)
  model <- train(method = "Maxnet", data = data, fc = "l", reg = 0.8,
                 folds = e_folds)
}

## Train presence absence models
# Prepare presence and absence locations
p_coords <- virtualSp$presence
a_coords <- virtualSp$absence

# Create SWD object
data <- prepareSWD(species = "Virtual species", p = p_coords, a = a_coords,
                   env = predictors[[1:5]])
#> Extracting predictor information for presence locations...
#> Extracting predictor information for absence/background locations...
## Train an Artificial Neural Network model
model <- train("ANN", data = data, size = 10)

## Train a Random Forest model
model <- train("RF", data = data, ntree = 300)

## Train a Boosted Regression Tree model
model <- train("BRT", data = data, n.trees = 300, shrinkage = 0.001)

## Multiple methods trained together with default arguments
output <- train(method = c("ANN", "BRT", "RF"), data = data, size = 10)
output$ANN
#> Object of class SDMmodel
#> Method: ANN
#>
#> Species: Virtual species
#> Presence locations: 400
#> Absence locations: 300
#>
#> Model configurations:
#> --------------------
#> size: 10
#> decay: 0
#> rang: 0.7
#> maxit: 100
#>
#> Variables:
#> ---------
#> Continuous: bio1 bio12 bio16 bio17 bio5
#> Categorical: NA
output$BRT
#> Object of class SDMmodel
#> Method: BRT
#>
#> Species: Virtual species
#> Presence locations: 400
#> Absence locations: 300
#>
#> Model configurations:
#> --------------------
#> distribution: bernoulli
#> n.trees: 100
#> interaction.depth: 1
#> shrinkage: 0.1
#> bag.fraction: 0.5
#>
#> Variables:
#> ---------
#> Continuous: bio1 bio12 bio16 bio17 bio5
#> Categorical: NA
output$RF
#> Object of class SDMmodel
#> Method: RF
#>
#> Species: Virtual species
#> Presence locations: 400
#> Absence locations: 300
#>
#> Model configurations:
#> --------------------
#> mtry: 2
#> ntree: 500
#> nodesize: 1
#>
#> Variables:
#> ---------
#> Continuous: bio1 bio12 bio16 bio17 bio5
#> Categorical: NA
## Multiple methods trained together passing extra arguments
output <- train(method = c("ANN", "BRT", "RF"), data = data, size = 10,
                ntree = 300, n.trees = 300, shrinkage = 0.001)
output
#> $ANN
#> Object of class SDMmodel
#> Method: ANN
#>
#> Species: Virtual species
#> Presence locations: 400
#> Absence locations: 300
#>
#> Model configurations:
#> --------------------
#> size: 10
#> decay: 0
#> rang: 0.7
#> maxit: 100
#>
#> Variables:
#> ---------
#> Continuous: bio1 bio12 bio16 bio17 bio5
#> Categorical: NA
#> $BRT
#> Object of class SDMmodel
#> Method: BRT
#>
#> Species: Virtual species
#> Presence locations: 400
#> Absence locations: 300
#>
#> Model configurations:
#> --------------------
#> distribution: bernoulli
#> n.trees: 300
#> interaction.depth: 1
#> shrinkage: 0.001
#> bag.fraction: 0.5
#>
#> Variables:
#> ---------
#> Continuous: bio1 bio12 bio16 bio17 bio5
#> Categorical: NA
#> $RF
#> Object of class SDMmodel
#> Method: RF
#>
#> Species: Virtual species
#> Presence locations: 400
#> Absence locations: 300
#>
#> Model configurations:
#> --------------------
#> mtry: 2
#> ntree: 300
#> nodesize: 1
#>
#> Variables:
#> ---------
#> Continuous: bio1 bio12 bio16 bio17 bio5
#> Categorical: NA
# }