Train a model • SDMtune

Intro

In the previous article you learned how to prepare the data for the analysis using the virtualSp dataset and the WorldClim environmental variables. Now it is time to train your first model.

SDMtune supports four methods for model training:

Artificial Neural Networks (ANN), using the nnet package (Venables and Ripley 2002);
Boosted Regression Trees (BRT), using the gbm package (Greenwell et al. 2019);
Maximum Entropy, with two implementations:
- Maxent, using the dismo package (Hijmans et al. 2017);
- Maxnet, using the maxnet package (Phillips 2017);
Random Forest (RF), using the randomForest package (Liaw and Wiener 2002).

The code needed to train a model is the same for all the implementations. We will show how to train a Maxent model; you can adapt the code for the other methods or check this article.

Train a model with default settings

First we load the SDMtune package:

library(SDMtune)
#> 
#>    _____  ____   __  ___ __
#>   / ___/ / __ \ /  |/  // /_ __  __ ____   ___
#>   \__ \ / / / // /|_/ // __// / / // __ \ / _ \
#>  ___/ // /_/ // /  / // /_ / /_/ // / / //  __/
#> /____//_____//_/  /_/ \__/ \__,_//_/ /_/ \___/  version 1.3.2
#> 
#> To cite this package in publications type: citation("SDMtune").

We use the function train() to train a Maxent model. We need to provide two arguments:

method: “Maxent” in our case;
data: the SWD() object with the presence and background locations that we created in the previous article.

default_model <- train(method = "Maxent",
                       data = data)

The function trains the model using the following default settings:

linear, quadratic, product and hinge feature class combinations;
regularization multiplier equal to 1;
500 algorithm iterations.

We will see later how to change the default settings. For the moment, let’s have a look at the default_model object.

Explore an SDMmodel object

The output of the function train() is an object of class SDMmodel(). Let’s print it:

default_model
#> 
#> ── Object of class: <SDMmodel> ──
#> 
#> Method: Maxent
#> 
#> ── Hyperparameters
#> • fc: "lqph"
#> • reg: 1
#> • iter: 500
#> 
#> ── Info
#> • Species: Virtual species
#> • Presence locations: 400
#> • Absence locations: 5000
#> 
#> ── Variables
#> • Continuous: "bio1", "bio12", "bio16", "bio17", "bio5", "bio6", "bio7", and
#> "bio8"
#> • Categorical: "biome"

When we print an SDMmodel object we get the following information:

the name of the class;
the method used to train the model;
the name of the species;
the number of presence locations;
the number of absence/background locations;
the model configurations:
- fc: the feature class combinations;
- reg: the regularization multiplier;
- iter: the number of iterations;
the environmental variables used to train the model:
- the name of the continuous environmental variables, if any;
- the name of the categorical environmental variables, if any.

An SDMmodel() object has two slots:

slotNames(default_model)
#> [1] "data"  "model"

data: an SWD() object with the presence and absence/background locations used to train the model;
model: a Maxent() object, in our case, with all the model configurations.

The slot model contains the configurations of the model plus other information used to make predictions.

slotNames(default_model@model)
#>  [1] "results"    "reg"        "fc"         "iter"       "extra_args"
#>  [6] "lambdas"    "coeff"      "formula"    "lpn"        "dn"        
#> [11] "entropy"    "min_max"

For the moment, the most important slots are fc, reg and iter, which contain the values of the model configuration. We will explore the others in another article.
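As a small illustration, these values can be read directly from the nested slots with the @ operator. The sketch below assumes that the default_model object trained above is still in the workspace; the values returned should match those printed for the model.

```r
library(SDMtune)

# Assumes default_model is the SDMmodel object trained earlier in this article.
# Nested slots are reached by chaining the @ operator.
default_model@model@fc    # feature class combination
default_model@model@reg   # regularization multiplier
default_model@model@iter  # number of iterations

# The data slot holds the SWD object used to train the model.
class(default_model@data)
```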

Train a model changing the default settings

The function train() accepts optional arguments that can be used to change the default model settings. In our previous example we could have trained the same model using:

default_model <- train(method = "Maxent", 
                       data = data, 
                       fc = "lqph", 
                       reg = 1, 
                       iter = 500)

Try yourself

Try changing the default settings to train a model using linear and hinge as the feature class combination, 0.5 as the regularization multiplier and 700 iterations. The solution is shown in the next cell:

model <- train(method = "Maxent", 
               data = data, 
               fc = "lh", 
               reg = 0.5, 
               iter = 700)

By default, Maxent models are trained using the arguments “removeduplicates=false” and “addsamplestobackground=false”. The user should have full control of the data used to train the model, so it is expected that duplicated locations have already been removed and that the presence locations have already been added to the background locations, when desired. You can use the function thinData() to remove duplicated locations and the function addSamplesToBg() to add the presence locations to the background locations.
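A minimal sketch of this preparation step follows. It assumes that the coordinates come from the virtualSp dataset and that the predictors are the example raster files shipped with the dismo package, loaded with terra; the object names p_coords, bg_coords, predictors, p_thinned, thinned_data and bg_data are only illustrative and are kept separate from the data object used in the rest of this article.

```r
library(SDMtune)

# Assumption: the predictors are the example rasters shipped with dismo.
files <- list.files(path = file.path(system.file(package = "dismo"), "ex"),
                    pattern = "grd",
                    full.names = TRUE)
predictors <- terra::rast(files)

p_coords <- virtualSp$presence
bg_coords <- virtualSp$background

# Keep at most one presence location per raster cell.
p_thinned <- thinData(p_coords,
                      env = predictors)

# Build the SWD object from the thinned presence locations.
thinned_data <- prepareSWD(species = "Virtual species",
                           p = p_thinned,
                           a = bg_coords,
                           env = predictors,
                           categorical = "biome")

# Add the presence locations to the background locations.
bg_data <- addSamplesToBg(thinned_data)
```

The resulting bg_data object could then be passed to train() through the data argument in place of data.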

Train a Maxnet model

Training a model using the Maxnet method is as simple as changing the name of the method in the train() function. The only difference is that we cannot set the number of iterations.

Try yourself

Try to train a model using the Maxnet method. The solution is shown in the following cell:

maxnet_model <- train("Maxnet", 
                      data = data)
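The feature class combination and the regularization multiplier can be changed for Maxnet in the same way as for Maxent. The following is a minimal sketch, assuming the data object created in the previous article; the name maxnet_custom is illustrative, and the slot names fc and reg are assumed to mirror those of the Maxent object shown earlier.

```r
library(SDMtune)

# Assumes data is the SWD object created in the previous article.
# iter is omitted because it cannot be set for the Maxnet method.
maxnet_custom <- train(method = "Maxnet",
                       data = data,
                       fc = "lh",
                       reg = 0.5)

# Check the settings stored in the trained model
# (slot names assumed to match those of the Maxent object).
maxnet_custom@model@fc
maxnet_custom@model@reg
```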

Conclusion

In this article you have learned:

how to train a Maxent model using default settings;
how to explore an SDMmodel() object;
how to train a model changing the default settings;
how to train a model using the Maxnet method.

In the next article you will learn how to use the model that you have just trained to get the predicted values for new locations.

References

Greenwell, Brandon, Bradley Boehmke, Jay Cunningham, and GBM Developers. 2019. “gbm: Generalized Boosted Regression Models. R package version 2.1.5.” https://CRAN.R-project.org/package=gbm.

Hijmans, Robert J., Steven Phillips, John Leathwick, and Jane Elith. 2017. “dismo: Species Distribution Modeling. R package version 1.1-4.” https://cran.r-project.org/package=dismo.

Liaw, Andy, and Matthew Wiener. 2002. “Classification and Regression by randomForest.” R News 2 (3): 18–22.

Phillips, Steven. 2017. maxnet: Fitting ’Maxent’ Species Distribution Models with ’glmnet’. R package version 0.1.2. https://cran.r-project.org/package=maxnet.

Venables, W. N., and B. D. Ripley. 2002. Modern Applied Statistics with S. Fourth Edition. New York, NY: Springer.