Randomly split a dataset into training and testing datasets, or into training, validation and testing datasets.

Usage

trainValTest(x, test, val = 0, only_presence = FALSE, seed = NULL)

Arguments

x

SWD object containing the data to be split into training, validation and testing datasets.

test

numeric. The proportion of data withheld for testing, e.g. 0.2 for 20%.

val

numeric. The proportion of data withheld for validation, e.g. 0.2 for 20%. Default is 0.

only_presence

logical. If TRUE, only the presence locations are split and all the background locations are included in each partition. Used mainly for presence-only methods. Default is FALSE.

seed

numeric. The value used to set the seed so that the split is reproducible. Default is NULL.
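
Why a seed gives consistent results can be illustrated with base R's random number generator. This is a generic sketch, assuming the split relies on R's RNG as the seed argument suggests; it does not call the function documented here, and the value 25 is arbitrary.

```r
# Same seed, same random permutation
set.seed(25)
first <- sample.int(10)

set.seed(25)
second <- sample.int(10)

identical(first, second)  # TRUE

# Without resetting the seed, the next draw comes from an advanced RNG state
third <- sample.int(10)
```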

Value

A list containing the training and testing SWD objects or, when val is greater than 0, the training, validation and testing SWD objects, in that order.

Details

When only_presence = FALSE, the ratio of presence to absence locations is preserved in each partition.
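
The idea of a split that preserves the presence/absence ratio can be sketched in base R. This is a minimal illustration, not the package's implementation: split_indices() is a hypothetical helper, pa is a made-up vector of presence (1) and absence (0) labels, and the seed value is arbitrary.

```r
# Hypothetical helper (not part of the package): randomly split the
# indices 1..n into train, val and test groups of the requested sizes.
split_indices <- function(n, test, val = 0, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  idx <- sample.int(n)                 # random permutation of 1..n
  n_test <- round(n * test)
  n_val <- round(n * val)
  n_train <- n - n_test - n_val
  list(
    train = idx[seq_len(n_train)],
    val   = idx[n_train + seq_len(n_val)],
    test  = idx[n_train + n_val + seq_len(n_test)]
  )
}

# Made-up labels: 50 presences (1) followed by 200 absences (0)
pa <- c(rep(1, 50), rep(0, 200))

# Stratify: split presences and absences separately, then combine
p_rows <- which(pa == 1)
a_rows <- which(pa == 0)
p_split <- split_indices(length(p_rows), test = 0.2, val = 0.2, seed = 25)
a_split <- split_indices(length(a_rows), test = 0.2, val = 0.2, seed = 25)

train_rows <- c(p_rows[p_split$train], a_rows[a_split$train])
val_rows   <- c(p_rows[p_split$val],   a_rows[a_split$val])
test_rows  <- c(p_rows[p_split$test],  a_rows[a_split$test])

# Share of presences in the full data and in each partition
mean(pa)
mean(pa[train_rows])
mean(pa[val_rows])
mean(pa[test_rows])
```

Because each class is split with the same proportions, the four shares printed at the end coincide. Splitting only p_rows and reusing all of a_rows in every partition would correspond to the behaviour described for only_presence = TRUE.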

Author

Sergio Vignali

Examples

# Load the package that provides prepareSWD(), trainValTest() and virtualSp
library(SDMtune)

# Acquire environmental variables
files <- list.files(path = file.path(system.file(package = "dismo"), "ex"),
                    pattern = "grd",
                    full.names = TRUE)

predictors <- terra::rast(files)

# Prepare presence and background locations
p_coords <- virtualSp$presence
bg_coords <- virtualSp$background

# Create SWD object
data <- prepareSWD(species = "Virtual species",
                   p = p_coords,
                   a = bg_coords,
                   env = predictors,
                   categorical = "biome")
#>  Extracting predictor information for presence locations [40ms]
#>  Extracting predictor information for absence/background locations [63ms]

# Split only the presence locations into training (80%) and testing (20%)
# datasets; all background locations are included in each partition
datasets <- trainValTest(data,
                         test = 0.2,
                         only_presence = TRUE)
train <- datasets[[1]]
test <- datasets[[2]]

# Split both the presence and the absence locations into training (60%),
# validation (20%) and testing (20%) datasets
datasets <- trainValTest(data,
                         val = 0.2,
                         test = 0.2)
train <- datasets[[1]]
val <- datasets[[2]]
test <- datasets[[3]]