Thin Data — thinData • SDMtune

Remove all but one location per raster cell. The function removes locations with NA values and, if more than one location falls within the same raster cell, randomly selects one of them.

Usage

thinData(coords, env, x = "x", y = "y", verbose = TRUE, progress = TRUE)

Arguments

coords: data.frame or matrix with the coordinates, see details.
env: rast containing the environmental variables.
x: character. Name of the column containing the x coordinates.
y: character. Name of the column containing the y coordinates.
verbose: logical, if TRUE prints an informative message.
progress: logical, if TRUE shows a progress bar.

Value

a matrix or a data frame with the thinned locations.

Details

coords and env must have the same coordinate reference system.
The coords argument can contain several columns. This is useful if the user has information related to the coordinates that they do not want to lose during the thinning procedure. The function expects the x coordinates in a column named "x" and the y coordinates in a column named "y". If this is not the case, the names of the columns containing the coordinates can be specified using the arguments x and y.
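
The idea of keeping one random location per cell while preserving extra columns can be illustrated with a minimal base R sketch. This is not the implementation used by thinData: the helper thin_sketch and its arguments xmin, ymin, res and ncol are hypothetical and describe a simple regular grid instead of a raster, and the removal of locations with NA values is omitted.

```r
# Illustrative sketch only, not SDMtune's implementation.
# Hypothetical regular grid: lower-left corner (xmin, ymin), square cells of
# size `res`, and `ncol` cells per row.
thin_sketch <- function(coords, xmin, ymin, res, ncol, x = "x", y = "y") {
  coords <- as.data.frame(coords)
  # Column and row index of the cell that contains each location
  col_id <- floor((coords[[x]] - xmin) / res)
  row_id <- floor((coords[[y]] - ymin) / res)
  cell <- row_id * ncol + col_id
  # Shuffle the rows, then keep the first row met in each cell: this selects
  # one location per cell at random
  idx <- sample(nrow(coords))
  keep <- idx[!duplicated(cell[idx])]
  # Return the kept rows in their original order, with all their columns
  coords[sort(keep), , drop = FALSE]
}

set.seed(25)
# Four locations with custom column names: the first two fall in one cell,
# the last two in another
pts <- data.frame(age = c(1, 2, 1, 2),
                  X = c(0.1, 0.4, 1.2, 1.7),
                  Y = c(0.2, 0.3, 0.6, 0.9))
thinned <- thin_sketch(pts, xmin = 0, ymin = 0, res = 1, ncol = 2,
                       x = "X", y = "Y")
nrow(thinned)
#> [1] 2
```

The additional age column is carried along with the selected rows, which mirrors the reason for allowing more than two columns in coords.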

Author

Sergio Vignali

Examples

# Acquire environmental variables
files <- list.files(path = file.path(system.file(package = "dismo"), "ex"),
                    pattern = "grd",
                    full.names = TRUE)

predictors <- terra::rast(files)

# Prepare background locations, sampling also on areas with NA values
bg_coords <- terra::spatSample(predictors,
                               size = 9000,
                               method = "random",
                               xy = TRUE,
                               values = FALSE)
nrow(bg_coords)
#> [1] 9000

# Thin the locations
# The function will remove the coordinates that have NA values for some
# predictors. Note that the function expects to have the coordinates in two
# columns named "x" and "y"

colnames(bg_coords)
#> [1] "x" "y"
thinned_bg <- thinData(bg_coords,
                       env = predictors)
#> ✔ Removed -6475 NAs and no duplicated locations
nrow(thinned_bg)
#> [1] 2525

# Here we sample only on areas without NA values and then we duplicate the
# coordinates, so that every location appears twice
bg_coords <- terra::spatSample(predictors,
                               size = 9000,
                               method = "random",
                               na.rm = TRUE,
                               xy = TRUE,
                               values = FALSE)

thinned_bg <- thinData(rbind(bg_coords, bg_coords),
                       env = predictors)
#> ✔ Removed no NAs and 9000 duplicated locations

nrow(thinned_bg)
#> [1] 9000

# In case of a data frame containing more than two columns (e.g. a data frame
# with the coordinates plus an additional column with the age of the species)
# and custom column names, use the function in this way
age <- sample(c(1, 2),
              size = nrow(bg_coords),
              replace = TRUE)

data <- cbind(age, bg_coords)
colnames(data) <- c("age", "X", "Y")

thinned_bg <- thinData(data,
                       env = predictors,
                       x = "X",
                       y = "Y")
#> ✔ Removed no NAs and no duplicated locations
head(data)
#>      age       X      Y
#> [1,]   1 -100.25  23.75
#> [2,]   1  -90.75  29.75
#> [3,]   2  -63.75  10.25
#> [4,]   2  -39.75  -5.75
#> [5,]   1 -110.75  37.25
#> [6,]   1  -70.75 -39.75