Title: | Uncertainty Analysis for Species Distribution Models |
---|---|
Description: | This is a framework that provides methods and tools for assessing the impact of different sources of uncertainty (e.g. positional uncertainty) on the performance of species distribution models (SDMs). |
Authors: | Babak Naimi |
Maintainer: | Babak Naimi <[email protected]> |
License: | GPL (>= 3) |
Version: | 2.1-7 |
Built: | 2025-02-20 03:28:46 UTC |
Source: | https://github.com/babaknaimi/usdm |
This package provides functions for exploring the impact of different sources of uncertainty (e.g. positional uncertainty) on the performance of species distribution models (SDMs).
In addition, there is a function to quantify different local indicators of spatial association (LISA) for raster data.
Babak Naimi [email protected]
https://www.biogeoinformatics.org/
Physically exclude the collinear variables, identified using vifcor or vifstep, from a set of variables.
exclude(x, vif, ...)
x | explanatory variables (predictors), defined as a raster object ( |
vif | an object of class |
... | additional argument as in |
Before using this function, you should run vifstep or vifcor, which detect collinearity by calculating variance inflation factor (VIF) statistics. If vif is missing, vifstep is called.
an object of the same class as x (i.e. RasterStack, RasterBrick, data.frame, or matrix)
Babak Naimi [email protected]
https://www.biogeoinformatics.org/
If you use this method, please cite the following article, for which this package was developed:
Naimi, B., Hamm, N.A.S., Groen, T.A., Skidmore, A.K., and Toxopeus, A.G. 2014. Where is positional uncertainty a problem for species distribution modelling? Ecography 37 (2): 191-203.
## Not run:
file <- system.file("external/spain.tif", package="usdm")
r <- rast(file) # read a SpatRaster with 10 raster layers in Spain
r
vif(r) # calculate VIF for the variables in r
v1 <- vifcor(r, th=0.9) # identify collinear variables that should be excluded
v1
re1 <- exclude(r, v1) # exclude the collinear variables identified in the previous step
re1
v2 <- vifstep(r, th=10) # identify collinear variables that should be excluded
v2
re2 <- exclude(r, v2) # exclude the collinear variables identified in the previous step
re2
re3 <- exclude(r) # vif is missing, so vifstep is called first
re3
## End(Not run)
Calculate different local indicators of spatial association (LISA) statistics for each cell of a raster dataset.
lisa(x, y, d1=0, d2, cell, statistic="I")
x | a raster object ( |
y | a |
d1 | numeric. The distance at which the local neighborhood starts. The default is 0, meaning the neighborhood starts at the cell itself (distance = 0) and extends to distance d2 |
d2 | numeric. The distance up to which cells are considered part of the local neighborhood around a cell |
cell | numeric (optional). A cell number or a vector of cell numbers in the Raster object at which LISA should be calculated |
statistic | a character string specifying the LISA statistic to calculate. This can be one of "I", "c", "G", "G*", and "K1" |
This function calculates different LISA statistics at each grid cell of a Raster object. The implemented statistics are local Moran's I ("I"), local Geary's c ("c"), local G and G* ("G" and "G*"), and local K1 ("K1"). The function returns standardized values (Z) for the Moran's I, G, G*, and K1 statistics. If a SpatialPoints object is supplied for y, or a vector of cell numbers for cell, LISA is calculated only at those locations.
Note: A set of similar functions has been implemented in the elsa package by the author of this package. Because the computational part of elsa is written in C, the functions in elsa are much faster.
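The textbook form of local Moran's I (Anselin 1995) can be sketched in base R as follows. This is a minimal illustration, not the code used by lisa: it assumes a plain matrix with distances in cell units, binary row-standardised weights for neighbours farther than d1 and up to d2, and it returns the raw statistic rather than the standardized Z values that lisa reports. The function name local_moran and the toy grid are hypothetical and exist only for this example.

```r
# Textbook local Moran's I on a small grid (illustrative; not usdm's implementation).
local_moran <- function(m, d1 = 0, d2 = 1.5) {
  # cell-centre coordinates in cell units (column-major, matching as.vector(m))
  xy <- cbind(x = as.vector(col(m)), y = as.vector(row(m)))
  z  <- as.vector(m) - mean(m)          # deviations from the global mean
  m2 <- sum(z^2) / length(z)            # second moment used for scaling
  d  <- as.matrix(dist(xy))             # pairwise distances between cell centres
  w  <- (d > d1 & d <= d2) * 1          # binary weights: neighbours within (d1, d2]
  diag(w) <- 0                          # a cell is never its own neighbour
  w  <- w / rowSums(w)                  # row-standardise the weights
  I  <- (z / m2) * as.vector(w %*% z)   # I_i = (z_i / m2) * sum_j w_ij * z_j
  matrix(I, nrow = nrow(m))
}

# Toy grid with a smooth gradient: values are similar to their neighbours
m <- outer(1:5, 1:5, "+")
I <- local_moran(m, d1 = 0, d2 = 1.5)
I[1, 1]   # positive: the corner cell and its neighbours are all below the mean
I[3, 3]   # zero: the centre cell equals the global mean
```

Positive values mark cells surrounded by similar values (clusters), while negative values mark cells that differ from their neighbourhood.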
RasterLayer | if |
RasterBrick | if |
numeric vector | if |
Babak Naimi [email protected]
https://www.biogeoinformatics.org/
Anselin, L. 1995. Local indicators of spatial association. Geographical Analysis 27: 93–115.
Getis, A. and Ord, J.K. 1996. Local spatial statistics: an overview. In P. Longley and M. Batty (eds), Spatial analysis: modelling in a GIS environment. Cambridge: Geoinformation International, 261–277.
Naimi, B., Hamm, N.A.S., Groen, T.A., Skidmore, A.K., and Toxopeus, A.G. 2014. Where is positional uncertainty a problem for species distribution modelling? Ecography 37 (2): 191-203.
## Not run:
file <- system.file("external/spain.tif", package="usdm")
r <- rast(file) # read a SpatRaster with 10 raster layers in Spain
r
plot(r) # visualize the raster layers
plot(r[[1]]) # visualize the first raster layer
r.I <- lisa(x=r[[1]], d1=0, d2=25000, statistic="I") # local Moran's I
plot(r.I)
# entering r instead of r[[1]] gives the indicator for each layer:
r.I <- lisa(x=r, d1=0, d2=25000, statistic="I")
plot(r.I)
r.c <- lisa(x=r[[1]], d1=0, d2=25000, statistic="c") # local Geary's c
plot(r.c)
r.g <- lisa(x=r[[1]], d1=0, d2=25000, statistic="G") # G statistic
plot(r.g)
r.g2 <- lisa(x=r[[1]], d1=0, d2=25000, statistic="G*") # G* statistic
plot(r.g2)
r.K1 <- lisa(x=r[[1]], d1=0, d2=30000, statistic="K1") # K1 statistic for the first layer
plot(r.K1)
# local Moran's I at cell number 2000 for each raster layer in r
lisa(x=r, d1=0, d2=30000, cell=2000, statistic="I")
# local Geary's c at cell numbers 2000, 2002, and 2003 for each raster layer in r
lisa(x=r, d1=0, d2=30000, cell=c(2000,2002,2003), statistic="c")
# draw 20 random points from r[[1]] as a SpatVector
# (terra's counterpart of raster::sampleRandom, which does not accept a SpatRaster)
sp <- spatSample(r[[1]], 20, na.rm=TRUE, as.points=TRUE)
plot(r[[1]])
points(sp)
# local Moran's I at the point locations in sp for each raster layer in r
lisa(x=r, y=sp, d1=0, d2=30000, statistic="I")
## End(Not run)
Plot the variogram computed for raster data by the Variogram function.
## S4 method for signature 'RasterVariogram'
plot(x, ...)
x | an object of class |
... | additional arguments (see details) |
This function plots the empirical variogram, the variogram cloud if cloud is set to TRUE, or a boxplot of the variogram cloud data if box is set to TRUE.
Below are additional arguments:
cloud: logical. If TRUE, the function plots the variogram cloud.
box: logical. If TRUE, the function plots a boxplot of the variogram cloud.
...: xlab, ylab, main, and other arguments are the same as in the base plot function.
plots the variogram.
Babak Naimi [email protected]
https://r-gis.net/ https://www.biogeoinformatics.org/
file <- system.file("external/spain.tif", package="usdm")
r <- rast(file) # read the predictor layers as a SpatRaster
r
plot(r[[1]]) # visualize the first raster layer
v1 <- Variogram(r[[1]]) # compute the variogram for the first raster
plot(v1)
plot(v1, cloud=TRUE)
plot(v1, box=TRUE)
Plot the values of LISAs at species occurrence locations, which can be used to identify the locations that need positional uncertainty treatment.
## S4 method for signature 'speciesLISA,missing'
plot(x, y, ...)
## S4 method for signature 'speciesLISA,SpatialPolygons'
plot(x, y, ...)
## S4 method for signature 'speciesLISA,SpatialPolygonsDataFrame'
plot(x, y, ...)
x | an object of class |
y | optional. Boundary map of the study area, an object of class |
... | additional arguments (see details) |
This function generates a map (a bubble plot) in which each species point shows the magnitude of LISA in the predictors at that location, drawn as open or filled circles of different sizes.
Below are additional arguments:
cex: the maximum symbol size (circle) in the plot.
levels: specifies the number of LISA levels at which the points are presented.
xyLegend: a vector with two numbers, specifying the coordinates of the legend. If missing, the function tries to find an appropriate location for it.
...: xlab, ylab, and main are the same as in the base plot function.
plots the bubble plot.
Babak Naimi [email protected]
https://www.biogeoinformatics.org/
file <- system.file("external/predictors.tif", package="usdm")
r <- rast(file) # read a SpatRaster with 4 raster layers in the Netherlands
r
plot(r) # visualize the raster layers
sp.file <- system.file("external/species_nl.shp", package="usdm")
sp <- vect(sp.file) # read the species occurrence points
splisa <- speciesLisa(x=r, y=sp, uncertainty=15000, weights=c(0.22,0.2,0.38,0.2))
splisa
plot(splisa)
bnd.file <- system.file("external/boundary.shp", package="usdm")
bnd <- vect(bnd.file) # read the boundary map
plot(splisa, bnd)
#plot(splisa,bnd,levels=c(2,4,6,8))
#plot(splisa,bnd,levels=c(-5,-3,0,3,5))
An object of the RasterVariogram class contains information about the empirical variogram of a raster dataset. The object can be created with the Variogram function.
Slots for RasterVariogram object:
lag: a number specifying the lag distance
nlags: a number specifying the number of lags, based on the cutoff parameter
variogramCloud: matrix, including the semivariance for all pairs
variogram: data.frame, including the binned semivariance within each lag
Babak Naimi [email protected]
https://www.biogeoinformatics.org/
showClass("RasterVariogram")
Given a level of positional uncertainty (defined as a distance), this function calculates different local indicators of spatial association (LISA) statistics in predictors (explanatory variables, defined as a raster object) at each species occurrence location (defined as a SpatialPoints object). According to Naimi et al. (2012), this can be used to identify the species locations at which positional uncertainty is likely to affect the predictive performance of species distribution models.
speciesLisa(x, y, uncertainty, statistic="K1",weights)
x | explanatory variables (predictors), defined as a raster object ( |
y | species occurrence points, defined as a |
uncertainty | level of positional uncertainty, defined as a number (distance) |
statistic | a character string specifying the LISA statistic that should be calculated. This can be one of "I", "c", "G", "G*", and "K1". Default is "K1" |
weights | a numeric vector specifying the relative importance of explanatory variables in species distribution models (the first value in the |
This function calculates a LISA statistic for each explanatory variable at each species point. Several statistics can be calculated, including local Moran's I ("I"), local Geary's c ("c"), local G and G* ("G" and "G*"), and local K1 ("K1"); according to Naimi et al. (2012), the "K1" statistic (the default) is recommended. The function returns a speciesLISA object, which includes the species occurrence data, the LISA statistic for each predictor at the species locations, and an aggregated LISA statistic (a single LISA) at each species location, given the variable importances. If weights is not specified, equal weights (i.e. equal importance for all explanatory variables) are used.
speciesLISA
Babak Naimi [email protected]
https://www.biogeoinformatics.org/
If you use this method, please cite the following article, for which this package was developed:
Naimi, B., Hamm, N.A.S., Groen, T.A., Skidmore, A.K., and Toxopeus, A.G. 2014. Where is positional uncertainty a problem for species distribution modelling? Ecography 37 (2): 191-203.
## Not run:
file <- system.file("external/predictors.tif", package="usdm")
r <- rast(file) # read a SpatRaster with 4 raster layers in the Netherlands
r
plot(r) # visualize the raster layers
sp.file <- system.file("external/species_nl.shp", package="usdm")
sp <- vect(sp.file) # read the species occurrence points
splisa <- speciesLisa(x=r, y=sp, uncertainty=15000, weights=c(0.22,0.2,0.38,0.2))
splisa
plot(splisa)
bnd.file <- system.file("external/boundary.shp", package="usdm")
bnd <- vect(bnd.file) # read the boundary map
plot(splisa, bnd)
## End(Not run)
An object of the speciesLISA class contains information about a local indicator of spatial association (LISA) statistic in predictor variables at the locations of species occurrences. The object can be created with the speciesLisa function.
Slots for speciesLISA object:
species: object of class SpatialPoints
data: data.frame, attribute table of species points
LISAs: matrix, LISA statistics for different predictors
weights: numeric, the variable importance
statistic: character, the name of the LISA statistic
LISA: numeric, aggregated LISAs at each species location
Babak Naimi [email protected]
https://www.biogeoinformatics.org/
showClass("speciesLISA")
Compute sample (empirical) variogram from raster data. The function returns a binned variogram and a variogram cloud.
Variogram(x, lag, cutoff, cells, size=100)
x | a raster object ( |
lag | the lag size (width of subsequent distance intervals) into which cell pairs are grouped for semivariance estimates. If missing, the cell size (raster resolution) is assigned. |
cutoff | spatial separation distance up to which cell pairs are included in semivariance estimates; as a default, the length of the diagonal of the box spanning the data is divided by three. |
cells | numeric (optional). A vector of cell numbers in the Raster object. This forces the function to only consider these cells (and their neighbours) to compute the variogram. |
size | positive integer specifying the number of cells to be drawn from raster object. If the number of cells in the raster object is large, a sample with the specified size is drawn to make the computation more efficient. |
Variograms are widely used for exploring spatial structure in a single variable. Formally, a variogram is defined as half the expected squared difference (half the variance of the difference) in the variable value at a specific geographical separation. A variogram summarizes the spatial relations in the data and can be used to understand within what range (distance) the data are spatially autocorrelated. Naimi et al. (2011) linked this range to the impact of positional uncertainty on the performance of species distribution models (SDMs). Based on that study, examining the variogram to find the effective autocorrelation range in predictors gives insight into whether predictions by SDMs are likely to be affected by uncertainty in the sample locations (see Naimi et al. 2011 for more information).
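The definition above (half the mean squared difference between values at a given separation) can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the code used by Variogram: it takes point coordinates and values rather than a raster, and the function name empirical_variogram and the toy transect are hypothetical.

```r
# Empirical (binned) semivariogram from point values (illustrative only).
empirical_variogram <- function(xy, z, lag, cutoff) {
  d  <- as.matrix(dist(xy))            # pairwise separation distances
  g  <- 0.5 * outer(z, z, "-")^2       # semivariance of every pair: 0.5 * (z_i - z_j)^2
  up <- upper.tri(d)                   # count each pair only once
  cloud <- data.frame(distance = d[up], gamma = g[up])   # the variogram cloud
  cloud <- cloud[cloud$distance <= cutoff, ]             # drop pairs beyond the cutoff
  bin <- ceiling(cloud$distance / lag)                   # lag bin of each pair
  data.frame(
    distance = as.vector(tapply(cloud$distance, bin, mean)),  # mean distance per bin
    gamma    = as.vector(tapply(cloud$gamma, bin, mean)),     # binned semivariance
    n        = as.vector(table(bin))                          # number of pairs per bin
  )
}

# Toy transect: six points one unit apart with a linear trend in the values
v <- empirical_variogram(xy = cbind(1:6, 0), z = 1:6, lag = 1, cutoff = 3)
v
```

For this linear trend, the semivariance at separation h is 0.5 * h^2, so it keeps growing with distance; in spatially autocorrelated data without a trend, the curve typically levels off at the autocorrelation range.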
Note: A similar function has been implemented in the elsa package by the author of this package. Because the computational part of elsa is written in C, the function in elsa is much faster.
RasterVariogram |
Babak Naimi [email protected]
https://www.biogeoinformatics.org/
Naimi, B., Skidmore, A.K., Groen, T.A., and Hamm, N.A.S. 2011. Spatial autocorrelation in predictors reduces the impact of positional uncertainty in occurrence data on species distribution modelling. Journal of Biogeography 38: 1497-1509.
Naimi, B., Hamm, N.A.S., Groen, T.A., Skidmore, A.K., and Toxopeus, A.G. 2014. Where is positional uncertainty a problem for species distribution modelling? Ecography 37 (2): 191-203.
## Not run:
file <- system.file("external/spain.tif", package="usdm")
r <- rast(file) # read a SpatRaster with 10 raster layers in Spain
r
plot(r[[1]]) # plot the first layer in r
v1 <- Variogram(r[[1]]) # compute the sample variogram for the first layer in r
v2 <- Variogram(r[[1]], lag=25000, cutoff=100000) # specify the lag and cutoff parameters
## End(Not run)
Calculates the variance inflation factor (VIF) for a set of variables and excludes the highly correlated variables from the set through a stepwise procedure. This method can be used to deal with multicollinearity problems when fitting statistical models.
vif(x, size, ...)
vifcor(x, th = 0.9, keep = NULL, size, method = 'pearson', ...)
vifstep(x, th = 10, keep = NULL, size, method = 'pearson', ...)
x | Numeric explanatory variables (predictors), defined as a raster object ( |
th | a numeric value specifying the correlation threshold for vifcor and the VIF threshold for vifstep (see details). |
keep | a character vector with the names of variables that should not be excluded even if they are collinear, e.g. for ecological reasons |
size | when the data are large, a random sample of the records (cells from a raster or rows from a data.frame) of the specified size is selected; the default is 5000. |
method | a character string (one of c("pearson","spearman","kendall")) specifying the method used to calculate pairwise correlations; default="pearson". |
... | not implemented. |
VIF can be used to detect collinearity (strong correlation between two or more predictor variables). Collinearity causes instability in parameter estimation in regression-type models. The VIF is based on the square of the multiple correlation coefficient (R^2) resulting from regressing a predictor variable against all other predictor variables: VIF = 1 / (1 - R^2). If a variable has a strong linear relationship with at least one other variable, the correlation coefficient will be close to 1 and the VIF for that variable will be large. A VIF greater than 10 is commonly taken as a signal that the model has a collinearity problem. The vif function calculates this statistic for all variables in x. vifcor and vifstep use two different strategies to exclude highly collinear variables through a stepwise procedure.
- vifcor first finds the pair of variables with the maximum linear correlation (greater than the threshold, th) and excludes the one with the greater VIF. The procedure is repeated until no pair of variables with a correlation coefficient greater than the threshold remains.
- vifstep calculates the VIF for all variables, excludes the one with the highest VIF (if it is greater than the threshold), and repeats the procedure until no variable with a VIF greater than th remains.
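The VIF calculation and the vifstep-style loop described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: it works on a data.frame only, ignores the keep, size, and method arguments, and the function names vif_all and vif_stepwise and the simulated data are hypothetical. The vifcor strategy is not shown.

```r
# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j
# on all remaining predictors (illustrative; not usdm's implementation).
vif_all <- function(df) {
  sapply(names(df), function(v) {
    f  <- reformulate(setdiff(names(df), v), response = v)
    r2 <- summary(lm(f, data = df))$r.squared
    1 / (1 - r2)
  })
}

# vifstep-like loop: drop the variable with the largest VIF until all VIFs are <= th.
vif_stepwise <- function(df, th = 10) {
  excluded <- character(0)
  while (ncol(df) > 2) {               # keep at least two predictors so the VIF stays defined
    v <- vif_all(df)
    if (max(v) <= th) break            # nothing exceeds the threshold: stop
    worst <- names(v)[which.max(v)]    # variable with the highest VIF
    excluded <- c(excluded, worst)
    df <- df[, names(df) != worst, drop = FALSE]
  }
  list(excluded = excluded, vif = vif_all(df))
}

# Simulated predictors: 'ab' is almost exactly the sum of 'a' and 'b'
set.seed(1)
n <- 200
a <- rnorm(n)
b <- rnorm(n)
d <- data.frame(a = a, b = b, ab = a + b + rnorm(n, sd = 0.05))
res <- vif_stepwise(d, th = 10)
res$excluded   # the variable(s) dropped by the stepwise procedure
res$vif        # VIFs of the remaining variables
```

In this toy data set the near-duplicate variable ab carries the largest VIF, so it is the one removed, after which the two independent predictors have VIFs close to 1.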
Additional arguments:
method: specifies the correlation method (one of 'pearson', 'kendall', 'spearman'); the default is "pearson".
size: a number (default=5000) specifying the maximum number of observations used to calculate VIF. When the number of observations (cells in a raster or rows in a data.frame/matrix) is greater than size, a random sample of size observations is drawn to keep the calculation efficient.
keep: sometimes there is a strong biological or ecological justification for keeping some variables in the model even if the statistical calculations suggest otherwise. In that case, the keep argument can be used to pass the names of such variables (or the numbers of the columns in a data.frame or layers in a raster object that should be kept) to the functions; the stepwise procedure then takes them into account when deciding which variables to exclude.
an object of class VIF
Babak Naimi [email protected]
https://www.biogeoinformatics.org/
Chatterjee, S. and Hadi, A.S. 2006. Regression analysis by example. John Wiley and Sons.
Dormann, C.F. et al. 2012. Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. Ecography 35: 001-020.
If you use this method, please cite the following article, for which this package was developed:
Naimi, B., Hamm, N.A.S., Groen, T.A., Skidmore, A.K., and Toxopeus, A.G. 2014. Where is positional uncertainty a problem for species distribution modelling? Ecography 37 (2): 191-203.
## Not run:
file <- system.file("external/spain.tif", package="usdm")
r <- rast(file) # read a SpatRaster with 10 raster layers in Spain
r
vif(r) # calculate VIF for the variables in r
v1 <- vifcor(r, th=0.9) # identify collinear variables that should be excluded
v1
v2 <- vifstep(r, th=10) # identify collinear variables that should be excluded
v2
v3 <- vifstep(r, th=10, keep = c('Bio4','Bio10')) # never exclude Bio4 and Bio10
v3
## End(Not run)
An object of the VIF class contains information about collinearity in the relevant variables. The object can be created with the vifcor and vifstep functions.
Slots for VIF object:
variables: character
excluded: character
corMatrix: a correlation matrix
results: data.frame, including VIF values for the remaining (not excluded) variables
Babak Naimi [email protected]
https://www.biogeoinformatics.org/
showClass("VIF")