\usepackage{chngcntr}

Many thanks to Céline Bellard for commenting and correcting this tutorial

# Introduction

This complete tutorial introduces all the possibilities of the virtualspecies package. It was written with the objective to be helpful for both beginners and experienced R users. You can read this tutorial in full or jump to the particular section you are looking for. In each section, you will find simple examples introducing each function, followed by detailed examples for almost every possible parametrisation in virtualspecies.

After a small introduction on the spatial data used as input for the virtualspecies package (section 1.), you will be introduced to the basics of generating virtual species distributions: the two possible approaches to create species-environment relationships (sections 2. and 3.), followed by the conversion of environmental suitability to presence-absence (section 4.). After that, you will see how to generate random virtual species (section 5.), and most importantly, how to explore, extract, and use the outputs of virtualspecies (section 6.). Then, you will see how to sample occurrence points (section 7.). If you are interested, you may also learn about the introduction of distribution limitations in section 8.

In the following sections, you will see how to combine virtualspecies with external packages : the species distribution models package biomod (section 9.), and the dispersal limitation package MIGCLIM (section 10. - not yet written).

I will make extensive use of climate data as an example here, but remember that you can use other types of data, as long as it is in the format described in section 1.

# 1. Input data

virtualspecies uses spatialised environmental data as an input. These data must be gridded spatial data in the RasterStack format of the package raster.

### 1.1. Small introduction to raster data

The input data for virtual species is gridded spatial data (i.e., raster data).
To summarise, a raster is a map regularly divided into small units, called pixels. Each pixel has its own value. These values can be, for example, temperature values, for a raster of temperature. Raster data can be imported into R using the package raster. The command stack() will allow you to import your own rasters (stored on your hard drive) into an object of class RasterStack. Below you will find some concrete examples.

### 1.2. Required format for virtualspecies

As stated above, the required format is a RasterStack containing all the environmental variables with which you want to generate a virtual species distribution. Note that I may use interchangeably the variables and layers in this tutorial, because they roughly refer to the same aspect: a layer is the spatial representation of an environmental variable.

The most important part here is that every layer of your RasterStack must be correctly named, because these names will be used when generating virtual species.
Hence, I strongly advise using explicit names for your layers. You can use “variable1”, “variable2”, etc. or “layer1”, “layer2”, etc. but names like “bio1”, “bio2”, “bio3”, (bioclimatic variable names) or “temp1”, “temp2”, etc. will reduce confusion.
You can access the names of the layers of your stack with names(my.stack), and modify them with names(my.stack) <- c("name1", "name2", etc.).

### 1.3. A quick and easy example using WorldClim data

WorldClim (www.worldclim.org) freely provides gridded climate data (temperature and precipitations) for the entire World. These data can be downloaded to your hard drive from the website above, and then imported into R using the stack command: stack("my.path/MyWorldClimStack.bil").

Otherwise, a much simpler solution is to directly download them into R using the function getData() (requires an internet connection).

You have to provide the type of environmental variables you are looking for:

• tmean = average monthly mean temperature (°C * 10)
• tmin = average monthly minimum temperature (°C * 10)
• tmax = average monthly maximum temperature (°C * 10)
• prec = average monthly precipitation (mm)
• bio = bioclimatic variables derived from the tmean, tmin, tmax and prec
• alt = altitude (elevation above sea level) (m) (from SRTM)

And the resolution: 0.5, 2.5, 5, or 10 minutes of a degree. Note that at fine resolutions the files downloaded are very heavy and may take a long time; at the finest resolution (0.5) you cannot download the global file, and you have to provide a longitude and latitude to obtain a tile of the world (see ?getData for help).

Here is the example in practice:

library(raster, quietly = T)
worldclim <- getData("worldclim", var = "bio", res = 10)
worldclim
## class       : RasterStack
## dimensions  : 900, 2160, 1944000, 19  (nrow, ncol, ncell, nlayers)
## resolution  : 0.1666667, 0.1666667  (x, y)
## extent      : -180, 180, -60, 90  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84
## names       : bio1, bio2, bio3, bio4, bio5, bio6, bio7, bio8, bio9, bio10, bio11, bio12, bio13, bio14, bio15, ...

You can see that the object is of class RasterStack, ready to use for virtualspecies. The names of the layers are also convenient.

We can easily access a subset of the layers with [[]]:

# The first four layers
worldclim[[1:4]]
## class       : RasterStack
## dimensions  : 900, 2160, 1944000, 4  (nrow, ncol, ncell, nlayers)
## resolution  : 0.1666667, 0.1666667  (x, y)
## extent      : -180, 180, -60, 90  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84
## names       : bio1, bio2, bio3, bio4
# Layers bio1 and bio12 (annual mean temperature and annual precipitation)
worldclim[[c("bio1", "bio12")]]
## class       : RasterStack
## dimensions  : 900, 2160, 1944000, 2  (nrow, ncol, ncell, nlayers)
## resolution  : 0.1666667, 0.1666667  (x, y)
## extent      : -180, 180, -60, 90  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84
## names       : bio1, bio12

And we can plot our variables:

# Plotting bio1 and bio12
par(oma = c(0.1, 0.1, 0.1, 2.1))
plot(worldclim[[c("bio1", "bio12")]])

# 2. First approach: generate virtual species distributions by defining response functions

The first approach to generate a virtual species consists in defining its response functions to each of the environmental variables contained in our RasterStack. These responses are then combined to calculcate the environmental suitability of the virtual species.

The function providing this approach is generateSpFromFun.

## 2.1. An introduction example

Before we start using the package, let’s prepare our first simulation of virtual species.

We want to generate a virtual species with two environmental variables, the annual mean temperature bio1 and annual mean precipitation bio2. We want to generate a tropical species, i.e., living in hot and humid environments. We can define bell-shaped response functions to these two variables, as in the following figure:

## Loading required package: raster
## Loading required package: sp

Note that bioclimatic temperature variables (here bio1) are is in °C * 10, as explained here.

These bell-shaped functions are in fact gaussian distributions functions of the form:

$f(x, \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2 \sigma^2}}$

If we take the example of bio1 above, the mean $$\mu$$ is equal to 250 (i.e., 25°C) and standard deviation $$\sigma$$ is equal to 50 (5°C). Hence the response function for bio1 is:

$f(bio1) = \frac{1}{50\sqrt{2\pi}}\, e^{-\frac{(bio1 - 250)^2}{2 \times 50^2}}$

In R, it is very simple to obtain the result of the equation above, with the function dnorm:

# Suitability of the environment for bio1 = 15 °C
dnorm(x = 150, mean = 250, sd = 50)
## [1] 0.001079819

The idea with virtualspecies is to keep the same simplicity when generating a species: we will use the dnorm function to generate our virtual species.

Let’s start with the package now: The first step is to provide to the helper function formatFunctions which responses you want for which variables:

library(virtualspecies)

my.parameters <- formatFunctions(bio1 = c(fun = 'dnorm', mean = 250, sd = 50),
bio12 = c(fun = 'dnorm', mean = 4000, sd = 2000))

After that, the generation of the virtual species is fairly easy:

# Generation of the virtual species
my.first.species <- generateSpFromFun(raster.stack = worldclim[[c("bio1", "bio12")]],
parameters = my.parameters,
plot = TRUE)
my.first.species
## Virtual species generated from 2 variables:
##  bio1, bio12
##
## - Approach used: Responses to each variable
## - Response functions:
##    .bio1  [min=-269; max=314] : dnorm   (mean=250; sd=50)
##    .bio12  [min=0; max=9916] : dnorm   (mean=4000; sd=2000)
## - Each response function was rescaled between 0 and 1
## - Environmental suitability formula = bio1 * bio12
## - Environmental suitability was rescaled between 0 and 1

Congratulations! You have just generated your first virtual species distribution, with the default settings. In the next section you will learn about the simple, yet important settings of this function.

## 2.2. Customisation of parameters

The function generateSpFromFun proceeds into four important steps:

1. The response to each environmental variable is calculated according to the user-chosen functions.
2. Each response is rescaled between 0 and 1. This step can be disabled.
3. The responses are combined together to compute the environmental suitability. The user can choose how to combine the responses.
4. The environmental suitability is rescaled between 0 and 1. This step can be disabled.

### 2.2.1. Response functions

You can use any existing function with generateSpFromFun, as long as you provide the necessary parameters. For example, you can use the function dnorm shown above, if you provide the parameters mean and sd: formatFunctions(bio1 = c(fun = 'dnorm', mean = 250, sd = 50)). Naturally you can change the values of mean and sd to your needs.

You can also use the basic functions provided with the package:

• Linear function: formatFunctions(bio1 = c(fun = 'linearFun', a = 1, b = 0)) $f(bio1) = a * bio1 + b$

• Quadratic function: formatFunctions(bio1 = c(fun = 'quadraticFun', a = -1, b = 2, c = 0)) $f(bio1) = a \times bio1^2 + b \times bio1 + c$

• Logistic function: formatFunctions(bio1 = c(fun = 'logisticFun', beta = 150, alpha = -5)) $f(bio1) = \frac{1}{1 + e^{\frac{bio1 - \beta}{\alpha}}}$
• Normal function defined by extremes: formatFunctions(bio1 = c(fun = 'custnorm', mean = 250, diff = 50, prob = 0.99)) This function allows you to set extremum of a normal curve. In the example above, we define a response where the optimum is 25°C (mean = 250), and 99% of the area under the curve (prob = 0.99) will be comprised between 20 and 30°C (diff = 50).

• Beta response function (Oksanen & Minchin, 2002, Ecological Modelling 157:119-129): formatFunctions(bio1 = c(fun = 'betaFun', p1 = 0, p2 = 35, alpha = 0.9, gamma = 0.08)) $f(bio1) = (bio1 - p1)^{\alpha} * (p2 - bio1)^{\gamma}$

• Or you can create your own functions (see the section How to create your own response functions if you need help for this).

### 2.2.2. Rescaling of individual response functions

This rescaling is performed because if you use very different response function for different variables, (e.g., a Gaussian distribution function with a linear function), then the responses may be disproportionate among variables. By default this rescaling is enabled (rescale.each.response = TRUE), but it can be disabled (rescale.each.response = FALSE).

### 2.2.3. Combining response functions

There are three main possibilities to combine response functions to calculate the environmental suitability, as defined by the parameters species.type and formula:

• species.type = "additive": the response functions are added.
• species.type = "multiplicative": the response functions are multiplied. This is the default behaviour of the function.
• formula = "bio1 + 2 * bio2 + bio3": if you choose a formula, then the response functions are combined according to your formula (parameter species.type is then ignored).

For example, if you want to generate a species with the same partial responses as in 2.1, but with a strong importance for temperature, then you can specify the formula : formula = "2 * bio1 + bio12"

library(virtualspecies)

my.parameters <- formatFunctions(bio1 = c(fun = 'dnorm', mean = 250, sd = 50),
bio12 = c(fun = 'dnorm', mean = 4000, sd = 2000))

# Generation of the virtual species
new.species <- generateSpFromFun(raster.stack = worldclim[[c("bio1", "bio12")]],
parameters = my.parameters,
formula = "2 * bio1 + bio12",
plot = TRUE)
## [1] "2 * bio1 + bio12"
new.species
## Virtual species generated from 2 variables:
##  bio1, bio12
##
## - Approach used: Responses to each variable
## - Response functions:
##    .bio1  [min=-269; max=314] : dnorm   (mean=250; sd=50)
##    .bio12  [min=0; max=9916] : dnorm   (mean=4000; sd=2000)
## - Each response function was rescaled between 0 and 1
## - Environmental suitability formula = 2 * bio1 + bio12
## - Environmental suitability was rescaled between 0 and 1

One can even make complex interactions between partial responses:

library(virtualspecies)

my.parameters <- formatFunctions(bio1 = c(fun = 'dnorm', mean = 250, sd = 50),
bio12 = c(fun = 'dnorm', mean = 4000, sd = 2000))

# Generation of the virtual species
new.species <- generateSpFromFun(raster.stack = worldclim[[c("bio1", "bio12")]],
parameters = my.parameters,
formula = "3.1 * bio1^2 - 1.4 * sqrt(bio12) * bio1",
plot = TRUE)
## [1] "3.1 * bio1^2 - 1.4 * sqrt(bio12) * bio1"