This is a repost of an article we wrote for the Ecography blog.
In 1990, Stuart H. Hurlbert analysed the “Spatial Distribution of the Montane Unicorn”. The Montane Unicorn was a rare organism, at that time only recently described, and Hurlbert was the first to report data on this singular species. His data showed that unicorn populations had extremely unusual and varied abundance distributions. He therefore analysed these abundance distributions with the most widely used method at the time, namely the variance:mean ratio. It was generally accepted that when the variance:mean ratio was equal to 1, the abundance distribution followed a Poisson distribution. Most surprisingly, Hurlbert showed that none of his unicorn populations followed a Poisson distribution, yet all had a variance:mean ratio equal to 1, proving that the variance:mean ratio was actually useless as a measure of population aggregation.
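The argument is easy to reproduce with a few lines of code. The sketch below does not use Hurlbert's data: the quadrat counts are made up for illustration (half of the quadrats are empty, half hold exactly two individuals), the function names `variance_mean_ratio` and `poisson_pmf` are ours, and the population variance is used so that the ratio comes out exactly.

```python
import math
import statistics
from collections import Counter


def variance_mean_ratio(counts):
    """Variance:mean ratio of a list of abundance counts (population variance)."""
    return statistics.pvariance(counts) / statistics.mean(counts)


def poisson_pmf(k, lam):
    """Probability of k individuals under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Hypothetical data: 50 empty quadrats and 50 quadrats with exactly 2 individuals.
counts = [0] * 50 + [2] * 50

ratio = variance_mean_ratio(counts)
mean_abundance = statistics.mean(counts)
observed = Counter(counts)

print(f"variance:mean ratio = {ratio}")
for k in range(4):
    observed_freq = observed[k] / len(counts)
    expected_freq = poisson_pmf(k, mean_abundance)
    print(f"{k} individuals: observed {observed_freq:.3f}, "
          f"Poisson expects {expected_freq:.3f}")
```

The ratio is 1, yet no quadrat holds a single individual, whereas a Poisson distribution with the same mean would put roughly 37% of quadrats in that class. A ratio of 1 is what a Poisson distribution produces, but observing it does not show that the distribution is Poisson.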
Stuart H. Hurlbert had the brilliant idea of simulating a simple dataset to invalidate a long-standing belief in statistical ecology. Ecology is a science built upon field-sampled data, from which ecologists make assumptions and test them using statistical methods. However, not all statistical methods are fully understood or correctly applied by ecologists. As a consequence, models do not always model what we think they model, and their results do not always mean what we think they mean. In such cases, simulated data can help validate or invalidate assumptions about models.
This very general simulation approach would probably be called the Virtual Ecologist approach in modern ecology (Zurell et al. 2010). Several fields of ecology (biogeography, climate change ecology, invasion biology, conservation biology) currently rely heavily on models to predict species distribution ranges. These models, namely species distribution models (SDMs) (also termed habitat suitability models or ecological niche models), statistically relate species occurrence data to environmental variables in order to predict species' potential distribution ranges. Because SDMs have thrived in the ecological literature, a plethora of tools, methods and protocols have been developed. Knowing which approaches model species distributions best is a challenge that many ecologists have attempted to tackle using sampled species data. However, sampled species data suffer from many confounding factors, such as incompleteness, spatial bias, identification errors and inadequate detection, all of which preclude generalisation of validation exercises. As a consequence, ecologists decided to start modelling unicorns in the last decade: they started simulating virtual species in order to validate their assumptions about SDMs, test SDM performance, and assess the effects of different sampling biases on it.
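To make the idea concrete, here is a minimal sketch of what simulating a virtual species involves. It is an illustration under our own assumptions, not the API of any SDM package: the single temperature gradient, the Gaussian response with its `optimum` and `tolerance` parameters, the probabilistic conversion to presence-absence and the detection probability are all hypothetical choices.

```python
import math
import random


def gaussian_suitability(temperature, optimum=15.0, tolerance=4.0):
    """Hypothetical response curve: suitability is 1 at the optimum
    temperature and declines as conditions move away from it."""
    return math.exp(-((temperature - optimum) ** 2) / (2 * tolerance ** 2))


def simulate_virtual_species(temperatures, rng):
    """Return the true suitability and a simulated presence-absence per cell."""
    suitability = [gaussian_suitability(t) for t in temperatures]
    # Probabilistic conversion: a cell is occupied with probability equal
    # to its suitability (a fixed threshold would be another option).
    presence = [1 if rng.random() < s else 0 for s in suitability]
    return suitability, presence


rng = random.Random(42)  # fixed seed so the simulation is reproducible

# Hypothetical landscape: 31 cells along a gradient from 0 to 30 degrees C.
temperatures = [float(t) for t in range(31)]
suitability, presence = simulate_virtual_species(temperatures, rng)

# Imperfect sampling: each occupied cell is detected with probability 0.5.
detection_probability = 0.5
sampled_cells = [
    i for i, occupied in enumerate(presence)
    if occupied == 1 and rng.random() < detection_probability
]

print(f"occupied cells: {sum(presence)}, sampled cells: {len(sampled_cells)}")
```

Because the true suitability and the true distribution are known, an SDM fitted to `sampled_cells` can be compared against the truth, which is not possible with field-sampled data.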
Consequently, virtual species are becoming a common tool in the SDM literature. However, modelling unicorns for SDM testing is no easy task, because it requires adequate programming skills, and no complete and user-friendly software package was available until recently. Most importantly, if not carefully thought out, simulated unicorns may also lead to wrong conclusions. Meynard and Kaplan (2013) showed that an inadequate simulation of virtual species could lead to strong overestimation of SDM accuracy, and Moudrý (2015) dug deeper into the shortcomings related to using inappropriate simulation strategies. We therefore decided to help the ecological community simulate adequate unicorns by proposing a complete and user-friendly R package, namely “virtualspecies”. virtualspecies combines the existing methodological approaches in a complete framework, with the objective of generating virtual species distributions with increased ecological realism.
The package is described in our recent article: Leroy, B., Meynard, C.N., Bellard, C. & Courchamp, F. 2015. virtualspecies: an R package to generate virtual species distributions. Ecography, 38:001-009. It is freely available, and a complete tutorial can be found at http://borisleroy.com/en/virtualspecies/.