Overview
• SDMs: what and why?
• Types of models
• Challenges
  – Messy data
  – Difficult questions

Species distribution models
[Diagram: GIS data (predictor variables) and species data (response variable) → statistical model → predicted distribution]
• Also known as (sometimes with different meanings):
  – Ecological Niche Model (ENM)
  – Habitat model
  – Climate envelope model
  – Occupancy model
  – Resource Selection Function
  – Correlative model
• Geographic Information System
  » Raster
  » Pixel
  » Grain
Austin MP (1999) Ecography, 22, 465-484.

Example: marine fishes, NZ
• Catch data from 17000 marine research trawls for 120 demersal species
• For predicting to maps (conservation planning: marine reserves)
• Example species: black oreo dory, Allocyttus niger
• Response: catch/no catch, or kg catch
• Predictors (covariates)

Typical fitted functions…
[Figure: fitted functions for black oreo dory (Allocyttus niger), one panel per predictor (AvgDepth, TempResid, SalResid, SSTGrad, SusPartMatter, ChlaCase2), each labelled with its percentage contribution; the values shown include 31.6 %, 29.9 %, 10.3 %, 9.3 %, 5.5 % and 3.1 %]

Typical predictions…

Predicting with SDMs
[Figure: geographic ("BIOTOPE") space → environmental ("NICHE") space → predictions back in geographic space. Image by N. Zimmermann]
Explain
• Used for explanation, understanding, hypotheses..

Predicting with SDMs
• Many species (8.7 million?, most still undescribed; Mora et al. PlosOne 2011)
• Threats: clearing, invasives, climate change
• Maps help: Leathwick et al. 2008 Cons Letters

Predicting to new times and places
• Invasive species
• Climate change: 2000 → 2070
[Figure: P(Y=1) plotted against temperature]

Types of models

Modelling methods
[Diagram: GIS data (predictor variables) and species data (response variable) → statistical model → predicted distribution]
• ONLY use presence data
  – Envelope: BIOCLIM
  – Distance-based: DOMAIN, Euclidean distance..
[Figure: presence records plotted on temperature and rainfall axes]

Modelling methods
• Presence-only data + "background"
  – e.g. GARP, MaxEnt

Modelling methods
Regression-type methods and machine-learning methods:
• GLMs – Generalized Linear Models
• GAMs – Generalized Additive Models
• MARS – Multivariate Adaptive Regression Splines
• MaxEnt – Maximum Entropy
• GARP – Genetic Algorithms for Rule-based Predictions
• SVMs – Support Vector Machines
• CART – Classification and Regression Trees
• RFs – Random Forests
• BRTs – Boosted Regression Trees
• ANNs – Artificial Neural Network models
R: raster, dismo..
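The envelope approach listed under the presence-only methods can be illustrated with a minimal sketch. It assumes the simplest possible rule: a cell is predicted suitable when every predictor lies between the minimum and maximum seen at the presence records. This is a rectilinear envelope in the spirit of BIOCLIM, not the BIOCLIM algorithm itself, and the temperature and rainfall values, the function names and the 2 x 2 grid are all hypothetical. Python is used only to keep the sketch dependency-free; the slides point to the R packages raster and dismo for real work.

```python
def fit_envelope(presences):
    """Return one (min, max) pair per predictor, taken over the presence records."""
    n_vars = len(presences[0])
    return [
        (min(row[j] for row in presences), max(row[j] for row in presences))
        for j in range(n_vars)
    ]


def in_envelope(site, limits):
    """True if every predictor value at the site lies inside its (min, max) limits."""
    return all(low <= value <= high for value, (low, high) in zip(site, limits))


# Hypothetical presence records: (mean temperature in deg C, annual rainfall in mm).
presences = [(12.0, 800.0), (14.5, 950.0), (13.0, 700.0), (15.0, 1100.0)]
limits = fit_envelope(presences)

# Hypothetical 2 x 2 raster: each cell holds the same two predictors.
grid = [
    [(11.0, 750.0), (13.0, 900.0)],
    [(14.0, 1000.0), (16.0, 1200.0)],
]

# Predict back in geographic space: 1 = inside the envelope, 0 = outside.
prediction = [[int(in_envelope(cell, limits)) for cell in row] for row in grid]

print(limits)
print(prediction)
```

The sketch uses presence records only, which is what separates envelope and distance-based methods from the presence-plus-background and regression-type methods listed above.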
Challenge #1: messy data
Atlas of Living Australia (ala.org.au), GBIF etc
• Data aggregators
• Now: make data available
• Future: updates / corrections / clean data sets – feedback, inform all users
[Figure: ALA bird records, 1980-1990 and 2000-2013. Thanks to Alejandra Morán-Ordóñez for these images]

Example biases…
• Globally: wealth, language, location, security
• Geography
  – Proximity to roads, rivers, towns
  – Community groups, favourite spots
  – In uncleared areas; within political boundaries
• Preferences: rare and unusual sightings
• Survey "design" - stratified sampling
• Detection biases hidden in here too..
[Figure: GBIF, spatial variation in number of records per km²]

Bias
• The problem:
  – Model assumes presence data are a random sample
  – Presence data are biased
  – Background data are by default unbiased

Effect of bias
Virtual species in Ontario, Canada – prefers mid-range of all climatic variables
• PA model from biased presence/absence data: presence-absence model recovers species distribution
• PO-bg model from biased occurrence data: model recovers sampling bias, not species distribution

Dealing with bias
• "target group background"
  – Use sites for a group of species to select background
  – Effect of sampling bias cancels out
• Make predictors (rasters) representing bias
  – Use to weight background points or as predictors

Challenge #2: Difficult applications
Extrapolating
• Invasive species
• Climate change: 2000 → 2070

The cane toad
• Native range: Americas
• Introduced to Aus in 1935 to 'control' cane beetles
• Now a major pest (poisonous to most native fauna, no predators, over 200 million in Australia)

The cane toad – mechanistic model
Kearney, Phillips et al 2008 Ecography 31: 423-434

Predictions from mechanistic model
The cane toad as a model organism: Kearney and Porter 2008 Ecology Letters 12
Mechanistic model predicts occurrence well
• strong inference on physiological limitations
• "gold standard" (?)

Previous SDMs for the cane toad

Available data
Data
• 8 climatic variables
  – rainfall, temperature and humidity
  – now; future (4.6ºC warmer)
• 1932 presences (weighted…)
• 575 absences or 10000 background samples
Models
• GLM (generalized linear model; x, x²)
• GAM – smooths up to 4df
• BRT (boosted regression trees)
• MaxEnt

Predicting potential in current climates
[Figure: maps for the mechanistic model and for GAM, GLM, BRT and MaxEnt, with rows labelled Background and Weights]
Correlation with mechanistic model: GAM 0.80, GLM 0.78, BRT 0.64, MaxEnt 0.79

Predicting potential: current
[Figure: GAM, mechanistic and GLM maps, fitted with a background sample and with observed absences]
1. invasives: choice of "absence" matters
2. know when you're extrapolating

Future: 4.6° warmer
[Figure: current and future maps for GAM, GLM, mechanistic, BRT and MaxEnt]

Controlling the fitted functions..
[Figure: mechanistic model compared with BRT (2900 trees versus 150 trees, same learning rate & tree depth) and with MaxEnt (only hinge features, standard regularization versus only hinge features, strong regularization)]
3. how you fit model matters
GLMs, GAMs: how to extrapolate?

Microsoft, www.eBird.org
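The advice to "know when you're extrapolating" can be made concrete with a simple range check. This is a minimal sketch, assuming that extrapolation is flagged whenever a predictor at a prediction site falls outside the range covered by the training sites. It is not the method used in the cane toad study, and the predictor names, site values and function names are hypothetical. Only the 4.6 °C warming figure comes from the slides.

```python
def training_ranges(training_sites):
    """Minimum and maximum of each named predictor across the training sites."""
    names = training_sites[0].keys()
    return {
        name: (
            min(site[name] for site in training_sites),
            max(site[name] for site in training_sites),
        )
        for name in names
    }


def novel_predictors(site, ranges):
    """Names of predictors whose value at the site is outside the training range."""
    return [
        name
        for name, (low, high) in ranges.items()
        if not low <= site[name] <= high
    ]


# Hypothetical training sites (temperature in deg C, rainfall in mm).
training = [
    {"temperature": 18.0, "rainfall": 600.0},
    {"temperature": 24.0, "rainfall": 1500.0},
    {"temperature": 27.5, "rainfall": 900.0},
]
ranges = training_ranges(training)

# A hypothetical prediction site now, and the same site 4.6 deg C warmer.
current_site = {"temperature": 25.0, "rainfall": 1000.0}
future_site = dict(current_site, temperature=current_site["temperature"] + 4.6)

print(novel_predictors(current_site, ranges))
print(novel_predictors(future_site, ranges))
```

A check of this kind only looks at one predictor at a time, so it would miss novel combinations of predictors that are each within range. It is a first screen for where fitted functions from GLMs, GAMs, BRTs or MaxEnt are being pushed beyond the data.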