Modeling Spatially Biased Citizen Science Effort Through the eBird Database

Abstract

Citizen science databases are increasingly important sources of ecological information, but variability in location and effort is inherent to such data. Spatially biased data, that is, data not sampled uniformly across the study region, are to be expected. A further source of bias is the variable level of sampling activity, such as duration, across locations. This motivates our work: given a spatial dataset of visited locations and activity levels, we propose a formal, model-based approach for assessing effort at these locations. Adjusting for potential spatial bias, both in the sites visited and in the effort expended, is crucial for building reliable species distribution models (SDMs). Using data from eBird, a global citizen science database dedicated to avifauna, with illustrative regions in Pennsylvania and Germany, we model spatial dependence in both the observation locations and the observed activity. We apply point process models to explain the observed locations in space, fit a geostatistical model to explain observation effort at locations, and explore the potential existence of preferential sampling, i.e., dependence between the two processes. Altogether, we offer a more holistic notion of sampling effort that combines information about location and activity.
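To make the idea of preferential sampling more concrete, the following is a minimal simulation sketch, not the authors' model. It assumes a common shared-latent-field construction: a single Gaussian random field drives both where checklists occur (a log-Gaussian Cox process, discretized on a grid) and how much effort each checklist records (a log-normal geostatistical model). The function name `simulate_shared_field` and all parameters (`beta0`, `beta_loc`, `alpha0`, `beta_eff`, `range_`, `tau`) are hypothetical choices made for illustration only.

```python
import numpy as np


def simulate_shared_field(n_grid=30, beta0=5.0, beta_loc=1.0,
                          alpha0=3.0, beta_eff=0.5, range_=0.2,
                          tau=0.3, seed=0):
    """Simulate checklist locations and effort that share one latent field.

    Hypothetical illustration: locations follow a discretized log-Gaussian
    Cox process, and log-effort depends on the same field, so sampling
    intensity and effort are dependent (preferential sampling).
    """
    rng = np.random.default_rng(seed)

    # Cell centres of an n_grid x n_grid lattice on the unit square.
    centres_1d = (np.arange(n_grid) + 0.5) / n_grid
    xx, yy = np.meshgrid(centres_1d, centres_1d)
    coords = np.column_stack([xx.ravel(), yy.ravel()])

    # Zero-mean Gaussian field with exponential covariance exp(-d / range_).
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    cov = np.exp(-dist / range_)
    chol = np.linalg.cholesky(cov + 1e-8 * np.eye(len(coords)))
    field = chol @ rng.standard_normal(len(coords))

    # Point process: Poisson count per cell with mean intensity * cell area.
    cell_area = 1.0 / n_grid**2
    intensity = np.exp(beta0 + beta_loc * field)
    counts = rng.poisson(intensity * cell_area)

    # Geostatistical effort: one log-normal effort value per checklist,
    # whose mean depends on the field value at the checklist's cell.
    cell_of_checklist = np.repeat(np.arange(len(coords)), counts)
    log_effort = (alpha0 + beta_eff * field[cell_of_checklist]
                  + tau * rng.standard_normal(len(cell_of_checklist)))
    return coords, field, counts, cell_of_checklist, np.exp(log_effort)


coords, field, counts, cell_idx, effort = simulate_shared_field()
print("checklists:", counts.sum())
# Positive correlations mean heavily visited cells also tend to carry more effort.
print("corr(counts, field):", np.corrcoef(counts, field)[0, 1])
print("corr(log effort, field):",
      np.corrcoef(np.log(effort), field[cell_idx])[0, 1])
```

Setting `beta_eff=0` in this sketch removes the link between effort and location intensity, so comparing fits with and without such a shared term is one possible way to probe for preferential sampling; the paper's actual model specification may differ.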

Publication
Environmental and Ecological Statistics. In publication
Becky Tang
Assistant Professor of Statistics