Correcting misclassification errors in crowdsourced ecological data: A Bayesian perspective
The Journal of the Royal Statistical Society: Series C (Applied Statistics) (IF 1.0), Pub Date: 2020-11-11, DOI: 10.1111/rssc.12453
Edgar Santos‐Fernandez, Erin E. Peterson, Julie Vercelloni, Em Rushworth, Kerrie Mengersen

Many research domains use data elicited from ‘citizen scientists’ when a direct measure of a process is expensive or infeasible. However, participants may report incorrect estimates or classifications due to their lack of skill. We demonstrate how Bayesian hierarchical models can be used to learn about latent variables of interest, while accounting for the participants’ abilities. The model is described in the context of an ecological application that involves crowdsourced classifications of georeferenced coral‐reef images from the Great Barrier Reef, Australia. The latent variable of interest is the proportion of coral cover, which is a common indicator of coral reef health. The participants’ abilities are expressed in terms of sensitivity and specificity of a correctly classified set of points on the images. The model also incorporates a spatial component, which allows prediction of the latent variable in locations that have not been surveyed. We show that the model outperforms traditional weighted‐regression approaches used to account for uncertainty in citizen science data. Our approach produces more accurate regression coefficients and provides a better characterisation of the latent process of interest. This new method is implemented in the probabilistic programming language Stan and can be applied to a wide number of problems that rely on uncertain citizen science data.
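
The role of sensitivity and specificity can be illustrated with a small simulation. The sketch below is not the paper's Bayesian hierarchical model and not its Stan implementation; it assumes a single participant whose sensitivity and specificity are known, and swaps in the simple Rogan-Gladen correction as a stand-in. All function names and numeric values are hypothetical and chosen for illustration only.

```python
import random


def observed_positive_rate(p, sensitivity, specificity):
    """Probability that a participant labels a point as coral, given true
    coral cover p: true positives plus false positives."""
    return sensitivity * p + (1.0 - specificity) * (1.0 - p)


def corrected_proportion(q, sensitivity, specificity):
    """Invert the relation above (Rogan-Gladen correction) and clip to [0, 1].
    q is the observed share of points labelled as coral."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    return min(1.0, max(0.0, (q - (1.0 - specificity)) / denom))


def simulate_labels(p, sensitivity, specificity, n_points, rng):
    """Simulate one participant classifying n_points image points."""
    labels = []
    for _ in range(n_points):
        is_coral = rng.random() < p
        # A coral point is labelled coral with probability `sensitivity`;
        # a non-coral point is mislabelled coral with probability 1 - specificity.
        p_label = sensitivity if is_coral else 1.0 - specificity
        labels.append(1 if rng.random() < p_label else 0)
    return labels


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    true_cover, se, sp = 0.30, 0.80, 0.90  # hypothetical values
    labels = simulate_labels(true_cover, se, sp, 20000, rng)
    naive = sum(labels) / len(labels)
    print("naive estimate:    ", round(naive, 3))
    print("corrected estimate:", round(corrected_proportion(naive, se, sp), 3))
```

In this simplified setting the naive share of positive labels is biased whenever sensitivity and specificity are below 1, and the correction removes that bias only because the abilities are treated as known. The model described in the abstract instead learns the participants' abilities jointly with the latent coral cover and adds a spatial component.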

Updated: 2021-01-20