Outlier detection methods to improve the quality of citizen science data
International Journal of Biometeorology ( IF 3.2 ) Pub Date : 2020-07-15 , DOI: 10.1007/s00484-020-01968-z
Jennifer S Li 1 , Andreas Hamann 1 , Elisabeth Beaubien 1

Citizen science involves public participation in research, usually through volunteer observation and reporting. Data collected by citizen scientists are a valuable resource in many fields of research that require long-term observations at large geographic scales. However, such data may be perceived as less accurate than those collected by trained professionals. Here, we analyze the quality of data from a plant phenology network, which tracks biological response to climate change. We apply five algorithms designed to detect outlier observations or inconsistent observers. These methods rely on different quantitative approaches, including residuals of linear models, correlations among observers, deviations from multivariate clusters, and percentile-based outlier removal. We evaluated these methods by comparing the resulting cleaned datasets in terms of time series means, spatial data coverage, and spatial autocorrelations after outlier removal. Spatial autocorrelations were used to determine the efficacy of outlier removal, as they are expected to increase if outliers and inconsistent observations are successfully removed. All data cleaning methods resulted in better Moran’s I autocorrelation statistics, with percentile-based outlier removal and the clustering method showing the greatest improvement. Methods based on residual analysis of linear models had the strongest impact on the final bloom time mean estimates, but were among the weakest based on autocorrelation analysis. Removing entire sets of observations from potentially unreliable observers proved least effective. In conclusion, percentile-based outlier removal emerges as a simple and effective method to improve reliability of citizen science phenology observations.
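As a rough illustration of the approach the abstract favors, the sketch below applies percentile-based outlier removal to a small set of bloom dates and then compares Moran's I before and after cleaning. It is not the authors' implementation: the record fields (`doy`, `x`, `y`), the 5th/95th percentile cutoffs, the binary distance weights, and the toy data are all assumptions made for the example, since the abstract does not specify them.

```python
from math import hypot


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in 0..100) of a pre-sorted list."""
    if not sorted_vals:
        raise ValueError("empty input")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def remove_percentile_outliers(records, lower=5.0, upper=95.0):
    """Keep records whose bloom day-of-year lies inside the percentile band.

    The cutoffs are hypothetical defaults, not values taken from the paper.
    """
    vals = sorted(r["doy"] for r in records)
    lo_cut = percentile(vals, lower)
    hi_cut = percentile(vals, upper)
    return [r for r in records if lo_cut <= r["doy"] <= hi_cut]


def morans_i(records, max_dist=1.5):
    """Moran's I with binary weights: w_ij = 1 when sites i != j are within max_dist."""
    n = len(records)
    mean = sum(r["doy"] for r in records) / n
    dev = [r["doy"] - mean for r in records]
    cross, w_sum = 0.0, 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = hypot(records[i]["x"] - records[j]["x"],
                         records[i]["y"] - records[j]["y"])
            if dist <= max_dist:
                cross += dev[i] * dev[j]
                w_sum += 1.0
    denom = sum(d * d for d in dev)
    if w_sum == 0 or denom == 0:
        return float("nan")
    return (n / w_sum) * (cross / denom)


# Toy transect: bloom date advances 2 days per site, with one implausible report.
raw = [{"x": float(i), "y": 0.0, "doy": 120.0 + 2.0 * i} for i in range(10)]
raw[4]["doy"] = 200.0  # hypothetical erroneous observation

cleaned = remove_percentile_outliers(raw)
i_raw = morans_i(raw)
i_cleaned = morans_i(cleaned)
print(f"Moran's I raw: {i_raw:.3f}, cleaned: {i_cleaned:.3f}")
```

Note that a symmetric percentile band also trims the earliest legitimate observation in this toy set, which is the usual trade-off of this method: it is simple, but it discards some valid extremes along with the errors.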

Updated: 2020-07-15