A Total Error Approach for Validating Event Data, American Behavioral Scientist


American Behavioral Scientist (IF 2.3), Pub Date: 2021-06-14, DOI: 10.1177/00027642211021635
Scott Althaus, Buddy Peyton, Dan Shalmon


Understanding how useful any particular set of event data might be for conflict research requires appropriate methods for assessing validity when ground truth data about the population of interest do not exist. We argue that a total error framework can provide better leverage on these critical questions than previous methods have been able to deliver. We first define a total event data error approach for identifying 19 types of error that can affect the validity of event data. We then address the challenge of applying a total error framework when authoritative ground truth about the actual distribution of relevant events is lacking. We argue that carefully constructed gold standard datasets can effectively benchmark validity problems even in the absence of ground truth data about event populations. To illustrate the limitations of conventional strategies for validating event data, we present a case study of Boko Haram activity in Nigeria over a 3-month offensive in 2015 that compares events generated by six prominent event extraction pipelines—ACLED, SCAD, ICEWS, GDELT, PETRARCH, and the Cline Center’s SPEED project. We conclude that conventional ways of assessing validity in event data using only published datasets offer little insight into potential sources of error or bias. Finally, we illustrate the benefits of validating event data using a total error approach by showing how the gold standard approach used to validate SPEED data offers a clear and robust method for detecting and evaluating the severity of temporal errors in event data.


Updated: 2021-06-15
