RMSE is not enough: Guidelines to robust data-model comparisons for magnetospheric physics
Journal of Atmospheric and Solar-Terrestrial Physics (IF 1.8), Pub Date: 2021-04-01, DOI: 10.1016/j.jastp.2021.105624
Michael W. Liemohn, Alexander D. Shane, Abigail R. Azari, Alicia K. Petersen, Brian M. Swiger, Agnit Mukhopadhyay

The magnetospheric physics research community uses a broad array of quantitative data-model comparison methods (metrics) in its research. It is often the case, though, that any particular study uses only one or two metrics, the two most common being the Pearson correlation coefficient and root mean square error (RMSE). Because each metric is designed to test a specific aspect of the data-model relationship, limiting the comparison to one or two metrics reduces the physical insight that can be gleaned from the analysis and restricts the possible findings of modeling studies. Additional physical insight can be obtained when many types of metrics are applied. We organize metrics into two primary groups: 1) fit performance metrics, often based on the data-model value difference; and 2) event detection metrics, which use a discrete event classification of data and model values determined by a specified threshold. In addition to these groups, there are several major categories of metrics based on the aspect of the data-model relationship that the metric assesses: 1) accuracy; 2) bias; 3) precision; 4) association; and 5) extremes. Another category is skill, which measures any of these metrics against the performance of a reference model. The metrics can also be applied to a subset of either the data or the model values, known as reliability and discrimination assessments. In the context of magnetospheric physics examples, we discuss best practices for choosing metrics for particular studies.
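The two metric groups named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the sample values, and the ">= threshold" event rule are assumptions for illustration, and the particular metrics shown (mean error, RMSE, Pearson correlation, probability of detection, false alarm ratio) are common choices rather than a confirmed list of the authors' recommendations.

```python
import math

def fit_metrics(obs, mod):
    """Fit performance metrics from paired observed and modeled values."""
    n = len(obs)
    diffs = [m - o for o, m in zip(obs, mod)]
    mean_error = sum(diffs) / n                       # bias
    rmse = math.sqrt(sum(d * d for d in diffs) / n)   # accuracy
    mean_o, mean_m = sum(obs) / n, sum(mod) / n
    cov = sum((o - mean_o) * (m - mean_m) for o, m in zip(obs, mod))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_m = sum((m - mean_m) ** 2 for m in mod)
    pearson_r = cov / math.sqrt(var_o * var_m)        # association
    return {"mean_error": mean_error, "rmse": rmse, "pearson_r": pearson_r}

def event_metrics(obs, mod, threshold):
    """Event detection metrics; a value >= threshold counts as an event (assumed rule)."""
    hits = misses = false_alarms = correct_negatives = 0
    for o, m in zip(obs, mod):
        if o >= threshold and m >= threshold:
            hits += 1
        elif o >= threshold:
            misses += 1
        elif m >= threshold:
            false_alarms += 1
        else:
            correct_negatives += 1
    nan = float("nan")
    # Probability of detection: fraction of observed events the model caught.
    pod = hits / (hits + misses) if hits + misses else nan
    # False alarm ratio: fraction of modeled events that were not observed.
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else nan
    return {"hits": hits, "misses": misses, "false_alarms": false_alarms,
            "correct_negatives": correct_negatives, "pod": pod, "far": far}

# Hypothetical values: the model tracks the data exactly but sits one unit high.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
mod = [2.0, 3.0, 4.0, 5.0, 6.0]
print(fit_metrics(obs, mod))
print(event_metrics(obs, mod, threshold=3.5))
```

With these made-up values the correlation is perfect while the RMSE and mean error are both one unit, and the constant offset also produces one false alarm at the chosen threshold. Each metric picks up a different aspect of the same data-model pair, which is the reason the abstract gives for using more than one or two.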




Updated: 2021-04-08