

Anomaly Detection in Seismic Data–Metadata Using Simple Machine‐Learning Models
Seismological Research Letters (IF 3.3), Pub Date: 2021-07-01, DOI: 10.1785/0220200339
Riccardo Zaccarelli, Dino Bindi, Angelo Strollo


Modern seismological analysis routinely involves processing huge amounts of data, as illustrated by the two case studies in this work, both of which assess the quality of several million segments selected for computing local and energy magnitudes. In this scenario, quality control tools to filter, discard, or rank data are extremely important and should ideally be simple, fast, and generalizable. Using machine‐learning tools, we present a simple and efficient model based on the isolation forest algorithm for detecting amplitude anomalies in any seismic waveform segment, with no restriction on the segment's recorded content (earthquake vs. noise) and no requirements beyond the segment metadata. We consider a simple feature space composed of the amplitudes of each segment’s power spectral density (PSD) evaluated at selected periods suitable for both local and teleseismic applications; feature selection revealed that a single feature, the PSD at 5 s, is sufficient to achieve the best predictive performance. The evaluation yields average precision scores around 0.97 and maximum F1 scores above 0.9, both remarkable given the simplicity of the approach and the generality of the problem tackled. The trained model with the best evaluation results is the backbone of publicly available software that computes an amplitude anomaly score in [0, 1] for any given seismic waveform. The software can be useful in several applications, such as discarding anomalies from data sets, ideally in a preprocessing stage, and detecting potential metadata problems on the data center side. Applied to our two case studies, the software proved fast and effective, and the computed anomaly scores offer additional flexibility on top of the demonstrated wide‐range applicability.
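To make the idea of an isolation forest on a single feature more concrete, the following is a minimal sketch written from scratch with the Python standard library. It is not the authors' trained model or their software: the PSD values (in dB) are synthetic, and all function names (`fit_forest`, `anomaly_score`, and so on) are hypothetical. The score follows the usual isolation forest normalization, 2^(-E[h(x)] / c(n)), which lies in (0, 1] and grows as a point becomes easier to isolate.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(values, depth, max_depth, rng):
    """Recursively partition 1-D values with uniformly random split points."""
    if len(values) <= 1 or depth >= max_depth:
        return {"size": len(values)}  # leaf
    lo, hi = min(values), max(values)
    if lo == hi:
        return {"size": len(values)}  # identical values cannot be split
    split = rng.uniform(lo, hi)
    left = [v for v in values if v < split]
    right = [v for v in values if v >= split]
    return {
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, x, depth=0):
    """Depth at which x lands in a leaf, plus a correction for unsplit leaf points."""
    if "split" not in tree:
        return depth + c_factor(tree["size"])
    branch = "left" if x < tree["split"] else "right"
    return path_length(tree[branch], x, depth + 1)


def fit_forest(values, n_trees=100, sample_size=256, seed=0):
    """Build n_trees isolation trees, each on a random subsample of the values."""
    if len(values) < 2:
        raise ValueError("need at least two training values")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(values))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(values, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    return trees, sample_size


def anomaly_score(forest, x):
    """Score in (0, 1]; short average path lengths give scores closer to 1."""
    trees, sample_size = forest
    mean_depth = sum(path_length(t, x) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / c_factor(sample_size))


# Synthetic stand-in for "PSD at 5 s" amplitudes in dB (assumed values, not from the paper).
data_rng = random.Random(42)
train = [data_rng.gauss(-140.0, 8.0) for _ in range(500)]
forest = fit_forest(train)

# One typical value and two values far outside the training range.
for psd_db in (-140.0, -20.0, -300.0):
    print(f"PSD(5 s) = {psd_db:7.1f} dB -> score {anomaly_score(forest, psd_db):.2f}")
```

Values far from the bulk of the training data are isolated after only a few random splits, so they receive higher scores than typical values. In practice, a library implementation such as scikit-learn's `IsolationForest` would be the more usual choice than hand-written trees.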


Updated: 2021-06-28
