Exploring the unsupervised classification of seismic events of Cotopaxi volcano
Introduction
Hazards associated with volcanic eruptions, such as lava flows, pyroclastic flows, debris flows, landslides, tephra, and ash, are potentially life-threatening for the populations of cities close to volcanoes. Examples of such cities include Mexico City near Popocatepetl volcano, Tokyo near Mt. Fuji, and Quito near Cotopaxi and Guagua Pichincha volcanoes (Siebert et al., 2011; Phillipson et al., 2013; Pérez et al., 2020a, Pérez et al., 2020b). Although volcanic eruptions are often unpredictable, volcanic observatories use monitoring techniques to estimate the probability of unrest and/or an eruption (Schmincke, 2004). Several kinds of data can support early warnings of eruptions, including geomagnetic and electromagnetic (Currenti et al., 2005), geochemical (Green et al., 2013), deformation (Segall, 2013), infrasonic (Marchetti et al., 2019), and thermal data from satellite (Marchese et al., 2006) and from the ground (Carniel et al., 2010), ideally used in combination (e.g., Surono et al., 2012). Seismic data nevertheless lie at the heart of any monitoring system, which should include both the analysis of continuous volcanic tremor (Carniel, 2014), due to its potential in terms of persistence and memory (Jaquet and Carniel, 2003; Jaquet et al., 2006), and the time evolution of discrete volcano-seismic events such as volcano-tectonic (VT) earthquakes, long-period (LP) events, tremors (TRE), and explosions (EXP).
VT events are earthquakes that take place in a volcanic environment and are associated with pressure changes induced by magma movement (McNutt, 2005; Malfante et al., 2018). They have a variable duration that is usually less than 30 s and a broad spectral content that is typically above 5 Hz (Lara-Cueva et al., 2016b). LP events are low-frequency events with a typical duration below 90 s and a main spectral content limited to narrow frequency bands between 2 and 5 Hz (Lara-Cueva et al., 2016b). They are related to resonating fluid-filled conduits or cavities induced by pressure transients in the fluid (McNutt, 2005; Malfante et al., 2018). EXP quakes are generated by sudden magma, ash, and gas extrusion. Meanwhile, TRE events are produced by a long-duration release of energy related to magma degassing. They are characterized by their constant amplitude and long duration, ranging from a few minutes to several days (McNutt, 2005; Malfante et al., 2018; Lara-Cueva et al., 2016b). However, most observatories rely on manual classification and counting of seismic events, which can lead to delays and errors due to human subjectivity. Moreover, the number of seismic events often becomes too large to classify manually exactly when classification is most needed, i.e., during unrest.
In this sense, machine learning classifiers with supervised or unsupervised learning have been employed during the last decade in different application contexts. Supervised learning approaches successfully used to tackle the problem of seismic event classification include artificial neural networks (ANN) (Parihar et al., 2018; Lara-Cueva et al., 2016a; Titos et al., 2018b; Esposito et al., 2006; Del Pezzo et al., 2003), random forest (Rodgers et al., 2016), hidden Markov models (Benitez et al., 2007; Alasonati et al., 2006), Gaussian mixture models (Venegas et al., 2019), support vector machines (Parihar et al., 2018; Curilem et al., 2014; Malfante et al., 2018; Apolloni et al., 2009), maximum-likelihood and k-nearest neighbor methods (Parihar et al., 2018), and, lately, deep learning models (Titos et al., 2018a). The main problems with supervised classification are the need for a reliable labeled dataset for training and the fact that the resulting classifiers are difficult to generalize to more than one volcano (Cortés et al., 2019). On the other hand, unsupervised learning methods, which have been applied to different problems as well (Maset et al., 2015), aim to form structured groups or clusters in datasets without prior knowledge of any class labels (Zheng et al., 2017). Methods reported in the literature include principal component analysis (PCA) (Unglert et al., 2016), mixtures of Gaussians (Hammer et al., 2012), hidden Markov models (Bebbington, 2007), cluster analysis (Langer et al., 2009), and self-organizing maps (SOM) (Kuyuk et al., 2011; Langer et al., 2009).
Approaches focusing on volcanoes and their seismic activity have been less explored, but SOM models seem to be the most popular. In Köhler et al. (2010), a SOM model focused on volcanic wavefield patterns was used to analyze the seismicity of Mount Merapi (Indonesia); classification errors of 6% and 26% were obtained for volcano-tectonic and rockfall events, respectively. However, when both event types were combined into one cluster class, the error was significantly reduced to 12%. In Reyes and Mosquera (2017), SOM and k-means models were used to classify volcanic signals recorded at the Tungurahua volcano (Ecuador), attaining accuracy (ACC) values of 91% and 87% for noise and infrasound signals, respectively. In Anzieta et al. (2019), k-means was used as part of a two-stage procedure that first clusters possible low-frequency seismic families on the spectral density vector of the signals and then performs the final separation by waveform using Correntropy and Dynamic Time Warping methods. In Messina and Langer (2011), SOM and clustering-based models were integrated to build the “KKAnalysis software”, a tool that takes less than a minute to classify volcanic tremor data from Mount Etna (Italy), reaching an ACC value of 90%. SOMs were also used to characterize the volcanic tremor preceding phreatic explosions at Raoul Island (Carniel et al., 2013b), Ruapehu (Carniel et al., 2013a), and Tongariro (Jolly et al., 2014) in New Zealand. Similarly, in Esposito et al. (2008), a SOM model was used to cluster the waveforms of very-long-period events associated with explosive activity at the Stromboli volcano.
Despite the several approaches developed, volcano seismic event classification remains an open challenge, since a great variety of seismic signatures is associated with any given volcano and their intraclass properties and characteristics evolve over time, for example between inactive and eruptive periods (Malfante et al., 2018). Furthermore, seismic waveforms can overlap and contain signals of non-volcanic origin, such as icequakes and rockfalls, that occur at the same time as or immediately after volcano seismic events.
Previous studies related to the automatic analysis (detection and classification) of volcano-seismic signals from Cotopaxi volcano have mainly focused on supervised machine learning approaches, as summarized in Pérez et al. (2020a, 2020b), and to a lesser extent on semi-supervised schemes (Brusil et al., 2019). To the authors' knowledge, however, unsupervised techniques have not been extensively investigated for event classification so far.
In this work, we aim to explore six different clustering-based classifiers in the context of volcano seismic event classification for the Cotopaxi volcano. Since the employed models belong to the unsupervised learning category, they have the advantage of being trained without knowing the seismic event type or class (output label) of the input instances, making them a practical solution for a real-life scenario, especially when no previously labeled data are available, e.g., for a volcano without recent unrest. At the same time, they are often less accurate, which is an inherent drawback.
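As a concrete illustration of training without labels, the following minimal sketch implements plain Lloyd's k-means in pure Python on a handful of invented two-dimensional points. It is a generic example of clustering-based grouping, not necessarily one of the six classifiers evaluated in this work; the function names, the toy data, and the parameters `iterations` and `seed` are assumptions made for illustration only.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct points drawn from the data.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        if new_assignments == assignments:
            break  # assignments stopped changing, so the run has converged
        assignments = new_assignments
    return centroids, assignments


# Invented toy feature vectors: two well-separated groups, no labels given.
points = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The cluster indices returned are arbitrary: the algorithm only says which points belong together, so relating a cluster to an event class such as LP or VT still requires some reference information after training.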
The remainder of this paper is organized as follows: the Materials and Methods section presents the experimental volcano seismic event dataset, the selected clustering-based classifiers, and the experimental setup used in this work. The Results and Discussion section presents an exploratory comparison of the considered classifiers. Finally, conclusions and future work are presented in the last section.
Section snippets
Experimental dataset
For this work, a dataset with 668 LP and VT events was used. Each event is described by an 84-dimensional feature vector, including 13 features from the time domain, 21 features from the frequency domain, and 50 features from the scale domain (see Appendix Table 4). Detailed information about these features and their calculation can be found in Pérez et al. (2020a, 2020b). Most of these events are a subset of the publicly available dataset SeisBenchV1 from the ESeismic repository (Pérez et
Results and Discussion
According to the experimental setup section, a total of 54 clustering-based models (from k = 2 to 10) were evaluated on the experimental dataset, which contains 668 feature vectors. Since this dataset is formed by two seismic event classes, some of whose events have overlapped signals, we present and discuss the results only for k = 2 and k = 3 clusters. By analyzing the whole classification performance space, we found that it does not make sense to investigate more than three
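The count of 54 models is consistent with six classifiers each run for the nine values k = 2, ..., 10. The short sketch below simply enumerates that grid. Only BIRCH is named in this excerpt, so the remaining method labels are hypothetical placeholders, as are the variable names.

```python
from itertools import product

# Only "BIRCH" comes from the text; the other five names are placeholders
# standing in for the classifiers not named in this excerpt.
methods = ["BIRCH", "method_2", "method_3", "method_4", "method_5", "method_6"]
cluster_counts = range(2, 11)  # k = 2, 3, ..., 10

# One configuration per (method, k) pair.
configurations = list(product(methods, cluster_counts))
print(len(configurations))  # 6 methods x 9 values of k = 54

# Configurations with k = 2 or k = 3, the cases discussed in the results.
reported = [(m, k) for m, k in configurations if k in (2, 3)]
print(len(reported))  # 6 methods x 2 values of k = 12
```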
Conclusions and Future Work
In this work, we explored six different unsupervised learning classifiers within the context of volcano seismic event classification. We used the WbACC metric to assess the classifiers because of the particular configuration of the experimental dataset, which contains LP and VT seismic events with and without overlapped signals. According to the obtained results, the BIRCH method with k = 2 was chosen as the best model for clustering the LP and VT seismic event
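The definition of WbACC is not given in this excerpt, so it is not reproduced here. As a related and simpler illustration of how a clustering can be scored against reference labels, the sketch below maps each cluster to its majority class and then computes ordinary balanced accuracy (the mean of the per-class recalls). The function names and the toy labels are assumptions, and this is not the metric used by the authors.

```python
from collections import Counter, defaultdict


def majority_mapping(cluster_ids, true_labels):
    """Map each cluster id to the most common reference label inside it."""
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, true_labels):
        members[cluster].append(label)
    return {c: Counter(ls).most_common(1)[0][0] for c, ls in members.items()}


def balanced_accuracy(cluster_ids, true_labels):
    """Mean per-class recall after mapping clusters to majority labels."""
    mapping = majority_mapping(cluster_ids, true_labels)
    predicted = [mapping[c] for c in cluster_ids]
    recalls = []
    for cls in sorted(set(true_labels)):
        total = sum(1 for t in true_labels if t == cls)
        hits = sum(
            1 for t, p in zip(true_labels, predicted) if t == cls and p == cls
        )
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Invented toy example: six LP events and two VT events in two clusters.
true_labels = ["LP"] * 6 + ["VT"] * 2
cluster_ids = [0, 0, 0, 0, 0, 1, 1, 1]
score = balanced_accuracy(cluster_ids, true_labels)
print(round(score, 4))  # (5/6 + 2/2) / 2 = 0.9167
```

In this toy case plain accuracy would be 7/8 = 0.875, while balanced accuracy weights both classes equally regardless of how many events each contains, which is why balanced variants are often preferred when class sizes differ.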
CRediT author statement
Adrian Duque: Software, Investigation, Writing - Original draft preparation. Kevin González: Software, Investigation, Writing - Original draft preparation. Noel Pérez: Conceptualization, Methodology, Writing - Original draft preparation, Writing - Reviewing and Editing, Investigation, Formal analysis, Supervision. Diego Benítez: Writing - Reviewing and Editing, Investigation, Supervision, Funding acquisition, Project administration, Resources. Felipe Grijalva: Visualization, Writing - Reviewing
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This work was supported in part by the Universidad San Francisco de Quito (USFQ) through the Poli-Grants Program under Grant 10100, Grant 12494, and Grant 16916, in part by the Universidad de las Fuerzas Armadas (ESPE) under Grants 2013-PIT-014 and 2016-EXT-038, and in part by Escuela Politécnica Nacional (EPN) under the Grant PIE-SRASI-IG-2018. The authors thank the Applied Signal Processing and Machine Learning Research Group of USFQ, for providing the computing infrastructure (NVidia DGX
References (80)
- et al. A geophysical multi-parametric analysis of hydrothermal activity at Dallol, Ethiopia. J. Afr. Earth Sci. (2010)
- et al. Analysis of phreatic events at Ruapehu volcano, New Zealand using a new SOM approach. J. Volcanol. Geotherm. Res. (2013)
- et al. Pattern recognition applied to seismic signals of the Llaima volcano (Chile): an analysis of the events’ features. J. Volcanol. Geotherm. Res. (2014)
- et al. Multivariate stochastic modelling: towards forecasts of paroxysmal phases at Stromboli. J. Volcanol. Geotherm. Res. (2003)
- et al. Devin: a forecasting approach using stochastic methods applied to the Soufriere Hills volcano. J. Volcanol. Geotherm. Res. (2006)
- et al. Seismo-acoustic evidence for an avalanche driven phreatic eruption through a beheaded hydrothermal system: an example from the 2012 Tongariro eruption. J. Volcanol. Geotherm. Res. (2014)
- et al. Pattern recognition of volcanic tremor data on Mt. Etna (Italy) with KKAnalysis–software program for unsupervised classification. Comput. Geosci. (2011)
- et al. Chapter 18 - model complexity (and how ensembles help).
- et al. ESeismic: towards an Ecuadorian volcano seismic repository. J. Volcanol. Geotherm. Res. (2020)
- et al. Global volcanic unrest in the 21st century: an analysis of the first decade. J. Volcanol. Geotherm. Res. (2013)
- Course 16 - the emergence of relevant data representations: an information theoretic approach.
- Principal component analysis vs. self-organizing maps combined with hierarchical clustering for pattern recognition in volcano seismic spectra. J. Volcanol. Geotherm. Res.
- Signal classification by wavelet-based hidden Markov models: application to seismic signals of volcanic origin. Statistics in Volcanology.
- A clustering algorithm for multivariate data streams with correlated components. Journal of Big Data.
- Finding possible precursors for the 2015 Cotopaxi volcano eruption using unsupervised machine learning techniques. International Journal of Geophysics.
- Support vector machines and MLP for automatic classification of seismic signals at Stromboli volcano.
- k-means++: The Advantages of Careful Seeding.
- Identifying volcanic regimes using hidden Markov models. Geophys. J. Int.
- Continuous HMM-based seismic-event classification at Deception Island, Antarctica. IEEE Trans. Geosci. Remote Sens.
- Clustering with BFR.
- Scaling clustering algorithms to large databases.
- A semi-supervised approach for microseisms classification from Cotopaxi volcano.
- Characterization of volcanic regimes and identification of significant transitions using geophysical data: a review. Bull. Volcanol.
- Detecting dynamical regimes by self-organizing map (SOM) analysis: an example from the March 2006 phreatic eruption at Raoul Island, New Zealand Kermadec arc. Bollettino di Geofisica Teorica ed Applicata.
- Entropy-based algorithms for best basis selection. IEEE Trans. Inf. Theory.
- Standardization of noisy volcanoseismic waveforms as a key step toward station-independent, robust automatic recognition. Seismol. Res. Lett.
- Multiscale entropy analysis of biological signals. Phys. Rev. E.
- Multifractality in local geomagnetic field at Etna volcano, Sicily (southern Italy). Natural Hazards and Earth System Science.
- Revisiting BFR clustering algorithm for large scale gene regulatory network reconstruction using MapReduce.
- Discrimination of earthquakes and underwater explosions using neural networks. Bull. Seismol. Soc. Am.
- Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B Methodol.
- Automatic discrimination among landslide, explosion-quake, and microtremor seismic signals at Stromboli volcano using neural networks. Bull. Seismol. Soc. Am.
- Unsupervised neural analysis of very-long-period events at Stromboli volcano using the self-organizing maps. Bull. Seismol. Soc. Am.
- Geochemical precursors for eruption repose length. Geophys. J. Int.
- CURE: an efficient clustering algorithm for large databases. ACM SIGMOD Rec.
- A seismic-event spotting system for volcano fast-response systems. Bull. Seismol. Soc. Am.
- Data mining: concepts and techniques. San Francisco [u.a.]: Kaufmann.
- Min max normalization based data perturbation method for privacy protection. International Journal of Computer & Communication Technology.
- The latest research progress on spectral clustering. Neural Comput. & Applic.
- Unsupervised pattern recognition in continuous seismic wavefield records using self-organizing maps. Geophys. J. Int.