Exploring the unsupervised classification of seismic events of Cotopaxi volcano

https://doi.org/10.1016/j.jvolgeores.2020.107009

Highlights

  • Unsupervised clustering of Cotopaxi volcanic activity

  • Exploration of six unsupervised methods for clustering long-period and volcanotectonic seismic events

  • Detection of impure seismic events with overlapped signals of non-volcanic origin

  • The BIRCH method performs best on the 84-dimensional feature space.

Abstract

This paper explores six clustering-based methods for classifying long-period and volcano-tectonic seismic events and for finding possible overlapping signals of non-volcanic origin that may occur at the same time as, or immediately after, volcano-seismic events. Across the explored space of classifiers, the BIRCH method with k = 2 was chosen as the best model for classifying both types of pure seismic events, reaching weighted balanced accuracy and accuracy scores of 0.81 and 0.88, respectively. This accuracy represents a satisfactory and competitive classification performance when compared to state-of-the-art methods. In addition, the spectral-clustering method with k = 3 was able to classify seismic events with and without overlapped signals of non-volcanic origin, attaining a weighted balanced accuracy score of 0.51, at least 0.18 higher than the other classifiers. Additionally, the obtained true positive rates of 0.94 corroborated the excellent performance of this classifier in detecting seismic events with overlapping signals. These results indicate that the proposed clustering-based exploration was effective in providing competitive models both for classifying uncontaminated seismic events and for detecting seismic events with overlapped signals.

Introduction

Hazards associated with volcanic eruptions, such as lava flows, pyroclastic flows, debris flows, landslides, tephra, and ash, are potentially life-threatening for the population of cities close to volcanoes. Examples of such cities are Mexico City near Popocatepetl volcano, Tokyo near Mt. Fuji, and Quito near Cotopaxi and Guagua Pichincha volcanoes, among others (Siebert et al., 2011; Phillipson et al., 2013; Pérez et al., 2020a, 2020b). Although volcanic eruptions are often unpredictable, volcanic observatories use monitoring techniques to estimate the probability of unrest and/or an eruption (Schmincke, 2004). Several kinds of data can be used to support early warnings of eruptions, including geomagnetic and electromagnetic (Currenti et al., 2005), geochemical (Green et al., 2013), deformation (Segall, 2013), infrasonic (Marchetti et al., 2019), and thermal data from satellite (Marchese et al., 2006) and from the ground (Carniel et al., 2010), ideally in combination (e.g., Surono et al., 2012). However, seismic data are at the heart of any monitoring system, which should include both the analysis of continuous volcanic tremor (Carniel, 2014), due to its potential in terms of persistence and memory (Jaquet and Carniel, 2003; Jaquet et al., 2006), and the time evolution of discrete volcano-seismic events such as volcano-tectonic (VT) earthquakes, long-period (LP) events, tremors (TRE), and explosions (EXP). VT events are earthquakes taking place in a volcanic environment, associated with pressure changes induced by magma movement (McNutt, 2005; Malfante et al., 2018). They have a variable duration that is usually less than 30 s and a broad spectral content that is typically above 5 Hz (Lara-Cueva et al., 2016b). LP events are low-frequency events with a typical duration below 90 s and main spectral content limited to narrow frequency bands between 2 and 5 Hz (Lara-Cueva et al., 2016b). They are related to resonating fluid-filled conduits or cavities induced by pressure transients in the fluid (McNutt, 2005; Malfante et al., 2018). EXP quakes are generated by sudden magma, ash, and gas extrusion. Meanwhile, TRE events are produced by a long-duration release of energy related to magma degassing. They are characterized by their constant amplitude and long duration, ranging from a few minutes to several days (McNutt, 2005; Malfante et al., 2018; Lara-Cueva et al., 2016b). However, most observatories rely on manual classification and counting of seismic events, which can lead to delays and errors due to human subjectivity. Moreover, the number of seismic events often becomes too large to classify manually exactly when classification is most needed, i.e., during unrest.
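To make the duration and frequency ranges quoted above more concrete, the following minimal Python sketch (not part of the original study) estimates the dominant frequency of a synthetic decaying sinusoid with a naive discrete Fourier transform and reports whether it falls inside the 2 to 5 Hz band quoted for LP events or above the 5 Hz quoted as typical for VT events. The 50 Hz sampling rate, the synthetic waveforms, and all function names are assumptions made purely for illustration. Real events are far more complex, and this band check is not a classifier.

```python
import cmath
import math

SAMPLING_RATE_HZ = 50.0  # assumed for illustration; not taken from the paper


def synthetic_event(freq_hz, duration_s, fs=SAMPLING_RATE_HZ):
    """Hypothetical waveform: a sinusoid with an exponentially decaying envelope."""
    n = int(duration_s * fs)
    return [math.exp(-3.0 * i / n) * math.sin(2.0 * math.pi * freq_hz * i / fs)
            for i in range(n)]


def dominant_frequency(signal, fs=SAMPLING_RATE_HZ):
    """Frequency (Hz) of the largest-magnitude DFT bin, skipping the DC bin."""
    n = len(signal)
    best_bin, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * fs / n


def band_hint(freq_hz):
    """Crude band check based only on the typical ranges quoted in the text."""
    if 2.0 <= freq_hz <= 5.0:
        return "LP-like band (2-5 Hz)"
    if freq_hz > 5.0:
        return "VT-like band (> 5 Hz)"
    return "below both bands"


# Two synthetic 4-second events, one in each band.
lp_like = synthetic_event(freq_hz=3.0, duration_s=4.0)
vt_like = synthetic_event(freq_hz=8.0, duration_s=4.0)
for name, sig in (("3 Hz event", lp_like), ("8 Hz event", vt_like)):
    f = dominant_frequency(sig)
    print(name, round(f, 2), band_hint(f))
```

With 200 samples at 50 Hz the frequency resolution is 0.25 Hz, so the estimate can only be as fine as that bin width.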

In this sense, machine learning classifiers with supervised or unsupervised learning have been employed during the last decade in different application contexts. Successful supervised learning approaches used to tackle the problem of seismic event classification include artificial neural networks (ANN) (Parihar et al., 2018; Lara-Cueva et al., 2016a; Titos et al., 2018b; Esposito et al., 2006; Del Pezzo et al., 2003), random forest (Rodgers et al., 2016), hidden Markov models (Benitez et al., 2007; Alasonati et al., 2006), Gaussian mixture models (Venegas et al., 2019), support vector machines (Parihar et al., 2018; Curilem et al., 2014; Malfante et al., 2018; Apolloni et al., 2009), maximum-likelihood and k-nearest neighbor methods (Parihar et al., 2018), and lately, deep learning models (Titos et al., 2018a). The main problems with supervised classification are the need for a reliable labeled dataset for training and the fact that the resulting classifiers are difficult to generalize to more than one volcano (Cortés et al., 2019). On the other hand, unsupervised learning methods, which have been applied to different problems as well (Maset et al., 2015), aim to form structured groups or clusters in datasets without prior knowledge of any class labels (Zheng et al., 2017). Methods reported in the literature include principal component analysis (PCA) (Unglert et al., 2016), Gaussian mixtures (Hammer et al., 2012), hidden Markov models (Bebbington, 2007), cluster analysis (Langer et al., 2009), and self-organizing maps (SOM) (Kuyuk et al., 2011; Langer et al., 2009).

Approaches focusing on volcanoes and their seismic activity have been less explored, but SOM models seem to be the most popular. In Köhler et al. (2010), a SOM model focused on volcanic wavefield patterns was used to analyze the seismicity of Mount Merapi (Indonesia); classification errors of 6% and 26% were obtained for volcano-tectonic and rockfall events, respectively. However, when both events were combined into one cluster class, the error value was significantly reduced to 12%. In Reyes and Mosquera (2017), SOM and k-means models were used to classify volcanic signals recorded at the Tungurahua volcano (Ecuador), attaining accuracy (ACC) values of 91% and 87% for noise and infrasound signals, respectively. In Anzieta et al. (2019), k-means was used as part of a two-stage procedure that first clusters possible low-frequency seismic families based on the spectral density vector of the signals and then separates them by waveform using correntropy and dynamic time warping. In Messina and Langer (2011), SOM and clustering-based models were integrated to build the “KKAnalysis software”, a tool that takes less than a minute to classify volcanic tremor data from Mount Etna (Italy), reaching an ACC value of 90%. SOMs were also used to characterize the volcanic tremor preceding phreatic explosions at Raoul Island (Carniel et al., 2013b), Ruapehu (Carniel et al., 2013a), and Tongariro (Jolly et al., 2014) in New Zealand. Similarly, in Esposito et al. (2008), a SOM model was used to cluster the waveforms of very-long-period events associated with explosive activity at the Stromboli volcano. Despite the several approaches developed, the problem of volcano seismic event classification remains an open challenge, since there is a great variety of seismic signatures associated with a given volcano and their intraclass properties and characteristics evolve over time, for example during inactive and eruptive periods (Malfante et al., 2018). Furthermore, seismic waveforms can overlap and contain signals of non-volcanic origin, such as icequakes and rockfalls, that could occur at the same time as, or immediately after, volcano seismic events.

Previous studies on the automatic analysis (detection and classification) of volcano-seismic signals from Cotopaxi volcano have mainly focused on supervised machine learning approaches, as summarized in Pérez et al. (2020a, 2020b), and to a lesser extent on semi-supervised schemes (Brusil et al., 2019). To the authors' knowledge, unsupervised techniques have not been extensively investigated for event classification so far.

In this work, we aim to explore six different clustering-based classifiers in the context of volcano seismic event classification for the Cotopaxi volcano. Since the employed models belong to the unsupervised learning category, they have the advantage of being trained without knowing the seismic event type or class (output label) of the input instances, making them a practical solution for a real-life scenario, especially when no previously labeled data are available, e.g., for a volcano without recent unrest. At the same time, they are often less accurate than supervised models, which is an inherent drawback.
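As a rough illustration of what training without class labels means in practice, the sketch below groups synthetic feature vectors with a plain k-means routine written with the Python standard library only. It is a hedged sketch under stated assumptions: k-means is swapped in here only because it is short to write out, and it is not presented as the model selected in the paper (BIRCH). The two-dimensional synthetic points stand in for the 84-dimensional feature vectors, and the function name `kmeans`, the seeding rule, and the data are all hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100):
    """Plain k-means; returns (cluster labels, centroids). No class labels are used."""
    # Deterministic seeding (an assumption of this sketch): start from the first
    # point, then repeatedly add the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points,
                             key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return labels, centroids


# Synthetic, well-separated stand-in data (not the paper's dataset).
rng = random.Random(0)
group_a = [(rng.gauss(0.2, 0.05), rng.gauss(0.2, 0.05)) for _ in range(20)]
group_b = [(rng.gauss(0.8, 0.05), rng.gauss(0.8, 0.05)) for _ in range(20)]
features = group_a + group_b

cluster_ids, centers = kmeans(features, k=2)
print(cluster_ids)
```

The cluster indices returned are arbitrary identifiers. Relating them to event types such as LP or VT requires an expert or a labeled reference set, which is only needed for evaluation and not for fitting.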

The remainder of this paper is organized as follows. The Materials and Methods section presents the experimental volcano seismic event dataset, the selected clustering-based classifiers, and the experimental setup used in this work. The Results and Discussion section presents an exploratory comparison of the considered classifiers. Finally, conclusions and future work are presented in the last section.

Experimental dataset

For this work, a dataset with 668 LP and VT events was used. Each event is described by an 84-dimensional feature vector, comprising 13 features from the time domain, 21 features from the frequency domain, and 50 features from the scale domain (see Appendix Table 4). Detailed information about these features and their calculation can be found in Pérez et al. (2020a, 2020b). Most of these events are a subset of the publicly available dataset SeisBenchV1 from the ESeismic repository (Pérez et

Results and Discussion

According to the experimental setup section, a total of 54 clustering-based models (six methods, each with k = 2 to 10) were evaluated on the experimental dataset, which contains 668 feature vectors. Since this dataset is formed by two seismic event classes, some of whose events have overlapped signals, we present and discuss the results only for k = 2 and k = 3 clusters. By analyzing the whole classification performance space, we found that it does not make sense to investigate more than three
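The difference between accuracy and a balanced accuracy score on an imbalanced dataset can be illustrated with a short Python sketch. The exact weighting behind the WbACC metric is not given in this excerpt, so the code below shows only plain accuracy and unweighted balanced accuracy (the mean of per-class recall), computed after mapping each cluster to the majority reference label among its members. The majority-vote mapping, the function names, and the ten toy labels are assumptions for illustration, not the paper's procedure or data.

```python
from collections import Counter


def map_clusters_to_classes(cluster_ids, true_labels):
    """Assign each cluster the most frequent reference label among its members."""
    mapping = {}
    for cluster in set(cluster_ids):
        members = [t for c, t in zip(cluster_ids, true_labels) if c == cluster]
        mapping[cluster] = Counter(members).most_common(1)[0][0]
    return mapping


def accuracy(predicted, true_labels):
    """Fraction of events whose predicted class matches the reference label."""
    return sum(p == t for p, t in zip(predicted, true_labels)) / len(true_labels)


def balanced_accuracy(predicted, true_labels):
    """Unweighted mean of per-class recall."""
    recalls = []
    for cls in sorted(set(true_labels)):
        idx = [i for i, t in enumerate(true_labels) if t == cls]
        recalls.append(sum(predicted[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Toy, imbalanced example: 8 LP events and 2 VT events (made-up labels).
reference = ["LP"] * 8 + ["VT"] * 2
clusters = [0] * 8 + [0, 1]  # one VT event ends up in the LP-dominated cluster

mapping = map_clusters_to_classes(clusters, reference)
predicted = [mapping[c] for c in clusters]

acc = accuracy(predicted, reference)
bacc = balanced_accuracy(predicted, reference)
print(mapping, acc, bacc)
```

In this toy case nine of ten events are assigned correctly, but only half of the minority class is recovered, so the balanced score is lower than the plain accuracy.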

Conclusions and Future Work

In this work, we explored six different unsupervised learning classifiers in the context of volcano seismic event classification. We used the WbACC metric to assess the classifiers because of the particular configuration of the experimental dataset, which contains LP and VT seismic events with and without overlapped signals. According to the obtained results, the BIRCH method with k = 2 was chosen as the best model for clustering the LP and VT seismic event

CRediT author statement

Adrian Duque: Software, Investigation, Writing - Original draft preparation. Kevin González: Software, Investigation, Writing - Original draft preparation. Noel Pérez: Conceptualization, Methodology, Writing - Original draft preparation, Writing - Reviewing and Editing, Investigation, Formal analysis, Supervision. Diego Benítez: Writing - Reviewing and Editing, Investigation, Supervision, Funding acquisition, Project administration, Resources. Felipe Grijalva: Visualization, Writing - Reviewing

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work was supported in part by the Universidad San Francisco de Quito (USFQ) through the Poli-Grants Program under Grant 10100, Grant 12494, and Grant 16916, in part by the Universidad de las Fuerzas Armadas (ESPE) under Grants 2013-PIT-014 and 2016-EXT-038, and in part by Escuela Politécnica Nacional (EPN) under the Grant PIE-SRASI-IG-2018. The authors thank the Applied Signal Processing and Machine Learning Research Group of USFQ, for providing the computing infrastructure (NVidia DGX

References (80)

  • N. Tishby

    Course 16 - the emergence of relevant data representations: an information theoretic approach

  • K. Unglert et al.

    Principal component analysis vs. self-organizing maps combined with hierarchical clustering for pattern recognition in volcano seismic spectra

    J. Volcanol. Geotherm. Res.

    (2016)
  • P. Alasonati et al.

    Signal classification by wavelet-based hidden Markov models: application to seismic signals of volcanic origin

    Statistics in Volcanology

    (2006)
  • G. Aletti et al.

    A clustering algorithm for multivariate data streams with correlated components

    Journal of Big Data

    (2017)
  • J.C. Anzieta et al.

    Finding possible precursors for the 2015 Cotopaxi volcano eruption using unsupervised machine learning techniques

    International Journal of Geophysics

    (2019)
  • B. Apolloni

    Support vector machines and MLP for automatic classification of seismic signals at Stromboli volcano

  • D. Arthur et al.

    k-means++: The Advantages of Careful Seeding

    (2006)
  • M.S. Bebbington

    Identifying volcanic regimes using hidden Markov models

    Geophys. J. Int.

    (2007)
  • M.C. Benitez et al.

    Continuous HMM-based seismic-event classification at Deception Island, Antarctica

    IEEE Trans. Geosci. Remote Sens.

    (2007)
  • J. Berglund

    Clustering with BFR

  • P.S. Bradley et al.

    Scaling clustering algorithms to large databases

  • C. Brusil et al.

    A semi-supervised approach for microseisms classification from Cotopaxi volcano

  • R. Carniel

    Characterization of volcanic regimes and identification of significant transitions using geophysical data: a review

    Bull. Volcanol.

    (2014)
  • R. Carniel et al.

    Detecting dynamical regimes by self-organizing map (SOM) analysis: an example from the March 2006 phreatic eruption at Raoul Island, New Zealand Kermadec Arc

    Bollettino di Geofisica Teorica ed Applicata

    (2013)
  • R.R. Coifman et al.

    Entropy-based algorithms for best basis selection

    IEEE Trans. Inf. Theory

    (1992)
  • G. Cortés et al.

    Standardization of noisy volcanoseismic waveforms as a key step toward station-independent, robust automatic recognition

    Seismol. Res. Lett.

    (2019)
  • M. Costa et al.

    Multiscale entropy analysis of biological signals

    Phys. Rev. E

    (2005)
  • G. Currenti et al.

    Multifractality in local geomagnetic field at Etna volcano, Sicily (southern Italy)

    Natural Hazards and Earth System Science

    (2005)
  • M. Daoudi et al.

    Revisiting BFR clustering algorithm for large scale gene regulatory network reconstruction using MapReduce

  • E. Del Pezzo et al.

    Discrimination of earthquakes and underwater explosions using neural networks

    Bull. Seismol. Soc. Am.

    (2003)
  • A.P. Dempster et al.

    Maximum likelihood from incomplete data via the EM algorithm

    J. R. Stat. Soc. Ser. B Methodol.

    (1977)
  • A. Esposito et al.

    Automatic discrimination among landslide, explosion-quake, and microtremor seismic signals at Stromboli volcano using neural networks

    Bull. Seismol. Soc. Am.

    (2006)
  • A.M. Esposito et al.

    Unsupervised neural analysis of very-long-period events at Stromboli volcano using the self-organizing maps

    Bull. Seismol. Soc. Am.

    (2008)
  • R.M. Green et al.

    Geochemical precursors for eruption repose length

    Geophys. J. Int.

    (2013)
  • S. Guha et al.

    CURE: an efficient clustering algorithm for large databases

    ACM SIGMOD Rec.

    (1998)
  • C. Hammer et al.

    A seismic-event spotting system for volcano fast-response systems

    Bull. Seismol. Soc. Am.

    (2012)
  • J. Han et al.

    Data mining: concepts and techniques. San Francisco [u.a.]: Kaufmann

  • Y.K. Jain et al.

    Min max normalization based data perturbation method for privacy protection

    International Journal of Computer & Communication Technology

    (2011)
  • H. Jia et al.

    The latest research progress on spectral clustering

    Neural Comput. & Applic.

    (2014)
  • A. Köhler et al.

    Unsupervised pattern recognition in continuous seismic wavefield records using self-organizing maps

    Geophys. J. Int.

    (2010)