Coal elemental (compositional) data analysis with hierarchical clustering algorithms

doi:10.1016/j.coal.2021.103892

International Journal of Coal Geology

Volume 249, 1 January 2022, 103892

https://doi.org/10.1016/j.coal.2021.103892

Abstract

The modes of occurrence of elements in coal are extremely important for deciphering the geological processes of coal formation and for anticipating the technological behavior and the environmental and health impacts of coal utilization. Hierarchical clustering has been widely adopted to investigate the modes of occurrence of elements in coal. Traditional statistics (e.g., Pearson correlation, Euclidean distance) applied to coal elemental data may lead to misinterpretation because these data are compositional and follow the rules of Aitchison geometry. This work applies log-ratio transformations to overcome this problem. Different hierarchical clustering algorithms combined with various data transformations can infer modes of occurrence for coal elements, but which algorithm is optimal remains to be investigated. In this paper, we evaluate four commonly used hierarchical clustering algorithms with two types of log-ratio transformation, pivot coordinates and weighted symmetric pivot coordinates (WSPC), to infer modes of occurrence of elements in coal from published coal elemental data. The results show that Pearson correlation produces more meaningful results than Euclidean distance in clustering rare earth elements and Y (REY). For these coal elemental data, WSPC produces more interpretable results than pivot coordinates. Compared with single-, complete-, and centroid-linkage, the average-linkage algorithm gives the best results.

Introduction

Coal is an important resource in many countries around the world, particularly developing countries such as China, India, and Turkey. Coal is composed primarily of organic matter with up to 50 wt% inorganic components; the latter are usually referred to as mineral matter (Ward, 2002, 2016). Geochemically, mineral matter in coal mainly consists of non-mineral elements (i.e., elements bound by organic matter, adsorbed on the surfaces of organics, and dissolved in pore waters) and elements hosted in minerals (Dai et al., 2020b, 2021; Finkelman et al., 2019; Ward, 2002, 2016). The environmental and health impacts of potentially toxic elements in coal are determined not only by their concentrations but also by their modes of occurrence (Dai et al., 2020b, 2021; Finkelman and Greb, 2008). Accurate determination of the modes of occurrence of elements in coal is important for deciphering the geological processes of coal formation and for anticipating the technological behavior, economic by-product potential, and environmental and health impacts of coal utilization.

A number of physical and chemical methods have been used to determine the modes of occurrence of elements in coal (Dai et al., 2020a, 2021; Finkelman et al., 2019), including optical microscopy, X-ray diffraction analysis (XRD), scanning electron microscopy equipped with energy dispersive X-ray spectroscopy (SEM-EDS), X-ray fluorescence spectrometry (XRF), sensitive high-resolution ion microprobe (SHRIMP), transmission electron microscopy (TEM), electron microprobe analyzer (EMPA), and laser ablation inductively coupled plasma mass spectrometry (LA ICP-MS). In addition, indirect methods, e.g., density separations and selective leaching procedures, have also been used (Dai et al., 2020a, 2021; Finkelman et al., 2019). Furthermore, several statistical methods have been widely adopted to investigate the modes of occurrence of elements in coal (e.g., Eskanazy et al., 2010; Geboy et al., 2013; Liu et al., 2019; Ward, 2002), although this approach remains controversial. For example, Drew et al. (2008) and Geboy et al. (2013) noted inconsistencies in correlations between elements for coal geochemical data reported on whole-coal and ash bases. Both studies reported that the root cause of the difference between bases is mathematical (i.e., subcompositional incoherence) and can potentially be resolved using statistical methods based on compositional data analysis. Eskanazy et al. (2010) pointed out some potential problems in using statistical analysis to determine the modes of occurrence of elements in coal and cautioned that geochemical principles must be carefully considered. Dai et al. (2020b, 2021) reviewed statistical analyses used in coal geochemistry, such as principal component analysis, cluster analysis, and correlation analysis, and pointed out that statistical analysis does not always correctly decipher the modes of occurrence of elements in coal.

Among the statistical methods and clustering algorithms, hierarchical clustering is commonly used because it represents the degree of affinity among the elements in coal without requiring relationships to be linear (Dai et al., 2008, 2012a, 2012c; Templ et al., 2008; Jain et al., 1999; Xu et al., 2020; Xu and Wunsch, 2005; Zhou and Jia, 2000). Specifically, hierarchical clustering aims to divide the elements in coal into different clusters, using different algorithms, based on the affinity of the elemental data. Hierarchical clustering algorithms usually include single-, complete-, centroid-, and average-linkage (Jain et al., 1999; Xu et al., 2020; Xu and Wunsch, 2005). Common dissimilarity measures include Pearson correlation and Euclidean distance (Jain et al., 1999; Xu et al., 2020; Xu and Wunsch, 2005). Among the hierarchical clustering algorithms, centroid-linkage is usually paired with Euclidean distance (Jain et al., 1999; Xu et al., 2020; Xu and Wunsch, 2005). In most of the literature on hierarchical clustering of coal elemental data, one specific algorithm is used to deduce the modes of occurrence. For example, Eskanazy et al. (2010) used Euclidean distance and a centroid-linkage hierarchical clustering algorithm to infer elemental modes of occurrence from a suite of 75 samples from a Bulgarian lignite deposit. Dai et al. (2012c) analyzed 33 coal samples from the Adaohai coal mine in the Daqingshan Coalfield, Inner Mongolia, China, using Pearson correlation dissimilarity with a centroid-linkage hierarchical clustering algorithm, and gained insights into the geological processes that affected the modes of occurrence of elements in coal.

Elemental coal data are compositional (Geboy et al., 2013). Compositional data have been defined historically as random vectors with strictly positive components whose sum may be constant, though the latter is not a strict requirement. Compositional data exist in a hyperplane of real space (known as the Simplex) but do not follow the rules of Euclidean geometry, meaning that the geometric properties used in conventional statistics (e.g., distance and correlation) may provide incorrect or spurious results if applied to these data (Aitchison, 1986). Rather, the data follow the rules of Aitchison geometry, for which a series of compositional data analysis methods has been developed (Egozcue et al., 2003; Xu et al., 2020). In particular, log-ratio transformations have been proposed either to transform compositional data into real space or to allow the data to be handled correctly within the Simplex. These log-ratio transformations include the additive log-ratio transformation (alr) (Aitchison, 1986), the centered log-ratio transformation (clr) (Aitchison, 1986), and the orthonormal log-ratio transformation (olr) (Egozcue et al., 2003; Fišerová and Hron, 2011). The performance of compositional data transformations has been evaluated for many years, especially through mathematical analysis (Aitchison, 1986; Aitchison et al., 2000; Egozcue et al., 2003). However, there are notable differences among the transformed data, and care must be taken in how log-ratio transformed data are used. For example, olr-transformed compositional data exhibit orthonormal properties, while alr- and clr-transformed data do not.
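As a minimal sketch of the two simplest transformations named above, the following Python snippet computes clr and alr coordinates with NumPy. The function names, the choice of the last part as the alr reference, and the three-part example compositions are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def closure(x):
    """Rescale each row (one sample) so that its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=1, keepdims=True)

def clr(x):
    """Centered log-ratio: log of each part minus the row's mean log
    (i.e., log of the part over the row's geometric mean)."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=1, keepdims=True)

def alr(x, ref=-1):
    """Additive log-ratio: log of each part over a chosen reference part.
    The reference column itself is dropped from the output."""
    logx = np.log(np.asarray(x, dtype=float))
    rest = np.delete(logx, ref, axis=1)
    return rest - logx[:, [ref]]

# Hypothetical 3-part compositions (rows = samples); the values are made up.
raw = np.array([[10.0, 30.0, 60.0],
                [5.0, 45.0, 50.0]])
comp = closure(raw)

clr_comp = clr(comp)   # shape (2, 3); each row sums to zero up to rounding
alr_comp = alr(comp)   # shape (2, 2); last part used as the reference

# Both transforms depend only on ratios, so closing the data changes nothing.
print(np.allclose(clr(raw), clr_comp))
```

Because both transformations depend only on ratios between parts, rescaling a sample by a constant factor leaves its coordinates unchanged. The clr coordinates of each sample sum to zero, so clr-transformed data are not of full rank.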

Another way to construct orthonormal coordinates is through pivot coordinates, in which one coordinate (for example, the first) isolates the compositional part of interest and explains all the relevant information about that part through its pairwise log-ratios to the other parts of the composition. Hron et al. (2017) constructed weighted pivot coordinates that treat the redundant information in a controlled manner. Kynčlová et al. (2017) proposed symmetric pivot coordinates to measure the strength of association between compositional parts through the correlation coefficient of a particular choice of orthonormal coordinates. Hron et al. (2021) proposed weighted symmetric pivot coordinates (WSPC), in which variables with large log-ratio variances are down-weighted to suppress their effects on the remaining variables. Compared with weighted pivot coordinates and symmetric pivot coordinates, WSPC focus on pairwise associations.

As the discussion above shows, different hierarchical clustering algorithms and data transformations can yield different inferred modes of occurrence for coal elements. This paper therefore evaluates the performance of hierarchical clustering algorithms with pivot coordinates and WSPC (Egozcue et al., 2003; Drew et al., 2008; Mateu-Figueras et al., 2011; Hron et al., 2021) on coal elemental data and their associations, in order to determine which approach is optimal among the clustering techniques for the dataset investigated.

Section snippets

Orthonormal log-ratio (olr) and Pivot Coordinates

The olr coordinates, previously referred to as isometric log-ratio coordinates, map the data from the Simplex, x_i ∈ S, to Euclidean space, y_i ∈ R. The olr transformation does this by building an orthonormal basis in the hyperplane, which has the advantage of avoiding the singularity that occurs with clr coefficients (Filzmoser et al., 2009). One particular choice of basis leads to pivot (log-ratio) coordinates (Filzmoser et al., 2018; Fišerová and Hron, 2011), which are defined as: $y^{(i)} = \text{pivot coordinate}(x^{(i)}) = \sqrt{n}$
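For illustration, pivot coordinates can be sketched in Python using the standard definition from the compositional data literature (e.g., Fišerová and Hron, 2011): for a composition with parts $x_1, \ldots, x_D$, $z_j = \sqrt{\frac{D-j}{D-j+1}}\,\ln\frac{x_j}{\left(\prod_{k=j+1}^{D} x_k\right)^{1/(D-j)}}$ for $j = 1, \ldots, D-1$. The symbols $D$ and $z_j$, the function name, and the example values are this sketch's own assumptions; the paper's notation may differ.

```python
import numpy as np

def pivot_coordinates(x, pivot=0):
    """Pivot (log-ratio) coordinates of compositions stored row-wise.

    The part in column `pivot` is moved to the front, so the first
    coordinate carries its log-ratios to all remaining parts.
    Returns an array with one column fewer than the input.
    """
    x = np.asarray(x, dtype=float)
    D = x.shape[1]
    order = [pivot] + [k for k in range(D) if k != pivot]
    logx = np.log(x[:, order])
    z = np.empty((x.shape[0], D - 1))
    for j in range(D - 1):
        r = D - j - 1  # number of parts that come after part j
        # log of part j minus the mean log (log geometric mean) of the rest
        rest_mean = logx[:, j + 1:].mean(axis=1)
        z[:, j] = np.sqrt(r / (r + 1)) * (logx[:, j] - rest_mean)
    return z

# Hypothetical 3-part compositions (rows = samples); the values are made up.
x = np.array([[1.0, 2.0, 4.0],
              [3.0, 1.0, 6.0]])
z = pivot_coordinates(x)
print(z.shape)
```

Since the coordinates are orthonormal, the Euclidean norm of each row of `z` equals the norm of the same sample's clr coefficients, which gives a quick sanity check for an implementation.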

Dissimilarity for the affinity of coal elemental data

Coal elemental concentration data can be represented as a vector x⁽ⁱ⁾ = [x₁⁽ⁱ⁾, x₂⁽ⁱ⁾, …, x_m⁽ⁱ⁾]^T, i = 1, …, n, where m is the number of samples and n is the number of elements. The dissimilarity between the concentrations of element x⁽ⁱ⁾ and element x⁽ʲ⁾ is denoted as D(x⁽ⁱ⁾, x⁽ʲ⁾). Pearson correlation and Euclidean distance are widely used to measure the dissimilarity between two elements.
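The two measures can be sketched as follows. This is an illustration under stated assumptions: the snippet above does not show the paper's exact formula for the correlation-based dissimilarity, so the common form 1 − r is assumed here, and the element labels and concentrations are made up.

```python
import numpy as np

def pearson_dissimilarity(X):
    """X has shape (n_elements, m_samples). Returns the n x n matrix 1 - r,
    where r is the Pearson correlation between two element vectors.
    The form 1 - r is an assumption of this sketch."""
    d = 1.0 - np.corrcoef(X)
    np.fill_diagonal(d, 0.0)
    return d

def euclidean_distance(X):
    """Pairwise Euclidean distance between element vectors (rows of X)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical concentrations of three elements in four samples.
labels = ["La", "Ce", "Fe"]
X = np.array([[1.0, 2.0, 3.0, 4.0],    # "La"
              [3.0, 5.0, 7.0, 9.0],    # "Ce": rises with "La", larger values
              [4.0, 3.0, 2.0, 1.0]])   # "Fe": falls as "La" rises

D_corr = pearson_dissimilarity(X)
D_euc = euclidean_distance(X)
print(np.round(D_corr, 3))
print(np.round(D_euc, 3))
```

In this toy case the correlation-based measure treats "La" and "Ce" as the closest pair because they vary together, whereas Euclidean distance places "Fe" nearer to "La" because it is driven by absolute magnitudes. The example only shows that the two measures can rank the same pairs differently; it says nothing about which is preferable for real data.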

Different hierarchical clustering algorithms

Hierarchical clustering algorithms usually include single-, complete-, centroid-, and average-linkage.

(1) Single-linkage
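A minimal sketch of running the four linkage rules named above with SciPy is given below. The data are synthetic, the correlation dissimilarity is assumed to be 1 − r, and cutting each tree into two clusters is an arbitrary choice for the demonstration. Centroid linkage is run on Euclidean distances computed from the raw rows, because SciPy defines that method correctly only for the Euclidean metric.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)

# Synthetic data: 6 "elements" (rows) x 20 samples (columns).
# Rows 0-2 follow one latent signal and rows 3-5 follow another.
a, b = rng.normal(size=20), rng.normal(size=20)
X = np.vstack([a + 0.1 * rng.normal(size=20) for _ in range(3)] +
              [b + 0.1 * rng.normal(size=20) for _ in range(3)])

# Correlation-based dissimilarity (assumed form: 1 - r), in condensed form.
d = np.clip(1.0 - np.corrcoef(X), 0.0, 2.0)
np.fill_diagonal(d, 0.0)
condensed = squareform(d, checks=False)

results = {}
for method in ("single", "complete", "average"):
    Z = linkage(condensed, method=method)
    # Cut the tree so that at most two clusters remain.
    results[method] = fcluster(Z, t=2, criterion="maxclust")

# Centroid linkage on the raw observations with Euclidean distance.
Z = linkage(X, method="centroid", metric="euclidean")
results["centroid"] = fcluster(Z, t=2, criterion="maxclust")

for method, cluster_ids in results.items():
    print(method, cluster_ids)
```

On data this cleanly separated, all four rules recover the same two groups; differences between linkage rules tend to appear only when clusters overlap or chain together, which is the situation the comparison in the paper addresses.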

Background information of the coal datasets

To interpret the different algorithms with different data transformations, the data used in this study are from late Paleozoic coals (i.e., CP2 coal) from the Adaohai and Datanhao mines in the Daqingshan Coalfield, Inner Mongolia, northern China (Fig. 1A). The Daqingshan Coalfield contains 16 mines (Fig. 1B). During the period of peat deposition, the Daqingshan Coalfield was close to the sediment source region, i.e., the Yinshan Upland (Dai et al., 2012c, 2015). Due to

Conclusion

In this paper, we have conducted extensive experiments applying hierarchical clustering algorithms to real datasets collected from the Adaohai and Datanhao mines. Based on these studies, the main conclusions are as follows:

(1) The Pearson correlation is much better than the Euclidean distance in clustering REY. The correlation was applied to raw data and to data transformed with pivot coordinates and WSPC.
(2) In general, for the hierarchical clustering results with correlation, pivot coordinates

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research was funded by the National Natural Science Foundation of China (No. 61772320) and 111 Projects (No. B17042). Thanks are given to the anonymous reviewers for their careful reviews and detailed comments.

References (36)

S. Dai et al. (2008). Mineralogy and geochemistry of boehmite-rich coals: New insights from the Haerwusu Surface Mine, Jungar Coalfield, Inner Mongolia, China. Int. J. Coal Geol.
S. Dai et al. (2012). Mineralogical and geochemical compositions of the coal in the Guanbanwusu Mine, Inner Mongolia, China: further evidence for the existence of an Al (Ga and REE) ore deposit in the Jungar Coalfield. Int. J. Coal Geol.
S. Dai et al. (2012). Geochemistry of trace elements in Chinese coals: a review of abundances, genetic types, impacts on human health, and industrial utilization. Int. J. Coal Geol. Minerals and Trace Elements in Coal.
S. Dai et al. (2012). Mineralogical and geochemical compositions of the Pennsylvanian coal in the Adaohai Mine, Daqingshan Coalfield, Inner Mongolia, China: Modes of occurrence and origin of diaspore, gorceixite, and ammonian illite. Int. J. Coal Geol.
S. Dai et al. (2015). Mineralogical and geochemical compositions of the Pennsylvanian coal in the Hailiushu Mine, Daqingshan Coalfield, Inner Mongolia, China: Implications of sediment-source region and acid hydrothermal solutions. Int. J. Coal Geol.
S. Dai et al. (2021). Modes of occurrence of elements in coal: a critical evaluation. Earth Sci. Rev.
G. Eskanazy et al. (2010). Some considerations concerning the use of correlation coefficients and cluster analysis in interpreting coal geochemistry data. Int. J. Coal Geol.
P. Filzmoser et al. (2009). Univariate statistical analysis of environmental (compositional) data: Problems and possibilities. Sci. Total Environ.
R.B. Finkelman et al. Chapter 10 - Environmental and Health Impacts.
R.B. Finkelman et al. (2019). The importance of minerals in coal as the hosts of chemical elements: a review. Int. J. Coal Geol.
N.J. Geboy et al. (2013). Whole-coal versus ash basis in coal geochemistry: a mathematical approach to consistent interpretations. Int. J. Coal Geol.
M.P. Ketris et al. (2009). Estimations of Clarkes for Carbonaceous biolithes: World averages for trace element contents in black shales and coals. Int. J. Coal Geol.
J. Liu et al. (2019). Mineralization of REE-Y-Nb-Ta-Zr-Hf in Wuchiapingian coals from the Liupanshui Coalfield, Guizhou, southwestern China: Geochemical evidence for terrigenous input. Ore Geol. Rev.
M. Templ et al. (2008). Cluster analysis applied to regional geochemical data: Problems and possibilities. Appl. Geochem.
C.R. Ward (2002). Analysis and significance of mineral matter in coal seams. Int. J. Coal Geol.
C.R. Ward (2016). Analysis, origin and significance of mineral matter in coal: an updated review. Int. J. Coal Geol.
L. Zhao et al. (2019). Enrichment of critical elements (Nb-Ta-Zr-Hf-REE) within coal and host rocks from the Datanhao mine, Daqingshan Coalfield, northern China. Ore Geol. Rev.
J. Aitchison (1986). The Statistical Analysis of Compositional Data.

