1 Introduction

Coal has been the most widely used fossil fuel energy resource in the world since industrialization. In most countries, it continues to play an essential role in the production and supply of energy. Coal is heterogeneous in nature and is formed from decomposed plant materials. It comprises different constituents, called macerals, which are grouped according to their physical properties, chemical structure, and morphology. According to the World Energy Council (WEC 2016), over 7800 million tons of coal are consumed worldwide by a variety of sectors such as power generation, steel production, and cement manufacturing. Furthermore, coal was estimated to account for about 40% of the world's electricity generation and is expected to continue to play a major role over the next three decades (WEC 2016). Thus, the characterization of coal is essential before its use for a specific purpose. The characterization of coal can be divided into three separate categories, namely petrographic analysis, physical and mineralogical analysis, and structural analysis. This study focuses on the physical and mineral characterization of coal, which includes prediction of the moisture content percentage (MC), ash percentage (Ash), volatile matter percentage (VM), and fixed carbon content (FC) of coal.

Coal can be divided into two categories, coking coal and non-coking coal, based on the percentage of ash content and volatile matter. The quality of coking coal is measured based on its ash content, whereas the quality of non-coking coal is measured based on its useful heating value (Ministry of Coal, GOI 2014). The steel industry mainly uses coking coal (also called metallurgical coal) with low volatile matter and as little ash as possible. That is, coal with low ash and volatile matter contents and a high carbon content is generally considered coking coal. Non-coking coal, on the other hand, does not have any caking properties and is mainly used in thermal power stations for power generation. In other words, non-coking coals have high ash content and volatile matter with low carbon content, and they are used in industries like fertilizer, ceramic, cement, paper, chemical, glass, and brick manufacturing. Due to significant variations in coal properties and the specific coal quality requirements of different industries, coal characterization has been chosen as the subject of this research.

Presently, in the coal industry, chemical analysis is done using conventional analyzers for confirmative screening and characterization of coal quality. The conventional techniques of coal characterization (proximate analysis and ultimate analysis) also require petrologists to separate the waste coals. These conventional techniques are tedious and poorly representative. Hence, there is a need to replace conventional characterization with a machine vision system. Petruk (1976) first introduced machine vision technology in the mining industry at the Canada Centre for Minerals and Energy Technology (CANMET) for quantitative mineralogical analysis. Subsequently, the image analyser was used in the mineral industry in South Africa (Oosthuyzen 1980). The first large-scale application of the machine vision system in the mining industry was made by Oestreich et al. (1995) to measure mineral concentration using a colour sensor system. Many other applications of machine vision systems, such as particle distribution analysis, froth flotation analysis, mineral classification, lithological composition, ore grindability, and mineral grade prediction (Sadr-Kazemi and Cilliers 1997; Al-Thyabat and Miles 2006; Chatterjee and Bhattacherjee 2011), have been reported in the mining and mineral industries. A machine vision system can enable us to accomplish quantitative measures of the characteristics of coal constituents.

To date, numerous researchers have suggested coal characterization techniques, but very few studies have been done using image-processing techniques (Yuan et al. 2014; Ko and Shang 2011; Hamzeloo et al. 2014; Zhang et al. 2014; Alpana and Mohapatra 2016; Zhang 2016). As indicated by the literature, many researchers across the world are working on image-based automated and semi-automated ore characterization systems (Oestreich et al. 1995; Chatterjee et al. 2010; Chatterjee and Bhattacherjee 2011; Patel et al. 2016, 2017). Zhang et al. (2014) proposed a genetic algorithm based support vector machine (GA-SVM) algorithm for prediction of ash content in coarse coal by image analysis. The study suggested a semi-automatic local-segmentation technique to identify the coal particle region. The results further indicated that the prediction performance for narrow size fractions was superior to that for wider size fractions, and that prediction accuracy was higher for larger size fractions than for smaller ones. Mao et al. (2012) discussed porosity analysis (surface porosity and voxel porosity) based on computed tomography (CT) images of coal. Zhang et al. (2012) proposed an improved estimation method of coal particle mass using image analysis, based on an enhanced mass model for coarse coal particles. Kistner et al. (2013) proposed an image analysis technique for monitoring mineral processing systems; the study utilized texture features of the image for monitoring the grade in froth flotation circuits and confirmed that grade control performance could be improved using multiscale wavelet features of images. Mejiaa et al. (2013) proposed automated maceral characterization using histogram analysis of the colour features of the images.

Wang et al. (2018) used an SVM technique for separation of coal from gangue using colour and texture features. Hou (2019) addressed a similar objective, separating coal and gangue using surface texture and grayscale features of coal images with a feed-forward neural network model. Later, morphology-based supplementary features and fused texture features were introduced for separation of coal from gangue (Sun et al. 2019a). Sun et al. (2019b) subsequently used fused texture features to separate coal using simple linear iterative clustering (SLIC) and simple linear fused texture iterative clustering (SLFTIC). The coal-rock interface was identified using a fuzzy-based neural network (Liu et al. 2020).

The proposed study aims to devise an automated image analysis system for coal characterization with the assistance of image processing techniques, pattern recognition, and model development. The study was carried out in multiple stages: image acquisition, image segmentation, feature extraction, feature selection, and model development for characterization. The motivation behind this work is to overcome the quality inspection challenges faced by the mining industries by presenting a computer-based technique. The proposed strategy enhances the outcomes that can be acquired by investigating the texture and color features of coal samples. Such automated methods offer consistency of results, reliability, accuracy, cost-effectiveness, and efficiency, and are less tedious than manual analysis.

All over the world, various research groups are working on image-based automated and semi-automated characterization techniques. Image-based characterization of coal samples is generally done by analyzing morphological, texture, and color features. Although the previously mentioned systems are useful, they do not adequately address relevant feature extraction and feature selection for coal characterization with more than 90% accuracy. Therefore, the proposed study attempts to develop a machine vision approach for coal characterization using digital images. The specific objective of the proposed research is to develop a machine vision system using an artificial neural network (ANN) based algorithm for automated coal characterization. The study also demonstrates a comparative performance analysis of the proposed model and a Gaussian process regression (GPR) model in coal characterization.

2 Materials and methodology

The proposed machine vision system uses hardware units such as bulbs for illumination, a camera for image acquisition, and a computer for image processing. Software algorithms for automatic image acquisition, image pre-processing, feature extraction, feature optimisation, and machine learning were developed for the proposed system in MATLAB. A detailed description of the proposed methodology is given in this section. The flowchart of the working methodology for the development of the automated characterization of coal is shown in Fig. 1. The steps are briefly described in the following subsections.

Fig. 1
figure 1

Flowchart of the working methodology

2.1 Sample collection and preparation

In the present study, the coal samples were collected from different mines to capture the heterogeneous nature of the samples. The sample collection mines were chosen from different coalfields in India: Orient Colliery Mine No. 3 of Mahanadi Coal Field Limited (MCL), Orient Mine No. 1 and 2 of MCL, Dipka OCP of South Eastern Coal Field Limited (SECL), Basundhara Open Cast Project (OCP) of MCL, Raniganj Coalfields of Eastern Coal Field Limited (ECL), Chinakuri Colliery of ECL, and Barachok Colliery of ECL. A total of twenty coal samples were collected. The number of samples collected from each mine is shown in Table 1. The collected coal samples were broken down to a convenient size at the mine to obtain representative samples and were then immediately placed in water/air-tight containers or polythene bags so that they would not be oxidized.

Table 1 Name of the sample collection mines and sample code

2.2 Image acquisition of coal samples

The first task of a machine vision system is the image acquisition of the objects. Image acquisition can be characterized as the act of capturing the image of an object or scene so that the condition or identity of the object can later be recalled using an image analysis technique. Quality image acquisition is one of the important parts of image analysis. In this work, the images of the coal samples were captured using a camera in a controlled environment (shown in Fig. 2). The image acquisition system consists of constant illumination and a camera for image capturing. A 15-megapixel camera (Make: Logitech HD Webcam C920) capable of capturing 30 frames per second was installed for image capturing. Light-emitting diode (LED) bulbs were installed to provide steady illumination during the image capturing procedure. The bulbs were mounted at a slant of 45° from the vertical wall of the test set-up in order to reduce reflectance. The captured images were then processed to extract image features.

Fig. 2
figure 2

Laboratory set-up for image acquisition of coal samples

Each captured digital image is represented using three primary colors (red, green, and blue). In all classes of digital imaging, image sensors convert the incoming light into digital signals that are processed by a computer and rendered as a visible image. A total of 80 images were captured for the different coal samples. The images of a coal sample captured from four different angles are shown in Fig. 3.

Fig. 3
figure 3

Typical Image of a coal sample from four different angles

2.3 Image segmentation

The captured images were rectangular in shape and contained background regions, which needed to be removed before feature extraction. More precisely, image segmentation is the process of assigning a label to each pixel in an image such that pixels with similar characteristics share the same label. Since the captured images contain mostly dark pixels for coal and bright pixels for the background, a binary threshold segmentation technique was used for removing the background (Sahoo et al. 1988). All the coal images were accurately segmented from the background, as the backgrounds were not complex. The threshold operation was done by partitioning the pixels into two classes, object and background, at a threshold gray level (Otsu 1979). An example of the segmented image of coal samples is shown in Fig. 4. After image segmentation, the information exists only in the pixels which cover the coal samples.
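A minimal sketch of this thresholding step is given below. It is an illustrative Python/NumPy implementation of Otsu's between-class-variance criterion (the study's own pipeline was implemented in MATLAB) and assumes the image is available as an 8-bit grayscale array.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the gray level that maximizes the between-class variance (Otsu 1979)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()           # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0        # class means
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2          # between-class variance
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

def segment_coal(gray):
    """Binary mask: True for the (dark) coal pixels, False for the bright background."""
    return gray < otsu_threshold(gray)
```

The resulting binary mask is then used to keep only the pixels covering the coal sample during feature extraction.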

Fig. 4
figure 4

Images of coal samples after segmentation

2.4 Features extraction

Each image stores information about the object in its pixels, and feature extraction was done to obtain this information. In this study, the color- and texture-based features of the coal sample images were extracted for coal characterization. The color-based features were extracted in six unique color spaces (RGB, Gray, HSI, CMYK, Lab, and xyz), whereas the texture features were extracted from the intensity image of the HSI color space in four diverse frequency domains (cosine, Fourier, wavelet, and Gabor).

2.4.1 Color features extraction

The camera captured the images in the RGB color space, which has three color components [red (R), green (G), and blue (B)]. The RGB color model was converted into five other color models (HSI, CMYK, Gray, Lab, and xyz). The HSI color model has three components, viz. hue (H), saturation (S), and intensity (I). The hue component depicts the color itself as an angle between 0° and 360° (0° indicates red, 120° green, 240° blue, 60° yellow, and 300° magenta). The saturation value indicates how much the pure color is diluted with white and ranges from 0 to 1. The intensity also ranges from 0 to 1 (0 implies black, 1 implies white). The HSI color components were derived from the RGB color components using the following equations (Yang et al. 2012).

$${I} = \frac{1}{3}({\text{R}} + {\text{G}} + {\text{B}})$$
(1)
$$H = \left\{ {\begin{array}{*{20}l} {{ \cos }^{ - 1} \left[ {\frac{{\left( {{\text{R}} - \frac{1}{2}{\text{G}} - \frac{1}{2}{\text{B}}} \right)}}{{\sqrt {{\text{R}}^{2} + {\text{G}}^{2} + {\text{B}}^{2} - {{{\text{R}}\times{\text{G}}}} - {{{\text{R}}\times{\text{B}}}} - {{{\text{G}}\times{\text{B}}}} } }}} \right]} \hfill & {\text{if}\;{\text{G}} \ge {\text{B}}, \;{\text{or}}} \hfill \\ {360 - { \cos }^{ - 1} \left[ {\frac{{\left( {{\text{R}} - \frac{1}{2}{\text{G}} - \frac{1}{2}{\text{B}}} \right)}}{{\sqrt {{\text{R}}^{2} + {\text{G}}^{2} + {\text{B}}^{2} - {{{\text{R}}\times{\text{G}}}} - {{{\text{R}}\times{\text{B}}}} - {{{\text{G}}\times{\text{B}}}} } }}} \right]} \hfill & {\text{if}\;{\text{B}} > {\text{G}}} \hfill \\ \end{array} } \right.$$
(2)
$$S = \left\{ {\begin{array}{*{20}l} {1 - \frac{{3\times\hbox{min} ({\text{R}},{\text{G}},{\text{B}})}}{{{\text{R}} + {\text{G}} + {\text{B}}}}} \hfill & {\text{if}\;I > 0, \;\text{or}} \hfill \\ 0 \hfill & {\text{if}\;I = 0} \hfill \\ \end{array} } \right.$$
(3)
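As an illustrative sketch (not the study's MATLAB code), Eqs. (1)–(3) can be implemented in Python/NumPy as follows, assuming the R, G, and B channels are arrays scaled to [0, 1]; the hue is returned in degrees, and the small eps guard against division by zero is an added assumption.

```python
import numpy as np

def rgb_to_hsi(R, G, B, eps=1e-12):
    """Convert normalized R, G, B arrays (values in [0, 1]) to H (degrees), S, I per Eqs. (1)-(3)."""
    I = (R + G + B) / 3.0
    num = R - 0.5 * G - 0.5 * B
    den = np.sqrt(R**2 + G**2 + B**2 - R*G - R*B - G*B) + eps
    theta = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
    H = np.where(G >= B, theta, 360.0 - theta)                       # Eq. (2)
    S = np.where(I > 0,                                              # Eq. (3)
                 1.0 - 3.0 * np.minimum(np.minimum(R, G), B) / (R + G + B + eps),
                 0.0)
    return H, S, I
```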

The CMYK color space is subtractive in nature and consists of four color components [cyan (C), magenta (M), yellow (Y), and key or black (K)]. The color components of the CMYK color model were derived from RGB color components using the following equations (Agrawal et al. 2011).

$$C = \frac{{{\text{max(R}}, {\text{G}}, {\text{B)}} - {\text{R}}}}{{{\text{max(R}}, {\text{G}}, {\text{B}})}}$$
(4)
$$M = \frac{{{\text{max(R}}, {\text{G}}, {\text{B)}} - {\text{G}}}}{{{\text{max(R}}, {\text{G}}, {\text{B}})}}$$
(5)
$$Y = \frac{{{\text{max(R}}, {\text{G}}, {\text{B)}} - {\text{B}}}}{{{\text{max(R}}, {\text{G}}, {\text{B}})}}$$
(6)
$$K = 1 - \hbox{max} ({\text{R}},{\text{ G}},{\text{ B}})$$
(7)
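A corresponding sketch for Eqs. (4)–(7), again assuming normalized RGB arrays; the eps term guards against division by zero for pure black pixels and is not part of the original equations.

```python
import numpy as np

def rgb_to_cmyk(R, G, B, eps=1e-12):
    """Convert normalized R, G, B arrays (values in [0, 1]) to C, M, Y, K per Eqs. (4)-(7)."""
    mx = np.maximum(np.maximum(R, G), B)
    C = (mx - R) / (mx + eps)
    M = (mx - G) / (mx + eps)
    Y = (mx - B) / (mx + eps)
    K = 1.0 - mx
    return C, M, Y, K
```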

The Lab color space describes all perceivable colors mathematically using three dimensions: L (lightness), a (green–red), and b (blue–yellow). The Lab color space incorporates all the colors recognizable by a human being. The Lab color components cannot be derived directly from the RGB color model but can be derived from the xyz color space using the following equations (Häfner et al. 2012).

$$L = \left\{ {\begin{array}{*{20}l} {116\left( {\frac{y}{{y_{n} }}} \right)^{{\frac{1}{3}}} - 16} \hfill & {\text{if}\; \frac{y}{{y_{n} }} > 0.008856} \hfill \\ {903.3\left( {\frac{y}{{y_{n} }}} \right)} \hfill & {\text{Otherwise}} \hfill \\ \end{array} } \right.$$
(8)
$$a = 500\left( {f\left( {\frac{x}{{x_{n} }}} \right) - f\left( {\frac{y}{{y_{n} }}} \right)} \right)$$
(9)
$$b = 200\left( {f\left( {\frac{y}{{y_{n} }}} \right) - f\left( {\frac{z}{{z_{n} }}} \right)} \right)$$
(10)

The xyz color model consists of three color components x, y, and z, which can be derived from the RGB color model. The component y represents luminance, z is roughly equivalent to blue, and x is a mix of cone response curves chosen to be orthogonal to luminance and non-negative. The transformation can be performed as follows (Karungaru et al. 2004).

$$\left[ {\begin{array}{*{20}c} x \\ y \\ z \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {0.412453} & {0.357580} & {0.180423} \\ {0.212671} & {0.715160} & {0.072169} \\ {0.019334} & {0.119193} & {0.950227} \\ \end{array} } \right] \left[ {\begin{array}{*{20}c} {f({\text{R}})} \\ {f({\text{G}})} \\ {f({\text{B}})} \\ \end{array} } \right]$$
(11)

The function f(C), where C ∈ {R, G, B}, is determined as

$$f(C) = \left\{ {\begin{array}{*{20}l} {\frac{C}{12.92},} \hfill & {\text{if}\;{{C}} \le 0.04045} \hfill \\ {\left( {\frac{C + 0.055}{1 + 0.055}} \right)^{2.4} ,} \hfill & {\text{otherwise}} \hfill \\ \end{array} } \right.$$

In the RGB color model, the R component has the longest wavelength of the three colors and the B component the shortest. The human eye is most sensitive to green, which is why the G component receives the largest weight in the grayscale conversion. The grayscale image can be derived using the following equation (Gonzalez and Woods 2008).

$${\text{Grayscale}}\;{\text{image}} = 0.2989{\text{R}} + 0.5870{\text{G}} + 0.1140{\text{B}}$$
(12)
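The xyz, Lab, and grayscale conversions of Eqs. (8)–(12) can be sketched as follows. The D65 reference white and the standard CIE form of f(t) used for Eqs. (9)–(10) are assumptions, since the paper does not state them.

```python
import numpy as np

# Assumed D65 reference white (the paper does not state the white point used).
XN, YN, ZN = 0.950456, 1.0, 1.088754

def srgb_linearize(C):
    """f(C): undo the sRGB gamma for a normalized channel in [0, 1]."""
    return np.where(C <= 0.04045, C / 12.92, ((C + 0.055) / 1.055) ** 2.4)

def rgb_to_xyz(R, G, B):
    """Eq. (11): linear-RGB to xyz via the 3x3 transformation matrix."""
    r, g, b = srgb_linearize(R), srgb_linearize(G), srgb_linearize(B)
    x = 0.412453 * r + 0.357580 * g + 0.180423 * b
    y = 0.212671 * r + 0.715160 * g + 0.072169 * b
    z = 0.019334 * r + 0.119193 * g + 0.950227 * b
    return x, y, z

def xyz_to_lab(x, y, z):
    """Eqs. (8)-(10); the piecewise form of f(t) below is the standard CIE definition (assumed)."""
    def f(t):
        return np.where(t > 0.008856, np.cbrt(t), 7.787 * t + 16.0 / 116.0)
    yr = y / YN
    L = np.where(yr > 0.008856, 116.0 * np.cbrt(yr) - 16.0, 903.3 * yr)   # Eq. (8)
    a = 500.0 * (f(x / XN) - f(y / YN))                                   # Eq. (9)
    b = 200.0 * (f(y / YN) - f(z / ZN))                                   # Eq. (10)
    return L, a, b

def rgb_to_gray(R, G, B):
    """Eq. (12): luminance-weighted grayscale."""
    return 0.2989 * R + 0.5870 * G + 0.1140 * B
```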

Images of the 17 color components derived from the RGB color space image are shown in Fig. 5.

Fig. 5
figure 5

Images of different color components of a typical coal sample a red-R b green-G c blue-B d hue-H e saturation-S f intensity-I g cyan-C h magenta-M i yellow-Y j key or black-K k lightness/luminance-L l Color Opponents Green–Red -a m Color Opponents Blue–Yellow-b n Spectral response values corresponding to the Red-x o Spectral response values corresponding to the Green-y p Spectral response values corresponding to the Blue-z q Gray

2.4.2 Texture feature extraction

The intensity (I) colour component of the HSI color space was transformed into four frequency domains, viz. the discrete cosine transform (DCT), discrete Fourier transform (DFT), discrete wavelet transform (DWT), and Gabor filter transform.

DCT represents an image as a summation of sinusoids of varying magnitudes and frequencies. DCT has the property that most of the visually significant information of a typical image is concentrated in just a few DCT coefficients. The two-dimensional DCT of a matrix A (size: M×N) can be characterized as follows (Ahmed et al. 1974):

$$f(p,q) = \, \alpha_{p} \alpha_{q} \mathop \sum \limits_{m = 0}^{M - 1} \mathop \sum \limits_{n = 0}^{N - 1} A_{mn} \cos \frac{\pi (2m + 1)p}{2M}\cos \frac{\pi (2n + 1)q}{2N}$$
(13)

where

$$\alpha_{p} = \left\{ {\begin{array}{*{20}l} {\frac{1}{{\sqrt {M } }}} \hfill & {\text{if}\;p = 0} \hfill \\ {\sqrt { \frac{2}{M}} } \hfill & {\text{if}\;1 \le p \le M - 1} \hfill \\ \end{array} } \right.$$
(14)
$$\alpha_{q} = \left\{ {\begin{array}{*{20}l} {\frac{1}{{\sqrt {N } }}} \hfill & {\text{if}\;q = 0} \hfill \\ {\sqrt { \frac{2}{N}} } \hfill & {\text{if}\;1 \le q \le N - 1} \hfill \\ \end{array} } \right.$$
(15)

The values f(p, q), for 0 ≤ p ≤ M − 1 and 0 ≤ q ≤ N − 1, are known as the DCT coefficients of A.
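A direct NumPy realization of Eqs. (13)–(15) is sketched below, building the orthonormal DCT-II basis matrices and applying them to the rows and columns of A (comparable in effect to MATLAB's dct2, which the study's own implementation is assumed to have used).

```python
import numpy as np

def dct_matrix(M):
    """Orthonormal DCT-II basis: entries alpha_p * cos(pi*(2m+1)*p / (2M)), per Eqs. (14)-(15)."""
    p = np.arange(M).reshape(-1, 1)
    m = np.arange(M).reshape(1, -1)
    alpha = np.where(p == 0, 1.0 / np.sqrt(M), np.sqrt(2.0 / M))
    return alpha * np.cos(np.pi * (2 * m + 1) * p / (2 * M))

def dct2(A):
    """Two-dimensional DCT of matrix A (Eq. 13): row transform followed by column transform."""
    M, N = A.shape
    return dct_matrix(M) @ A @ dct_matrix(N).T
```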

DFT is an important tool for image processing, used to decompose an image into its sine and cosine components; the input image (spatial domain) is transformed into the frequency domain. The frequency information of the DFT can be useful for object recognition. The DFT of a spatial-domain image can be computed using the following equation (Tang and Stewart 2000).

$$DFT(u,v) = \frac{1}{M\times{N}}\sum\limits_{x = 0}^{M - 1} {\sum\limits_{y = 0}^{N - 1} {f(x,y)} } e^{{ - j2\pi \left( {\frac{u}{M}x + \frac{v}{N}y} \right)}}$$
(16)

where f(x, y) represents the pixel value of an image, and the exponential term is the basis function corresponding to each point DFT (u, v) in the Fourier space.
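Eq. (16) can be evaluated with NumPy's FFT, as in the sketch below; the explicit 1/(M·N) factor matches the normalization in Eq. (16), and the real and imaginary coefficient images (as in Fig. 6b–c) are returned.

```python
import numpy as np

def dft_components(intensity):
    """Eq. (16): 2-D DFT of the intensity image with the 1/(M*N) normalization used in the paper."""
    M, N = intensity.shape
    F = np.fft.fft2(intensity) / (M * N)
    return F.real, F.imag   # the two coefficient images used as texture components
```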

The DWT captures directional information along with frequency and spatial details. It is used to decompose the image into sub-images of different resolutions, separating the high-frequency components from the low-frequency components of the image (Murtagh and Starck 2008). The first level of decomposition of an image using low-pass (L) and high-pass (H) filters provides four sub-images, which represent the approximation coefficients (dA) and the detail coefficients in the horizontal (dH), vertical (dV), and diagonal (dD) directions, respectively.
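A minimal sketch of this first-level decomposition using the PyWavelets package is shown below; the choice of the Haar wavelet is an illustrative assumption, as the paper does not report the wavelet family used.

```python
import pywt  # PyWavelets; wavelet family ('haar') is an assumption, not stated in the paper

def dwt_subimages(intensity):
    """First-level 2-D DWT: approximation (dA) and horizontal/vertical/diagonal details (dH, dV, dD)."""
    dA, (dH, dV, dD) = pywt.dwt2(intensity, 'haar')
    return dA, dH, dV, dD
```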

In image processing, a Gabor filter is a linear filter utilized for texture analysis and multi-resolution texture feature extraction. Gabor filters are known as directional bandpass filters due to their orientation- and frequency-selective properties (Manjunath and Ma 1996). In the present study, only one resolution in four directions (0°, 45°, 90°, and 135°) was considered for feature extraction. Images of the 11 frequency-transform coefficients derived from the intensity component image are shown in Fig. 6.
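The sketch below builds a real Gabor kernel and filters the intensity image at the four orientations used in the study; the kernel size, wavelength, standard deviation, and aspect ratio are illustrative assumptions, since these parameters are not reported in the paper.

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(theta_deg, wavelength=8.0, sigma=4.0, gamma=0.5, half=15):
    """Real (cosine) Gabor kernel at orientation theta; parameter values are illustrative assumptions."""
    theta = np.deg2rad(theta_deg)
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / wavelength)

def gabor_responses(intensity):
    """Filter the intensity image at the four orientations used in the study (0, 45, 90, 135 degrees)."""
    return {t: convolve2d(intensity, gabor_kernel(t), mode='same', boundary='symm')
            for t in (0, 45, 90, 135)}
```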

Fig. 6
figure 6

Typical images of different frequency transform coefficients derived from Intensity component image a DCT coefficient b DFT coefficient of real component c DFT coefficient of imaginary component d approximate coefficient of DWT e detailed coefficient in horizontal direction of DWT f detailed coefficient in vertical direction of DWT g detailed coefficient in diagonal direction of DWT h Gabor filter transform coefficient in 0° direction i Gabor filter transform coefficient in 45° direction j Gabor filter transform coefficient in 90° direction k Gabor filter transform coefficient in 135° direction

Thus, for each captured image of a coal sample, separate images were produced corresponding to the 17 colour components and 11 frequency-transform coefficients. That is, the image features were extracted by transforming the images into various color spaces and frequency domains. In the present investigation, 10 statistical parameters (minimum, maximum, mean, skewness, kurtosis, variance, standard deviation, and moments of third, fourth, and fifth order) were extracted from each of the 17 colour component and 11 frequency-transform images for model development.

The statistical parameters of a typical image ‘I’ of M×N size corresponding to a specific color component or frequency domain can be determined as:

$$f_{\text{Minimum}} = \min_{\begin{subarray}{l} x \in [1, \ldots ,M] \\ y \in [1, \ldots ,N] \end{subarray} } p(x,y)$$
(17)
$$f_{\text{Maximum}} = \max_{\begin{subarray}{l} x \in [1, \ldots ,M] \\ y \in [1, \ldots ,N] \end{subarray} } p(x,y)$$
(18)
$$f_{\text{Mean}} = \frac{1}{M\times{N}}\mathop \sum \limits_{x = 1}^{M} \mathop \sum \limits_{y = 1}^{N} p(x,y)$$
(19)
$$f_{\text{Variance}} =\frac{1}{M\times{N}}\mathop \sum \limits_{x = 1}^{M} \mathop \sum \limits_{y = 1}^{N} \left| {p(x,y) - f_{\text{mean}} } \right|^{2}$$
(20)
$$f_{{{\text{Standard-Deviation}} }} = \sqrt {f_{\text{Variance}} }$$
(21)
$$f_{{{\text{Moment}} _ {\text{k}}}} = \frac{1}{M\times{N}}\mathop \sum \limits_{x = 1}^{M} \mathop \sum \limits_{y = 1}^{N} \left( { p(x,y) - f_{\text{mean}} } \right)^{k} \quad {\text{for}}\;k = 3,4,5$$
(22)
$$f_{\text{Skewness}} = \frac{{f_{{{\text{moment}} _ 3}} }}{{\left( {f_{{{\text{Standard-Deviation}}}} } \right)^{3} }}$$
(23)
$$f_{\text{Kurtosis}} = \frac{{f_{{{\text{moment}} _ 4}} }}{{\left( {f_{{{\text{Standard-Deviation}}}} } \right)^{4} }}$$
(24)

where p(x, y) represents the pixel value at coordinate (x, y).
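The ten statistical parameters of Eqs. (17)–(24) for one component image can be computed as in the following NumPy sketch; restricting the computation to the segmented coal pixels through an optional mask is an assumption about how the background was excluded.

```python
import numpy as np

def statistical_features(component, mask=None):
    """Ten statistical parameters (Eqs. 17-24) of one colour-component or frequency-coefficient image.
    If a segmentation mask is given, only the coal pixels are used (assumed behaviour)."""
    p = component[mask] if mask is not None else component.ravel()
    p = p.astype(float)
    mean = p.mean()
    var = np.mean(np.abs(p - mean) ** 2)                  # Eq. (20)
    std = np.sqrt(var)                                    # Eq. (21)
    m3, m4, m5 = (np.mean((p - mean) ** k) for k in (3, 4, 5))   # Eq. (22)
    return {
        'minimum': p.min(), 'maximum': p.max(), 'mean': mean,
        'variance': var, 'std': std,
        'moment3': m3, 'moment4': m4, 'moment5': m5,
        'skewness': m3 / std ** 3,                        # Eq. (23)
        'kurtosis': m4 / std ** 4,                        # Eq. (24)
    }
```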

Ten statistical parameters were derived for each of the 17 colour component images and 11 frequency-transform coefficients. Thus, the total number of features extracted from each image was 280 (= 10×17 + 10×11). In colour feature extraction, 17 colour components and 10 statistical parameters were used, and hence the total number of colour features considered was 170. In texture feature extraction, 11 texture components and 10 statistical parameters were used, and thus the total number of texture features considered was 110. A total of 80 images were captured for the different coal samples, and from these 80 images a total of 22,400 (= 80×280) image features were derived corresponding to the 17 colour components and 11 frequency-transform coefficients. The list of features and their unique IDs is summarized in Table 2.

Table 2 List of color and texture components image and their features

2.5 Laboratory analysis

2.5.1 Preparation of coal samples

Coal samples collected from the mines were crushed and screened through a 72-mesh (211 μm) sieve. The screened samples were stored in sealed airtight glass bottles labelled with their unique sample IDs. These coal samples were used in the proximate analysis in the laboratory to estimate their compositions.

2.5.2 Proximate analyses of coal

The proximate analysis of coal was done to measure the moisture content (MC), volatile matter (VM), ash (Ash) content, and fixed carbon (FC) content in coals. The methods of determination of these four components are explained below:

(1) Determination of moisture content (MC)

Moisture represents the water present in the coal sample. The MC of a coal sample can be determined by observing the weight loss of the sample due to the release of the water contained within the coal under controlled conditions. If the initial weight of the coal sample is Wi and the weight after removing the water content is Wf, then the moisture content of the coal can be determined as:

$${\text{Moisture}}\;{\text{content}}\;\% \; ( {\text{MC)}} = \left( {\frac{\text{Weight loss}}{\text{Weight of sample}}} \right)\times 100 = \left( {\frac{{W_{\text{i}} - W_{\text{f}} }}{{W_{\text{i}} }}} \right)\times100$$
(25)
(2) Determination of volatile matter (VM)

The VM present in coal is liberated at high temperature in the absence of oxygen. The amount of VM in a coal sample can be determined by measuring the weight loss of the sample due to heating under controlled conditions, which drives off the water, vapor, and gases existing within the sample. The actual VM is obtained by subtracting the MC of the sample, as in the following equation.

$${\text{Volatile}}\;{\text{matter}}\;\% \;({\text{VM}}) = \left( {\frac{\text{Weight loss due to heating}}{\text{Weight of sample}}} \right)\times \, 100{-}{\text{MC}}$$
(26)
(3) Determination of ash content

The residue left after burning coal is referred to as ash and mainly contains inorganic substances. The ash content percentage of the coal sample can be determined as:

$${\text{Ash}}\;\% \;({\text{Ash}}) = \left( {\frac{\text{Weight of residue}}{\text{Weight of sample}}} \right) \times100$$
(27)
(4) Determination of fixed carbon (FC)

FC in coal refers to the carbon content that is not combined with any other components. The percentage of FC can be determined by subtracting the percentages of MC, VM, and Ash from the original weight percentage (100%) of the coal sample. It can be represented as:

$${\text{Fixed}}\;{\text{carbon}}\;\% \;({\text{FC}}) = 100 - ({\text{MC}}\% + {\text{Ash}}\% + {\text{VM}}\% )$$
(28)
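Eqs. (25)–(28) can be collected into a single helper, as in the hedged sketch below; the argument names are illustrative and all weighings are assumed to refer to the same sample.

```python
def proximate_analysis(w_initial, w_after_drying, w_loss_heating, w_residue):
    """Eqs. (25)-(28): MC, VM, Ash and FC percentages from the laboratory weighings (same units)."""
    mc = (w_initial - w_after_drying) / w_initial * 100.0   # Eq. (25)
    vm = w_loss_heating / w_initial * 100.0 - mc            # Eq. (26)
    ash = w_residue / w_initial * 100.0                     # Eq. (27)
    fc = 100.0 - (mc + ash + vm)                            # Eq. (28)
    return mc, vm, ash, fc
```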

2.6 Feature selection

The processing time of a model increases with the number of features, and higher processing time increases the computational cost. Furthermore, the performance and complexity of a model are highly dependent on the feature dimensions (Liu et al. 2005). The extracted feature set may include irrelevant and redundant features, and model performance may be reduced if they are retained (Bratu et al. 2008). Thus, the relevant features need to be identified using a suitable feature selection method before the model is run. Many feature selection/reduction techniques, such as principal component analysis (PCA), genetic algorithms (GA), and sequential forward floating selection (SFFS), have been developed and used in various studies (Marcano-Cedeno et al. 2010; Murata et al. 2015; Pudil et al. 1994). The present study used a stepwise selection method for the selection of the relevant features (Heinze et al. 2018). In the stepwise selection method, all the extracted features are considered as independent variables and each individual coal characteristic value as the dependent variable. At each step, an independent variable is added or removed based on pre-specified F-test criteria. The process requires two significance levels to be defined, one for adding variables and one for removing them. Thus, before the model development, an optimized feature subset was identified. The optimized feature subsets for each coal characterization parameter are summarised in Sect. 3.
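A hedged sketch of such a stepwise procedure is given below, using ordinary least squares p-values (equivalent to the partial F-test for a single variable) from the statsmodels package; the entry and removal significance levels of 0.05 and 0.10 are illustrative assumptions, not values reported in the paper.

```python
import pandas as pd
import statsmodels.api as sm

def stepwise_selection(X, y, p_enter=0.05, p_remove=0.10):
    """Stepwise (forward/backward) selection of image features for one coal characteristic.
    X: DataFrame of the 280 extracted features; y: laboratory value (e.g. Ash %).
    The significance levels are illustrative assumptions."""
    selected = []
    while True:
        changed = False
        # forward step: add the candidate feature with the smallest p-value below p_enter
        remaining = [c for c in X.columns if c not in selected]
        pvals = pd.Series(
            {c: sm.OLS(y, sm.add_constant(X[selected + [c]])).fit().pvalues[c] for c in remaining})
        if len(pvals) and pvals.min() < p_enter:
            selected.append(pvals.idxmin())
            changed = True
        # backward step: drop any selected feature whose p-value exceeds p_remove
        if selected:
            fit = sm.OLS(y, sm.add_constant(X[selected])).fit()
            worst = fit.pvalues.drop('const').idxmax()
            if fit.pvalues[worst] > p_remove:
                selected.remove(worst)
                changed = True
        if not changed:
            return selected
```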

2.7 Development of artificial neural network (ANN) model for prediction of coal characteristics

The non-linear relationship between input and output can be mapped using various types of regression models. The model development was done using the optimized feature subset as the input parameters and the corresponding coal characteristic as the output parameter. The optimized feature subset may differ for the different coal characteristics (MC, VM, Ash, and FC), and thus four different models were developed for prediction of the four characteristic parameters. In the present study, a machine vision system based on an ANN model was developed for automated coal characterisation.

In the first step of ANN model development, all the model parameters (synaptic weights, input features, and outputs) need to be initialized. The values of all the input features, along with the output parameters, were normalized in the range of 0–1 before being used in the model. The normalization process increases the training speed and reduces the noise in the data. The normalization of the data was done using Eq. (29):

$${\text{Normalized}}\;{\text{value}}\;{\text{of}}\;{i}\,\text{th} \;{\text{sample}}\;{\text{of}}\;{j}\,\text{th} \;{\text{parameter}} = \frac{{{\text{Observed}}\;{\text{value}}\;{\text{of}}\;{i}\,\text{th} \;{\text{sample}}\;{\text{of}}\;{j}\,\text{th} \;{\text{parameter}} - {\text{minimum}}\;{\text{value}}\;{\text{of}}\;{j}\,\text{th} \;{\text{parameter}}}}{{{\text{Maximum}}\;{\text{value}}\;{\text{of}}\;{j}\,\text{th} \;{\text{parameter}} - {\text{minimum}} \;{\text{value}}\;{\text{of}}\;{j}\,\text{th} \;{\text{parameter}}}}$$
(29)

The next step of the model development is the selection of the network architecture. The present study used a feed-forward artificial neural network (FF-ANN) model for mapping the image features to quantify the object characteristics. The architecture of the model network is shown in Fig. 7, where the number of input parameters and the number of nodes in the hidden layer are M and N, respectively. The number of input parameters of a model equals the number of selected features. For each of the four coal characteristics (MC, VM, FC, and Ash), the optimized features were derived separately, and thus the numbers may differ; therefore, a different model was developed for each coal characteristic output. The detailed description of the selection of the optimized features is given in Section 2.6. Each model has a single output parameter. The model development was done using the Neural Network Toolbox of MATLAB R2015b software.

Fig. 7
figure 7

Architecture of the feed-forward artificial neural networks (FF-ANN)

In the model, the data is processed through nodes or neurons from one layer to the next, from the input layer to the output layer via the hidden layers. All the model parameters were then initialized with random synaptic weights; a synaptic weight was randomly assigned to each connection to define the relationship strength between nodes. The output of the jth hidden-layer node, yj, is given as

$$y_{j} = f\left( {\mathop \sum \limits_{i = 1}^{n} {{W}}_{ij} {{X}}_{i} + {{b}}_{j} } \right)$$
(30)

where Xi is the input received at node j, Wij is the connection weight of the pathway between the ith input node and the jth hidden node, n is the total number of inputs to node j, and bj is the bias term in the hidden layer. f represents the activation function that gives the response of a node to the aggregated input signal. The present study used a sigmoid activation function, given by

$$f(x) = \frac{1}{{1 + {\text{e}}^{ - x} }}$$
(31)

A sigmoid activation function is continuous and differentiable in nature. It can map the nonlinear relationship.

The next step is the determination of the output layer. The predicted output of the kth node, Pk, can be determined using the following equation.

$$P_{k} = \mathop \sum \limits_{j = 1}^{l} {{W}}_{jk} {{y}}_{j} + {{b}}_{k}$$
(32)

where yj is the response of hidden node j, Wjk represents the weight of the pathway linking the jth hidden node and the kth output node, l is the total number of inputs to node k, and bk is the bias term in the output layer.

The next step is to determine the error. In the proposed algorithm, each input pattern of the training dataset is passed from the input layer to the output layer via the hidden layer. The system predicts an output for every input pattern of the dataset and compares it with the target to determine the error level, which can be determined from the predicted and target values using the following equation.

$$E = \mathop \sum \limits_{m} \left( {P_{k} - O_{k} } \right)^{2}$$
(33)

where Pk is the predicted output, and Ok is the observed/target output. m is the number of output or training patterns.

In a back-propagation feed-forward neural network, the path weights (Wij and Wjk) are updated iteratively based on the error value, until the error reaches the desired level.

In the current study, the model was developed with one hidden layer. The model was also tested with different numbers of hidden neurons to optimize the performance of the models. Four different models were developed for prediction of the four coal characteristic parameters. The models were evaluated using the selected feature subset as input and the corresponding coal characteristic parameter as the output. Data partitioning for training and testing of the model is one of the most important tasks of model development; it is always desired that both datasets (training and testing) have a similar type of distribution. In the current study, a k-holdout method was adopted for the random partitioning of the data into training and testing sets in the ratio 75:25. That is, the 80 datasets were divided into 60 and 20, respectively, for training and testing. The distributions of both datasets were examined using a paired t test; the results confirmed that both datasets follow a similar distribution at the 5% significance level for each feature. The network used a Levenberg–Marquardt (LM) based back-propagation learning algorithm to adjust the weights. A logistic sigmoid nonlinear function (logsig) was used to connect the input layer to the hidden layer, whereas a linear transfer function (purelin) was used to connect the hidden layer to the output layer.
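The overall training workflow can be sketched with scikit-learn as follows. This is not the study's MATLAB implementation: scikit-learn does not provide the Levenberg–Marquardt algorithm, so the quasi-Newton 'lbfgs' solver is substituted, and the hidden-layer size shown is illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler

def train_coal_model(X, y, n_hidden=14, seed=0):
    """Train one FF-ANN (e.g. for Ash %) on the selected feature subset X and laboratory values y."""
    # Eq. (29): min-max normalization of inputs and output to the range 0-1
    x_scaler, y_scaler = MinMaxScaler(), MinMaxScaler()
    Xn = x_scaler.fit_transform(X)
    yn = y_scaler.fit_transform(np.asarray(y).reshape(-1, 1)).ravel()

    # 75:25 random hold-out split (60 training / 20 testing images in the paper)
    X_tr, X_te, y_tr, y_te = train_test_split(Xn, yn, test_size=0.25, random_state=seed)

    # One hidden layer, logistic (sigmoid) hidden activation, linear output.
    # NOTE: 'lbfgs' replaces the Levenberg-Marquardt training used in the MATLAB implementation.
    model = MLPRegressor(hidden_layer_sizes=(n_hidden,), activation='logistic',
                         solver='lbfgs', max_iter=5000, random_state=seed)
    model.fit(X_tr, y_tr)

    # Predictions mapped back to the original units for evaluation
    y_pred = y_scaler.inverse_transform(model.predict(X_te).reshape(-1, 1)).ravel()
    y_obs = y_scaler.inverse_transform(y_te.reshape(-1, 1)).ravel()
    return model, y_obs, y_pred
```

A separate model of this form would be fitted for each of the four characteristics (MC, VM, Ash, FC) using its own optimized feature subset.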

2.8 Model evaluations

Models created using a neural network regression algorithm require validation before implementation. Numerous model performance indices have previously been recommended and used for the assessment of regression models.

The assessment of the regression models was conducted using the following indices: mean squared error (MSE), root mean squared error (RMSE), normalized mean squared error (NMSE), R-squared (R2), and bias. All the indices were determined from the observed and predicted values of the testing samples using the following equations.

$${\text{RMSE}} = \sqrt { \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left( {p_{i} - o_{i} } \right)^{2} }$$
(34)
$${\text{MSE}} = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left( {p_{i} - o_{i} } \right)^{2}$$
(35)
$${\text{NMSE}} = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \frac{{\left( {p_{i} - o_{i} } \right)^{2} }}{{\bar{p}_{i} \bar{o}_{i} }}$$
(36)
$$R^{2} = \frac{{\left( {\mathop \sum \nolimits_{i = 1}^{n} \left( {p_{i} - \bar{p}_{i} } \right)\left( {o_{i} - \bar{o}_{i} } \right)} \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} \left( {p_{i} - \bar{p}_{i} } \right)^{2} \mathop \sum \nolimits_{i = 1}^{n} \left( {o_{i} - \bar{o}_{i} } \right)^{2} }}$$
(37)
$${\text{BIAS}} = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left( {p_{i} - o_{i} } \right)$$
(38)

where pi and oi represent the predicted and observed values of the ith sample, respectively, and \(\bar{p}_{i}\) and \(\bar{o}_{i}\) represent the means of the predicted and observed values of all the samples, respectively. These values can be determined as

$$\bar{p}_{i} = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} p_{i} \quad {\text{and}}\quad \bar{o}_{i} = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} o_{i}$$

RMSE is a measure of the spread of the residuals; it describes the deviation of the observed data from the best-fit line. MSE measures the average of the squared errors between the observed and predicted data. NMSE is an estimator of the overall deviation between predicted and observed values. R2 measures how well the predictions explain the variation in the observed data; the higher the value, the better the prediction model, and for a perfect model the R2 value should be 1. In the model assessment, the bias value represents the average deviation of the predicted values from the observed values; the bias of a model can be positive or negative.
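The five indices of Eqs. (34)–(38) can be computed with a small helper such as the following sketch.

```python
import numpy as np

def regression_indices(observed, predicted):
    """Performance indices of Eqs. (34)-(38) from the observed and predicted testing values."""
    o, p = np.asarray(observed, float), np.asarray(predicted, float)
    mse = np.mean((p - o) ** 2)                               # Eq. (35)
    rmse = np.sqrt(mse)                                       # Eq. (34)
    nmse = np.mean((p - o) ** 2 / (p.mean() * o.mean()))      # Eq. (36)
    r2 = (np.sum((p - p.mean()) * (o - o.mean())) ** 2 /
          (np.sum((p - p.mean()) ** 2) * np.sum((o - o.mean()) ** 2)))   # Eq. (37)
    bias = np.mean(p - o)                                     # Eq. (38)
    return {'MSE': mse, 'RMSE': rmse, 'NMSE': nmse, 'R2': r2, 'BIAS': bias}
```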

3 Results and discussion

The images of the coal samples were captured in a controlled environment for further analysis. The total number of coal samples used in the study was 20. Four images were captured for each coal sample from four different angles, and thus the total number of images captured for the model development was 80. The coal samples corresponding to each image were analyzed in the laboratory using the proximate method, and the estimated coal characteristic values were used for model calibration. The proximate analyses of the 20 coal samples were conducted to determine the MC, VM, Ash, and FC. The experimental results obtained in the laboratory from the proximate analysis are summarized in Table 3. The results indicate that the mean values of MC, VM, Ash, and FC are 5.19%, 29.81%, 26.49%, and 38.50%, respectively, and that their ranges are 2.02%–8.74%, 10.10%–41.51%, 15.08%–46.13%, and 24.36%–66.07%, respectively. The results further indicate that the collected samples have a wide variation in coal characteristics, which supports the need for a suitable quality monitoring system in the mine.

Table 3 Experimental values of moisture content, volatile matter, ash content and fixed carbon content

To develop the model, 280 image features were extracted from each image. These features include 10 statistical features for each component of the color spaces (R, G, B, H, S, I, x, y, z, C, M, Y, K, L, a, b, and Gray) and for each frequency-transform coefficient (the DCT coefficient, the real and imaginary components of the DFT, the approximate coefficient of the DWT, the detailed coefficients of the DWT in the horizontal, vertical, and diagonal directions, and the Gabor filter transform coefficients in the 0°, 45°, 90°, and 135° directions). The present study used 80 images of coal samples for feature extraction. A total of 17 color component images were derived from each of the originally captured images, and the image corresponding to the intensity component of the HSI color space was used for texture feature extraction in four frequency domains (DCT, DFT, DWT, and Gabor filter).

To identify the relevant features for estimating the coal characteristics, a step-wise linear regression algorithm was used. The numbers of optimized features derived using the step-wise linear regression algorithm were 12, 12, 18, and 18 for MC, Ash, VM, and FC, respectively. It was observed that the number of features did not depend fully on the variability, or standard deviation (SD), of the data (shown in Table 3). The SD values for Ash, VM, FC, and MC are 8.6, 6.44, 9.33, and 1.66, respectively, so the order of the number of selected features (FC = VM > Ash = MC) is only partially consistent with the order of the SD values (FC > Ash > VM > MC). This may be due to the non-linear nature of the relationship between the input and output. Thus, for the development of the four characteristic parameters, four different optimized feature subsets were derived. The image features, in order of relevance for the four prediction models, are summarized in Table 4.

Table 4 Optimized feature subset obtained by step-wise linear regression algorithm for four parameters

The ANN models were developed using the optimized feature subsets as input and the corresponding coal characteristics as the output. A separate model was run for each parameter; thus, four ANN models were developed for prediction of ash content, moisture content, fixed carbon, and volatile matter. The number of neurons in the hidden layer of each network was also optimised for each model to obtain the best output. The optimized numbers of hidden-layer nodes were 6, 14, 37, and 34 for the FC, Ash, VM, and MC prediction models, respectively. The extracted features belong to different ranges and were therefore normalized in the range of 0–1 for fast convergence and better model performance.

Each model used the 80 datasets for training and testing, with 60 and 20 datasets used for training and testing, respectively. The predicted values of the testing samples for the four different parameters are summarized in Table 5. The predicted values indicate that the mean values of MC, VM, Ash, and FC are 5.53%, 30.21%, 21.10%, and 38.83%, respectively, as compared to the observed values of 5.19%, 29.81%, 26.49%, and 38.50%. The ranges of the predicted values of the testing samples for MC, VM, Ash, and FC are 2.95–10.27, 2.79–45.81, 11.79–36.79, and 27.30–60.89, respectively. To determine the relationship between the observed and predicted values of the testing samples, scatter plots were drawn for each coal characteristic parameter; these are presented in Fig. 8 together with the regression equations and R2 values. It can be readily inferred from Fig. 8 that the predicted values closely match the observed values.

Table 5 Predicted results of testing samples of various parameters of ANN models
Fig. 8
figure 8

Observed versus predicted value of testing dataset of ANN Model a FC % b Ash % c VM % d MC %

The performance of each ANN model was analysed using five indices: RMSE, MSE, bias, NMSE, and R2. All the indices were determined from the predicted and observed values of the testing samples using Eqs. (34)–(38). The results shown in Table 6 indicate that the NMSE value is close to zero in each case. At the same time, the R2 values were found to be 0.89, 0.92, 0.92, and 0.84 for the fixed carbon, ash content, volatile matter, and moisture content percentages, respectively. The R2 value of a perfect prediction model should be equal to 1. In the present case, the R2 values indicate that the model-predicted values are highly correlated with the observed values for FC, Ash, and VM, while the correlation for the MC prediction model is satisfactory. The bias values indicate that the models exhibit a slight under-prediction. The higher MSE and RMSE values of the models reflect the higher variance of the data rather than poor prediction.

Table 6 Performance measure indices of ANN models for prediction of four parameters

4 Comparative performance analysis of the ANN, GPR, SVR, and RBNN models

The performance of the proposed neural network model was also compared with the performances of Gaussian process regression (GPR), support vector regression (SVR), and radial basis neural network (RBNN) models. GPR models are nonparametric kernel-based probabilistic models; in the past, the GPR modelling approach has been used for many engineering solutions (Archambeau et al. 2007; Atia et al. 2012; Chen et al. 2014), and the detailed modelling approach can be found in Williams and Rasmussen (1996). The same optimized features (derived corresponding to the four parameters) were used as input in each model, and the values of the optimized features, along with the estimated coal characteristics, were normalized in the range of 0–1. The Kullback–Leibler (KL) optimal approximation inference method was used in the GPR model development. The goal of SVR is to identify a function whose deviation from the target values is at most ε for all training patterns while remaining as flat as possible (Patel et al. 2019). An RBNN is a special kind of artificial neural network that allows fast training; each neuron receives a weighted sum of its input values, and its activation depends on the Euclidean distance between a pattern and the neuron centre (Valls et al. 2005).
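A hedged scikit-learn sketch of the baseline models is given below. The paper's GPR used a Kullback–Leibler optimal approximation inference scheme that scikit-learn does not expose, so a standard RBF-kernel GPR is shown as a stand-in; the SVR hyperparameters are illustrative, and scikit-learn provides no RBNN estimator, so that model is omitted from the sketch.

```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.svm import SVR

def comparison_models():
    """Baseline regressors compared with the FF-ANN; hyperparameters are illustrative assumptions."""
    gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(), normalize_y=True)
    svr = SVR(kernel='rbf', C=10.0, epsilon=0.01)
    return {'GPR': gpr, 'SVR': svr}

# Each baseline would be fitted on the same normalized training split used for the ANN models, e.g.:
# for name, reg in comparison_models().items():
#     reg.fit(X_tr, y_tr)
#     indices = regression_indices(y_te, reg.predict(X_te))
```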

To check the model performance, the same numbers of training and testing samples as used in the ANN models were employed; that is, 60 samples were used for training and the remaining 20 samples for testing. The comparative model performance results are shown in Table 7. The results indicate that the R2 values are higher and the RMSE values are lower for the ANN model in each case, indicating that the ANN-based models predicted the values of the four characteristic parameters more closely to the experimental values than the GPR-, SVR-, and RBNN-based models did. Thus, it can be inferred from the results that the ANN model performs better than the GPR, SVR, and RBNN models in most cases.

Table 7 Comparative values of model performance indices for each parameter

5 Conclusions

The following conclusions were derived from the study results:

(1) A different set of optimized features was derived for each of the four ANN models used for Ash, VM, FC, and MC prediction. The optimized feature subsets consist of both color- and texture-based features.

(2) The proposed model will help in automated coal characterization with a precision of more than 80%.

(3) The comparative study results indicated that the artificial neural network (ANN) model performs better than the Gaussian process regression (GPR) model in coal characterisation.

(4) It can be inferred from the results that the model requires a different set of optimised image features for the prediction of Ash, VM, FC, and MC.

(5) The feature selection algorithm used is linear in nature; thus, a non-linear feature selection method may further improve the performance of the model.