Robust extreme learning machine for modeling with unknown noise
Introduction
In the past decade, extreme learning machine (ELM), as a type of generalized single hidden layer feedforward network (SLFN), has been intensively studied both in theory and in applications [1]. Unlike traditional gradient based training approaches for SLFNs, which are prone to getting trapped in local minima and are time-consuming, the hidden layer parameters of ELM are assigned randomly without iterative tuning, so that only a least-squares problem needs to be solved [2], [3]. Accordingly, ELM offers much faster learning and is easier to implement than state-of-the-art machine learning approaches. Theoretically, Huang et al. [4] have proven the universal approximation capability of ELM. In addition, ELM has been extended to online learning [5], [6], structure optimization [7], [8], ensemble learning [9], [10], imbalance learning [11], [12], representation learning [13], [14], [15], and residual learning [16], [17], among others. In real world applications, ELM has been applied to landmark recognition [18], [19], industrial production [20], [21], and wireless localization [22], [23], etc.
As mentioned above, ELM has become an increasingly significant research topic in the machine learning field, but the majority of existing ELMs assume that the data utilized for modeling are free of noise and outliers, or that the errors follow a Gaussian distribution. However, data uncertainty is inevitable in practical scenarios due to sampling errors, measurement errors, and modeling errors, which may give rise to noise with unknown distributions. In other words, the noise in real world applications can be more complex, following a Gaussian distribution, a Laplace distribution, or mixed distributions. In addition, the performance of a data-driven predictor degrades seriously if the data are chaotic or too noisy. Therefore, ELMs that do not account for the effects of uncertainties may not be sufficient. There are usually two ways to strengthen the modeling capability of ELM in uncertain scenarios: detecting and removing outliers, and modifying the objective function. For example, FIR-ELM was proposed to reduce input disturbance by removing undesired signal components through FIR filtering [24]. He et al. [25] designed a hierarchical ELM to deal with high-dimensional noisy data, in which groups of subnets simultaneously reduce the data dimension and filter out noise. However, the aforementioned outlier-detection based ELMs may identify clean data as outliers, which easily breaks the original data structure and causes information loss. Another class of solutions enhances the robustness of the data-driven predictor by modifying the objective function of ELM. Specifically, second order cone programming, widely utilized in robust convex optimization, was introduced into ELM, but the computational burden of the resulting ELM was relatively heavy [26]. Lu et al.
[27] rewrote the objective function of ELM and proposed a probabilistic regularized ELM (PR-ELM) by incorporating the distribution information of the modeling error into the modeling process, with both the modeling error mean and the modeling error variance included in the modified objective function. The experimental results indicated that PR-ELM achieved good fitting performance and was more robust to noise. Although ELMs with modified objective functions can achieve satisfactory performance in several tasks, most of them still rely on the squared loss, which cannot guarantee that these ELMs reach the optimal solutions when the noise follows a non-Gaussian distribution.
To tackle this issue, in this paper a robust ELM (R-ELM) is proposed to improve the modeling capability and robustness of ELM on tasks with Gaussian and non-Gaussian noise. Different from existing ELMs, which minimize the output weights and modeling errors under the assumption that the noise follows a Gaussian distribution, a new objective function is constructed for R-ELM, in which the characteristics of the noise are described by a mixture of Gaussians (MoG) while approximating the feature mapping between the inputs and the outputs. In addition, the expectation maximization (EM) algorithm is employed to estimate the parameters of the proposed R-ELM. The main contributions can be summarized as follows:
(1) A robust objective function is developed based on MoG to enhance the modeling capability under complex and unknown noise. Specifically, the squared loss on the modeling errors in the original ELM objective function is replaced by an MoG model. The modified objective function makes R-ELM more robust, owing to the excellent capability of an MoG to approximate any continuous noise distribution.
(2) Since the parameters in the modified objective function of R-ELM admit no direct analytical solution, the EM algorithm is employed to obtain the optimal parameters.
(3) Comprehensive experiments have been conducted, and the results indicate that R-ELM outperforms state-of-the-art machine learning approaches on both selected benchmark datasets and real world applications.
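To make contributions (1) and (2) concrete, the following is a minimal sketch of fitting a zero-mean MoG noise model to modeling residuals with EM. It is illustrative only: the two-component setup, the function name `em_mog_residuals`, and all variable names are assumptions for this sketch, not the paper's exact algorithm, which alternates such noise updates with output-weight estimation.

```python
# Illustrative EM for a two-component, zero-mean mixture-of-Gaussians (MoG)
# noise model fitted to residuals; a sketch of the idea, not the paper's
# exact R-ELM procedure.
import numpy as np

def em_mog_residuals(e, n_iter=50):
    # Initialize mixing weights and the two component variances.
    pi = np.array([0.5, 0.5])
    var = np.array([np.var(e) * 0.5, np.var(e) * 2.0])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each residual.
        dens = pi / np.sqrt(2 * np.pi * var) * \
               np.exp(-e[:, None] ** 2 / (2 * var))
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: closed-form updates of mixing weights and variances.
        Nk = gamma.sum(axis=0)
        pi = Nk / len(e)
        var = (gamma * e[:, None] ** 2).sum(axis=0) / Nk
    return pi, var

# Residuals: mostly small Gaussian noise plus a heavy-tailed minority,
# i.e., noise that a single squared loss would model poorly.
rng = np.random.default_rng(1)
e = np.concatenate([rng.normal(0, 0.1, 900), rng.normal(0, 1.0, 100)])
pi, var = em_mog_residuals(e)
```

In a robust formulation along these lines, the responsibilities `gamma` effectively down-weight residuals attributed to the high-variance component, which is what reduces the influence of outliers compared with a plain squared loss.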
The paper is organized as follows: Section 2 presents the ELM theory. The details of the proposed R-ELM are given in Section 3, including the limitations of modeling with Gaussian noise, the motivation for improving the modeling capability of ELM with unknown noise, the modified objective function of R-ELM, and the corresponding solving process. Experimental results and further analysis on selected benchmark datasets are reported in Section 4, followed by performance verification on two real world applications in Section 5. Finally, the discussion is given in Section 6, and conclusions and future work in Section 7.
Section snippets
ELM theory
In this section, a brief introduction to ELM theory is given to facilitate understanding of the following sections.
ELM was proposed for training SLFNs, which have a three-layer structure: input layer, hidden layer, and output layer (see Fig. 1). Different from state-of-the-art machine learning approaches, its hidden layer parameters are generated randomly without iterative tuning, reducing the learning problem to that of estimating the optimal output weights β for a given
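The training procedure described above can be sketched in a few lines. This is a minimal illustrative implementation, not the paper's code: the function names, the sigmoid activation, and the toy sine-fitting task are all assumptions for this sketch.

```python
# Minimal ELM regression sketch: random hidden-layer parameters, then a
# least-squares solve for the output weights beta (illustrative only).
import numpy as np

def elm_train(X, y, n_hidden=50, seed=0):
    rng = np.random.default_rng(seed)
    # Hidden-layer parameters are drawn randomly and never tuned.
    W = rng.normal(size=(X.shape[1], n_hidden))  # input weights
    b = rng.normal(size=n_hidden)                # hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))       # sigmoid feature map
    # Output weights beta solve the least-squares problem H @ beta ~ y
    # via the Moore-Penrose pseudoinverse.
    beta = np.linalg.pinv(H) @ y
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Toy usage: fit y = sin(x) on [0, pi].
X = np.linspace(0, np.pi, 200).reshape(-1, 1)
y = np.sin(X).ravel()
W, b, beta = elm_train(X, y)
rmse = np.sqrt(np.mean((elm_predict(X, W, b, beta) - y) ** 2))
```

Note that the pseudoinverse step is exactly the implicit Gaussian-noise assumption the paper targets: minimizing the squared residual is optimal only when the noise is Gaussian.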
Robust modeling with unknown noise
In this section, the details of the proposed R-ELM will be given, including limitations of modeling with Gaussian noise, motivation of improving the modeling capability of ELM with unknown noise, objective function of R-ELM, and the corresponding solving process.
Performance verification on benchmark datasets
In this section, selected benchmark datasets are employed to verify the effectiveness of the proposed R-ELM by comparison with a number of state-of-the-art machine learning approaches, including ELM [2], residual compensation ELM (RC-ELM) [16], PR-ELM [27], support vector machine (SVM), and back-propagation neural network (BPNN). All experiments are conducted using Matlab 2015b running on an i5 3.2 GHz CPU with 4 GB RAM. In the experiments, the following root mean square
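The sentence above is truncated but presumably introduces the root mean square error (RMSE) criterion; for reference, the standard definition is sketched below (the function name is illustrative).

```python
# Standard root mean square error (RMSE), as typically used to compare
# regression models on benchmark datasets.
import numpy as np

def rmse(y_true, y_pred):
    err = np.asarray(y_true) - np.asarray(y_pred)
    return np.sqrt(np.mean(err ** 2))

val = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])  # sqrt(4/3)
```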
Performance verification on real world applications
In the previous section, we conducted experiments on selected benchmark datasets to demonstrate the performance of the proposed R-ELM. We now further evaluate the validity of R-ELM on two real world applications: gas utilization ratio (GUR) prediction and hot metal silicon content (HMSC) prediction in the blast furnace ironmaking process.
The blast furnace is one of the dominant units for producing molten iron in iron and steel manufacturing, and it involves large uncertainties. It usually has
Influence factors of modeling capability of ELM
In essence, three factors mainly confine the modeling capability of ELM, including:
(1) Sensitive objective function. The objective function of ELM is sensitive to non-Gaussian noise, which is widespread in real world applications. To tackle this issue, robust variants such as R-ELM and PR-ELM have been proposed, in which the original objective function of ELM is modified to approximate the complex and unknown noise distribution.
(2) Limited representation capability of the single
Conclusions
Most existing ELMs can theoretically obtain the optimal solutions under the assumption that the noise follows a Gaussian distribution. In practice, however, the noise in real world applications is usually subject to unknown distributions, i.e., Gaussian, non-Gaussian, or even mixed distributions, which easily leads to suboptimal solutions for these ELMs. In this paper, R-ELM is proposed to strengthen the modeling capability of the classic ELM under unknown noise. Specifically, a modified
Declaration of Competing Interest
The authors declare that they have no conflict of interest.
Acknowledgment
This work is supported in part by the China Postdoctoral Science Foundation under Grants 2019TQ0002 and 2019M660328, the National Natural Science Foundation of China under Grant 61673055, and the National Key Research and Development Program of China under Grant 2017YFB1401203.
References (42)
- et al., Trends in extreme learning machines: a review, Neural Netw., 2015.
- et al., Extreme learning machine: theory and applications, Neurocomputing, 2006.
- et al., Bayesian network based extreme learning machine for subjectivity detection, J. Frankl. Inst., 2018.
- et al., Constructive hidden nodes selection of extreme learning machine for regression, Neurocomputing, 2010.
- et al., TROP-ELM: a double-regularized ELM using LARS and Tikhonov regularization, Neurocomputing, 2011.
- et al., Ensemble of online sequential extreme learning machine, Neurocomputing, 2009.
- et al., Ensemble of subset online sequential extreme learning machine for class imbalance and concept drift, Neurocomputing, 2015.
- et al., Class-specific cost regulation extreme learning machine for imbalanced classification, Neurocomputing, 2017.
- et al., Towards enhancing stacked extreme learning machine with sparse autoencoder by correntropy, J. Frankl. Inst., 2018.
- et al., Multilayer probability extreme learning machine for device-free localization, Neurocomputing, 2020.
- Residual compensation extreme learning machine for regression, Neurocomputing.
- Landmark recognition with sparse representation classification and extreme learning machine, J. Frankl. Inst.
- A new robust training algorithm for a class of single-hidden layer feed forward neural networks, Neurocomputing.
- A hierarchical structure of extreme learning machine (HELM) for high-dimensional datasets with noise, Neurocomputing.
- Noise model based v-support vector regression with its application to short-term wind speed forecasting, Neural Netw.
- Incipient winding faults detection and diagnosis for squirrel-cage induction motors equipped on CRH trains, ISA Trans.
- Universal approximation using incremental constructive feedforward networks with random hidden nodes, IEEE Trans. Neural Netw.
- A fast and accurate online sequential learning algorithm for feedforward networks, IEEE Trans. Neural Netw.
- A novel online sequential extreme learning machine for gas utilization ratio prediction in blast furnaces, Sensors.
- Parallel one-class extreme learning machine for imbalance learning based on Bayesian approach, J. Amb. Intel. Hum. Comput.
- Kernel-based multilayer extreme learning machines for representation learning, IEEE Trans. Neural Netw. Learn. Syst.