Robust extreme learning machine for modeling with unknown noise

https://doi.org/10.1016/j.jfranklin.2020.06.027

Abstract

Extreme learning machine (ELM) is an emerging machine learning technique for training single hidden layer feedforward networks (SLFNs). During the training phase, an ELM model is created by simultaneously minimizing the modeling errors and the norm of the output weights. The squared loss is widely utilized in the objective function of ELMs, which is theoretically optimal for Gaussian error distributions. In practice, however, data collected from uncertain and heterogeneous environments typically carry unknown noise, which may be very complex and cannot be described well by any single distribution. In order to tackle this issue, in this paper, a robust ELM (R-ELM) is proposed for improving the modeling capability and robustness under Gaussian and non-Gaussian noise. In R-ELM, a modified objective function is constructed in which the noise is fitted with a mixture of Gaussians (MoG), which can approximate any continuous distribution. In addition, the corresponding solution of the new objective function is developed based on the expectation maximization (EM) algorithm. Comprehensive experiments, both on selected benchmark datasets and real-world applications, demonstrate that the proposed R-ELM has better robustness and generalization performance than state-of-the-art machine learning approaches.

Introduction

In the past decade, extreme learning machine (ELM), as a type of generalized single hidden layer feedforward network (SLFN), has been intensively studied in both theory and applications [1]. Unlike traditional gradient-based training approaches for SLFNs, which are time-consuming and easily become trapped in local minima, the hidden layer parameters of ELM are assigned randomly without iterative tuning, so that training reduces to solving a least-squares problem [2], [3]. Accordingly, ELM has a much faster learning speed and is easier to implement than state-of-the-art machine learning approaches. Theoretically, Huang et al. [4] have proven the universal approximation capability of ELM. In addition, ELM has been extended to online learning [5], [6], structure optimization [7], [8], ensemble learning [9], [10], imbalance learning [11], [12], representation learning [13], [14], [15], and residual learning [16], [17], among others. In real-world applications, ELM has been applied to landmark recognition [18], [19], industrial production [20], [21], and wireless localization [22], [23], etc.

As mentioned above, ELM is becoming an increasingly significant research topic in the machine learning field, but the majority of existing ELMs assume that the data utilized for modeling are clean, i.e., free of noise and outliers, or that the errors follow a Gaussian distribution. However, data uncertainty is inevitable in practical scenarios due to sampling errors, measurement errors, and modeling errors, which may lead to noise subject to unknown distributions. This means that the noise in real-world applications can be more complex, following a Gaussian distribution, a Laplace distribution, or mixed distributions. Moreover, the performance of a data-driven predictor degrades seriously if the data are chaotic or too noisy. Therefore, ELMs that do not consider the effects of uncertainties may not be sufficient. There are usually two ways to strengthen the modeling capability of ELM in uncertain scenarios: detecting and removing outliers, and modifying the objective function. For example, FIR-ELM was proposed to reduce input disturbances by removing some undesired signal components through FIR filtering [24]. He et al. [25] designed a hierarchical ELM to deal with high-dimensional noisy data, in which groups of subnets were proposed for simultaneously reducing the data dimension and filtering noise. However, the aforementioned outlier-detection-based ELMs may identify clean data as outliers, which can break the original data structure and cause information loss. Another set of solutions enhances the robustness of the data-driven predictor by modifying the objective function of ELM. Specifically, second-order cone programming, widely utilized in robust convex optimization problems, was introduced into ELM, but the computational burden of that ELM variant is relatively heavy [26]. Lu et al. [27] rewrote the objective function of ELM and proposed a probabilistic regularized ELM (PR-ELM) by incorporating the distribution information of the modeling error into the modeling process, in which both the modeling error mean and the modeling error variance are included in the modified objective function. The experimental results indicated that PR-ELM fits well and is more robust to noise. Although ELMs with modified objective functions can achieve satisfactory performance on several tasks, the squared loss is still utilized in most of them, which cannot guarantee optimal solutions when the noise follows a non-Gaussian distribution.
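To make the connection between the squared loss and Gaussian noise explicit (a standard maximum-likelihood argument, stated here for clarity rather than quoted from the paper), let e_i = t_i − h(x_i)β denote the i-th modeling error. If the errors are i.i.d. zero-mean Gaussian, maximizing the likelihood over β is equivalent to minimizing the squared loss:

```latex
% Squared loss as the Gaussian maximum-likelihood estimator,
% with e_i = t_i - h(x_i)\beta the i-th modeling error.
\max_{\beta} \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^{2}}}
\exp\!\left(-\frac{e_i^{2}}{2\sigma^{2}}\right)
\;\Longleftrightarrow\;
\min_{\beta} \sum_{i=1}^{N} e_i^{2}
```

Hence least squares is the maximum-likelihood estimator only under Gaussian noise; under heavy-tailed or mixed noise it can be strongly distorted by outliers.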

In order to tackle this issue, in this paper, a robust ELM (R-ELM) is proposed for improving the modeling capability and robustness of ELM when dealing with tasks involving Gaussian and non-Gaussian noise. Different from existing ELMs, which minimize the output weights and modeling errors under the assumption that the noise follows a Gaussian distribution, a new objective function is constructed for R-ELM, in which the characteristics of the noise are described using a mixture of Gaussians (MoG) while approximating the feature mapping between the inputs and the outputs. In addition, the expectation maximization (EM) algorithm is employed to estimate the parameters of the proposed R-ELM. The main contributions can be summarized as follows:

(1) A robust objective function is developed based on MoG to enhance the modeling capability under complex and unknown noise. Specifically, the squared loss on the modeling errors in the original objective function of ELM is replaced by an MoG noise model. The modified objective function makes R-ELM more robust owing to the excellent capability of MoG to approximate any continuous noise distribution (a sketch of one plausible form of this objective is given after this list).

(2) Since the parameters of the modified objective function of R-ELM cannot be calculated analytically, the EM algorithm is employed to obtain the optimal parameters.

(3) Comprehensive experiments have been conducted, and the results indicate that R-ELM outperforms state-of-the-art machine learning approaches on both selected benchmark datasets and real-world applications.
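For concreteness, one plausible form of the MoG-based objective referenced in contribution (1) is sketched below. The exact formulation is not reproduced in these snippets, so the trade-off constant λ and the zero-mean components are our assumptions (the paper's components may also carry means μ_k). Each modeling error e_i = t_i − h(x_i)β is assumed to be drawn from a K-component MoG, and the regularized negative log-likelihood is minimized:

```latex
% Hypothetical R-ELM objective: weight regularization plus the MoG
% negative log-likelihood of the modeling errors e_i = t_i - h(x_i)\beta.
% \pi_k and \sigma_k^2 are the mixing weight and variance of component k.
\min_{\beta,\,\{\pi_k,\sigma_k^{2}\}}\;
\frac{1}{2}\|\beta\|^{2}
\;-\;\lambda\sum_{i=1}^{N}\log\sum_{k=1}^{K}
\pi_k\,\mathcal{N}\!\left(e_i \,\middle|\, 0,\,\sigma_k^{2}\right)
```

With K = 1 this reduces to the ridge-regularized squared loss of the classic ELM, which is consistent with the claim that MoG generalizes the Gaussian assumption.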

The paper is organized as follows: Section 2 presents ELM theory. The details of the proposed R-ELM are given in Section 3, including the limitations of modeling with Gaussian noise, the motivation for improving the modeling capability of ELM with unknown noise, the modified objective function of R-ELM, and the corresponding solving process. Experimental results and further analysis on selected benchmark datasets are reported in Section 4, followed by performance verification on two real-world applications in Section 5. Finally, a discussion is given in Section 6, and conclusions and future work in Section 7.

Section snippets

ELM theory

In this section, a brief introduction to ELM theory is given to facilitate the understanding of the following sections.

ELM was proposed for training SLFNs with a three-layer structure: input layer, hidden layer, and output layer (see Fig. 1). Different from state-of-the-art machine learning approaches, its hidden layer parameters are generated randomly without iterative tuning, reducing the learning problem to that of estimating the optimal output weights β for a given training set.
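The following minimal sketch (our illustration, not the authors' code; all function and variable names are ours) shows this basic training step with NumPy: the hidden layer parameters are drawn at random once, and the output weights are the minimum-norm least-squares solution.

```python
import numpy as np

def elm_train(X, T, n_hidden=100, seed=0):
    """Basic ELM: random hidden layer + least-squares output weights."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    # Hidden layer parameters are assigned randomly and never tuned.
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))      # sigmoid feature mapping
    # Output weights: minimum-norm least-squares solution beta = H^+ T.
    beta, *_ = np.linalg.lstsq(H, T, rcond=None)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

Because only the linear system for β is solved, training cost is dominated by one pseudoinverse, which is the source of ELM's speed advantage noted above.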

Robust modeling with unknown noise

In this section, the details of the proposed R-ELM are given, including the limitations of modeling with Gaussian noise, the motivation for improving the modeling capability of ELM with unknown noise, the objective function of R-ELM, and the corresponding solving process.
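As a rough illustration of how such an EM-based solving process could look (our sketch under the assumptions stated for the hypothetical objective above, not the authors' algorithm), the E-step computes the responsibility of each mixture component for each residual, and the M-step updates the mixture parameters and re-estimates β by responsibility-weighted least squares:

```python
import numpy as np

def r_elm_em(H, T, K=3, n_iter=50, reg=1e-3, eps=1e-8, seed=0):
    """Illustrative EM loop: jointly fit beta and a zero-mean
    K-component MoG on the residuals e = T - H @ beta (hypothetical)."""
    rng = np.random.default_rng(seed)
    N, M = H.shape
    beta, *_ = np.linalg.lstsq(H, T, rcond=None)   # least-squares init
    pi = np.full(K, 1.0 / K)                       # mixing weights
    var = rng.uniform(0.1, 1.0, size=K)            # component variances
    for _ in range(n_iter):
        e = (T - H @ beta).ravel()
        # E-step: responsibility gamma[i, k] of component k for residual i.
        dens = pi / np.sqrt(2 * np.pi * var) * \
            np.exp(-e[:, None] ** 2 / (2 * var))
        gamma = dens / (dens.sum(axis=1, keepdims=True) + eps)
        # M-step: update mixing weights and variances ...
        Nk = gamma.sum(axis=0)
        pi = Nk / N
        var = (gamma * e[:, None] ** 2).sum(axis=0) / (Nk + eps)
        # ... and re-estimate beta, weighting each sample by its
        # expected inverse noise variance under the current mixture.
        w = (gamma / (var + eps)).sum(axis=1)
        Hw = H * w[:, None]
        # reg * I stands in for the ||beta||^2 regularizer (assumed).
        beta = np.linalg.solve(H.T @ Hw + reg * np.eye(M), Hw.T @ T)
    return beta, pi, var
```

Intuitively, samples explained by a wide (outlier) component receive small weights, so they barely influence β; in practice one would also monitor the log-likelihood for convergence, and the paper's actual update equations may differ.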

Performance verification on benchmark datasets

In this section, some selected benchmark datasets are employed to verify the effectiveness of the proposed R-ELM by comparison with a number of state-of-the-art machine learning approaches, including ELM [2], residual compensation ELM (RC-ELM) [16], PR-ELM [27], the support vector machine (SVM), and the back-propagation neural network (BPNN). All experiments are conducted in Matlab 2015b running on an Intel i5 3.2 GHz CPU with 4 GB RAM. In the experiments, the following root mean square error (RMSE) criterion is adopted to evaluate modeling accuracy.
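The snippet truncates before the formula; the standard RMSE definition, which we assume is the one intended, is

```latex
\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(t_i-\hat{t}_i\right)^{2}}
```

where t_i and \hat{t}_i denote the target and predicted outputs of the i-th test sample.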

Performance verification on real world applications

In the previous section, we conducted experiments on selected benchmark datasets to demonstrate the performance of the proposed R-ELM. Here, we further evaluate the validity of R-ELM on two real-world applications: gas utilization ratio (GUR) prediction and hot metal silicon content (HMSC) prediction in the blast furnace ironmaking process.

The blast furnace, which involves large uncertainties, is one of the dominant units for producing molten iron in iron and steel manufacturing. It usually has…

Influence factors of modeling capability of ELM

In essence, three main factors limit the modeling capability of ELM:

(1) Sensitive objective function. The objective function of ELM is sensitive to non-Gaussian noise, which widely exists in real-world applications. To tackle this issue, robust variants such as R-ELM and PR-ELM have been proposed, in which the original objective function of ELM is modified to approximate complex and unknown noise distributions.

(2) Limited representation capability of the single hidden layer structure…

Conclusions

Most existing ELMs can theoretically obtain optimal solutions under the assumption that the noise follows a Gaussian distribution. However, in practice, the noise in real-world applications is usually subject to unknown distributions, e.g., Gaussian, non-Gaussian, or even mixed distributions, which easily leads these ELMs to suboptimal solutions. In this paper, R-ELM is proposed to strengthen the modeling capability of the classic ELM under unknown noise. Specifically, a modified objective function is constructed to fit the noise with a mixture of Gaussians, and the corresponding solution is developed based on the EM algorithm.

Declaration of Competing Interest

The authors declare that they have no conflict of interest.

Acknowledgment

This work is supported in part by the China Postdoctoral Science Foundation under Grants 2019TQ0002 and 2019M660328, the National Natural Science Foundation of China under Grant 61673055, and the National Key Research and Development Program of China under Grant 2017YFB1401203.

References (42)

  • J. Zhang et al., Residual compensation extreme learning machine for regression, Neurocomputing (2018)
  • J. Cao et al., Landmark recognition with sparse representation classification and extreme learning machine, J. Frankl. Inst. (2015)
  • Z. Man et al., A new robust training algorithm for a class of single-hidden layer feedforward neural networks, Neurocomputing (2011)
  • Y.L. He et al., A hierarchical structure of extreme learning machine (HELM) for high-dimensional datasets with noise, Neurocomputing (2014)
  • Q. Hu et al., Noise model based ν-support vector regression with its application to short-term wind speed forecasting, Neural Netw. (2014)
  • Y. Wu et al., Incipient winding faults detection and diagnosis for squirrel-cage induction motors equipped on CRH trains, ISA Trans. (2020)
  • G.B. Huang et al., Universal approximation using incremental constructive feedforward networks with random hidden nodes, IEEE Trans. Neural Netw. (2006)
  • N.Y. Liang et al., A fast and accurate online sequential learning algorithm for feedforward networks, IEEE Trans. Neural Netw. (2006)
  • Y. Li et al., A novel online sequential extreme learning machine for gas utilization ratio prediction in blast furnaces, Sensors (2017)
  • Y. Li et al., Parallel one-class extreme learning machine for imbalance learning based on Bayesian approach, J. Amb. Intel. Hum. Comput. (2018)
  • C.M. Wong et al., Kernel-based multilayer extreme learning machines for representation learning, IEEE Trans. Neural Netw. Learn. Syst. (2018)