Robust extreme learning machine for modeling with unknown noise

https://doi.org/10.1016/j.jfranklin.2020.06.027

Abstract

Extreme learning machine (ELM) is an emerging machine learning technique for training single hidden layer feedforward networks (SLFNs). During the training phase, an ELM model is created by simultaneously minimizing the modeling errors and the norm of the output weights. The squared loss is widely utilized in the objective function of ELMs, which is theoretically optimal for Gaussian error distributions. In practice, however, data collected from uncertain and heterogeneous environments typically carry unknown noise, which may be very complex and cannot be described well by any single distribution. In order to tackle this issue, in this paper, a robust ELM (R-ELM) is proposed for improving the modeling capability and robustness under Gaussian and non-Gaussian noise. In R-ELM, a modified objective function is constructed in which the noise is fitted with a mixture of Gaussians (MoG), which can approximate any continuous distribution. In addition, the corresponding solution of the new objective function is developed based on the expectation maximization (EM) algorithm. Comprehensive experiments, both on selected benchmark datasets and real-world applications, demonstrate that the proposed R-ELM has better robustness and generalization performance than state-of-the-art machine learning approaches.

Introduction

In the past decade, extreme learning machine (ELM), as a type of generalized single hidden layer feedforward network (SLFN), has been intensively studied in both theory and applications [1]. Unlike traditional gradient-based training approaches for SLFNs, which are time-consuming and easily become trapped in local minima, the hidden layer parameters of ELM are assigned randomly without iterative tuning, so that training reduces to solving a least-squares problem [2], [3]. Accordingly, ELM has a much faster learning speed and is easier to implement than state-of-the-art machine learning approaches. Theoretically, Huang et al. [4] have proven the universal approximation capability of ELM. In addition, ELM has been extended to online learning [5], [6], structure optimization [7], [8], ensemble learning [9], [10], imbalance learning [11], [12], representation learning [13], [14], [15], and residual learning [16], [17], among others. In real-world applications, ELM has been applied to landmark recognition [18], [19], industrial production [20], [21], and wireless localization [22], [23], etc.

As mentioned above, ELM is becoming an increasingly significant research topic in the machine learning field, but the majority of existing ELMs assume that the data utilized for modeling are clean, i.e., free of noise and outliers, or that the errors follow a Gaussian distribution. However, data uncertainty is inevitable in practical scenarios due to sampling errors, measurement errors, and modeling errors, which may lead to noise subject to unknown distributions. This means that the noise in real-world applications can be more complex, following a Gaussian distribution, a Laplace distribution, or mixed distributions. Moreover, the performance of a data-driven predictor degrades seriously if the data are chaotic or too noisy. Therefore, ELMs that do not consider the effects of uncertainties may not be sufficient. There are usually two ways to strengthen the modeling capability of ELM in uncertain scenarios: detecting and removing outliers, and modifying the objective function. For example, FIR-ELM was proposed to reduce input disturbances by removing some undesired signal components through FIR filtering [24]. He et al. [25] designed a hierarchical ELM to deal with high-dimensional noisy data, in which groups of subnets were proposed for simultaneously reducing the data dimension and filtering noise. However, the aforementioned outlier-detection-based ELMs may identify clean data as outliers, which can break the original data structure and cause information loss. Another set of solutions enhances the robustness of the data-driven predictor by modifying the objective function of ELM. Specifically, second-order cone programming, widely utilized in robust convex optimization problems, was introduced into ELM, but the computational burden of that ELM variant is relatively heavy [26]. Lu et al. [27] rewrote the objective function of ELM and proposed a probabilistic regularized ELM (PR-ELM) by incorporating the distribution information of the modeling error into the modeling process, in which both the modeling error mean and the modeling error variance are included in the modified objective function. The experimental results indicated that PR-ELM fits well and is more robust to noise. Although ELMs with modified objective functions can achieve satisfactory performance on several tasks, the squared loss is still utilized in most of them, which cannot guarantee optimal solutions when the noise follows a non-Gaussian distribution.
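To make the connection between the squared loss and Gaussian noise explicit (a standard maximum-likelihood argument, stated here for clarity rather than quoted from the paper), let e_i = t_i − h(x_i)β denote the i-th modeling error. If the errors are i.i.d. zero-mean Gaussian, maximizing the likelihood over β is equivalent to minimizing the squared loss:

```latex
% Squared loss as the Gaussian maximum-likelihood estimator,
% with e_i = t_i - h(x_i)\beta the i-th modeling error.
\max_{\beta} \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^{2}}}
\exp\!\left(-\frac{e_i^{2}}{2\sigma^{2}}\right)
\;\Longleftrightarrow\;
\min_{\beta} \sum_{i=1}^{N} e_i^{2}
```

Hence least squares is the maximum-likelihood estimator only under Gaussian noise; under heavy-tailed or mixed noise it can be strongly distorted by outliers.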

In order to tackle this issue, in this paper, a robust ELM (R-ELM) is proposed for improving the modeling capability and robustness of ELM when dealing with tasks involving Gaussian and non-Gaussian noise. Different from existing ELMs, which minimize the output weights and modeling errors under the assumption that the noise follows a Gaussian distribution, a new objective function is constructed for R-ELM, in which the characteristics of the noise are described using a mixture of Gaussians (MoG) while approximating the feature mapping between the inputs and the outputs. In addition, the expectation maximization (EM) algorithm is employed to estimate the parameters of the proposed R-ELM. The main contributions can be summarized as follows:

(1) A robust objective function is developed based on MoG to enhance the modeling capability under complex and unknown noise. Specifically, the squared loss on the modeling errors in the original objective function of ELM is replaced by an MoG noise model. The modified objective function makes R-ELM more robust owing to the excellent capability of MoG to approximate any continuous noise distribution (a sketch of one plausible form of this objective is given after this list).

(2) Since the parameters of the modified objective function of R-ELM cannot be calculated analytically, the EM algorithm is employed to obtain the optimal parameters.

(3) Comprehensive experiments have been conducted, and the results indicate that R-ELM outperforms state-of-the-art machine learning approaches on both selected benchmark datasets and real-world applications.
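For concreteness, one plausible form of the MoG-based objective referenced in contribution (1) is sketched below. The exact formulation is not reproduced in these snippets, so the trade-off constant λ and the zero-mean components are our assumptions (the paper's components may also carry means μ_k). Each modeling error e_i = t_i − h(x_i)β is assumed to be drawn from a K-component MoG, and the regularized negative log-likelihood is minimized:

```latex
% Hypothetical R-ELM objective: weight regularization plus the MoG
% negative log-likelihood of the modeling errors e_i = t_i - h(x_i)\beta.
% \pi_k and \sigma_k^2 are the mixing weight and variance of component k.
\min_{\beta,\,\{\pi_k,\sigma_k^{2}\}}\;
\frac{1}{2}\|\beta\|^{2}
\;-\;\lambda\sum_{i=1}^{N}\log\sum_{k=1}^{K}
\pi_k\,\mathcal{N}\!\left(e_i \,\middle|\, 0,\,\sigma_k^{2}\right)
```

With K = 1 this reduces to the ridge-regularized squared loss of the classic ELM, which is consistent with the claim that MoG generalizes the Gaussian assumption.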

The paper is organized as follows: Section 2 presents ELM theory. The details of the proposed R-ELM are given in Section 3, including the limitations of modeling with Gaussian noise, the motivation for improving the modeling capability of ELM with unknown noise, the modified objective function of R-ELM, and the corresponding solving process. Experimental results and further analysis on selected benchmark datasets are reported in Section 4, followed by performance verification on two real-world applications in Section 5. Finally, a discussion is given in Section 6, and conclusions and future work in Section 7.

Section snippets

ELM theory

In this section, a brief introduction to ELM theory is given to facilitate the understanding of the following sections.

ELM was proposed for training SLFNs with a three-layer structure: input layer, hidden layer, and output layer (see Fig. 1). Different from state-of-the-art machine learning approaches, its hidden layer parameters are generated randomly without iterative tuning, reducing the learning problem to that of estimating the optimal output weights β for a given training set.
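The following minimal sketch (our illustration, not the authors' code; all function and variable names are ours) shows this basic training step with NumPy: the hidden layer parameters are drawn at random once, and the output weights are the minimum-norm least-squares solution.

```python
import numpy as np

def elm_train(X, T, n_hidden=100, seed=0):
    """Basic ELM: random hidden layer + least-squares output weights."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    # Hidden layer parameters are assigned randomly and never tuned.
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))      # sigmoid feature mapping
    # Output weights: minimum-norm least-squares solution beta = H^+ T.
    beta, *_ = np.linalg.lstsq(H, T, rcond=None)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

Because only the linear system for β is solved, training cost is dominated by one pseudoinverse, which is the source of ELM's speed advantage noted above.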

Robust modeling with unknown noise

In this section, the details of the proposed R-ELM are given, including the limitations of modeling with Gaussian noise, the motivation for improving the modeling capability of ELM with unknown noise, the objective function of R-ELM, and the corresponding solving process.
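As a rough illustration of how such an EM-based solving process could look (our sketch under the assumptions stated for the hypothetical objective above, not the authors' algorithm), the E-step computes the responsibility of each mixture component for each residual, and the M-step updates the mixture parameters and re-estimates β by responsibility-weighted least squares:

```python
import numpy as np

def r_elm_em(H, T, K=3, n_iter=50, reg=1e-3, eps=1e-8, seed=0):
    """Illustrative EM loop: jointly fit beta and a zero-mean
    K-component MoG on the residuals e = T - H @ beta (hypothetical)."""
    rng = np.random.default_rng(seed)
    N, M = H.shape
    beta, *_ = np.linalg.lstsq(H, T, rcond=None)   # least-squares init
    pi = np.full(K, 1.0 / K)                       # mixing weights
    var = rng.uniform(0.1, 1.0, size=K)            # component variances
    for _ in range(n_iter):
        e = (T - H @ beta).ravel()
        # E-step: responsibility gamma[i, k] of component k for residual i.
        dens = pi / np.sqrt(2 * np.pi * var) * \
            np.exp(-e[:, None] ** 2 / (2 * var))
        gamma = dens / (dens.sum(axis=1, keepdims=True) + eps)
        # M-step: update mixing weights and variances ...
        Nk = gamma.sum(axis=0)
        pi = Nk / N
        var = (gamma * e[:, None] ** 2).sum(axis=0) / (Nk + eps)
        # ... and re-estimate beta, weighting each sample by its
        # expected inverse noise variance under the current mixture.
        w = (gamma / (var + eps)).sum(axis=1)
        Hw = H * w[:, None]
        # reg * I stands in for the ||beta||^2 regularizer (assumed).
        beta = np.linalg.solve(H.T @ Hw + reg * np.eye(M), Hw.T @ T)
    return beta, pi, var
```

Intuitively, samples explained by a wide (outlier) component receive small weights, so they barely influence β; in practice one would also monitor the log-likelihood for convergence, and the paper's actual update equations may differ.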

Performance verification on benchmark datasets

In this section, some selected benchmark datasets are employed to verify the effectiveness of the proposed R-ELM by comparison with a number of state-of-the-art machine learning approaches, including ELM [2], residual compensation ELM (RC-ELM) [16], PR-ELM [27], the support vector machine (SVM), and the back-propagation neural network (BPNN). All experiments are conducted in Matlab 2015b running on an Intel i5 3.2 GHz CPU with 4 GB RAM. In the experiments, the following root mean square error (RMSE) criterion is adopted to evaluate modeling accuracy.
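The snippet truncates before the formula; the standard RMSE definition, which we assume is the one intended, is

```latex
\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(t_i-\hat{t}_i\right)^{2}}
```

where t_i and \hat{t}_i denote the target and predicted outputs of the i-th test sample.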

Performance verification on real world applications

In the previous section, we conducted experiments on selected benchmark datasets to demonstrate the performance of the proposed R-ELM. Here, we further evaluate the validity of R-ELM on two real-world applications: gas utilization ratio (GUR) prediction and hot metal silicon content (HMSC) prediction in the blast furnace ironmaking process.

The blast furnace, which involves large uncertainties, is one of the dominant units for producing molten iron in iron and steel manufacturing. It usually has…

Influence factors of modeling capability of ELM

In essence, three main factors limit the modeling capability of ELM:

(1) Sensitive objective function. The objective function of ELM is sensitive to non-Gaussian noise, which widely exists in real-world applications. To tackle this issue, robust variants such as R-ELM and PR-ELM have been proposed, in which the original objective function of ELM is modified to approximate complex and unknown noise distributions.

(2) Limited representation capability of the single hidden layer structure…

Conclusions

Most existing ELMs can theoretically obtain optimal solutions under the assumption that the noise follows a Gaussian distribution. However, in practice, the noise in real-world applications is usually subject to unknown distributions, e.g., Gaussian, non-Gaussian, or even mixed distributions, which easily leads these ELMs to suboptimal solutions. In this paper, R-ELM is proposed to strengthen the modeling capability of the classic ELM under unknown noise. Specifically, a modified objective function is constructed to fit the noise with a mixture of Gaussians, and the corresponding solution is developed based on the EM algorithm.

Declaration of Competing Interest

The authors declare that they have no conflict of interest.

Acknowledgment

This work is supported in part by the China Postdoctoral Science Foundation under Grants 2019TQ0002 and 2019M660328, the National Natural Science Foundation of China under Grant 61673055, and the National Key Research and Development Program of China under Grant 2017YFB1401203.

References (42)

  • J. Zhang et al., Residual compensation extreme learning machine for regression, Neurocomputing (2018)
  • J. Cao et al., Landmark recognition with sparse representation classification and extreme learning machine, J. Frankl. Inst. (2015)
  • Z. Man et al., A new robust training algorithm for a class of single-hidden layer feedforward neural networks, Neurocomputing (2011)
  • Y.L. He et al., A hierarchical structure of extreme learning machine (HELM) for high-dimensional datasets with noise, Neurocomputing (2014)
  • Q. Hu et al., Noise model based ν-support vector regression with its application to short-term wind speed forecasting, Neural Netw. (2014)
  • Y. Wu et al., Incipient winding faults detection and diagnosis for squirrel-cage induction motors equipped on CRH trains, ISA Trans. (2020)
  • G.B. Huang et al., Universal approximation using incremental constructive feedforward networks with random hidden nodes, IEEE Trans. Neural Netw. (2006)
  • N.Y. Liang et al., A fast and accurate online sequential learning algorithm for feedforward networks, IEEE Trans. Neural Netw. (2006)
  • Y. Li et al., A novel online sequential extreme learning machine for gas utilization ratio prediction in blast furnaces, Sensors (2017)
  • Y. Li et al., Parallel one-class extreme learning machine for imbalance learning based on Bayesian approach, J. Amb. Intel. Hum. Comput. (2018)
  • C.M. Wong et al., Kernel-based multilayer extreme learning machines for representation learning, IEEE Trans. Neural Netw. Learn. Syst. (2018)