Regularizing extreme learning machine by dual locally linear embedding manifold learning for training multi-label neural network classifiers

https://doi.org/10.1016/j.engappai.2020.104062

Abstract

Multi-label learning has received much attention due to its applicability to machine learning problems. In recent years, quite a few approaches based on either the extreme learning machine (ELM) or the radial basis function (RBF) neural network have been proposed with the aim of improving the efficiency of multi-label classification. Most existing multi-label learning algorithms focus only on information about the feature space. In this paper, our main goal is to regularize the objective function of multi-label learning methods via Locally Linear Embedding (LLE). To achieve this goal, two neural network architectures, namely Multi-Label RBF (ML-RBF) and Multi-Label Multi-Layer ELM (ML-ELM), are utilized. Then, a regularized multi-label learning method via feature manifold learning (RMLFM) and a regularized multi-label learning method via dual-manifold learning (RMLDM) are established for training the two network structures. RMLDM simultaneously exploits the geometric structure of both the feature and the data space. Furthermore, eight different configurations of applying the training algorithms (i.e., RMLFM and RMLDM) to the model architectures (i.e., ML-RBF and ML-ELM) are considered for conducting comparisons. The validity and effectiveness of these eight classifiers are demonstrated by a number of experimental studies on several multi-label datasets. The experiments also indicate that, for the neural classifiers in which dual-manifold learning is used as the training method, classification performance improves considerably over several cutting-edge multi-label techniques.

Introduction

As a particularly significant technique in data analysis, classification is widely used in practical applications. In traditional classification, each sample corresponds to a unique label. This task is called the Single-Label Classification Task (SLCT). Some common classifiers in traditional machine learning, such as naive Bayes, neural networks, SVMs, and decision trees, perform the SLCT. Nevertheless, with the advent of new technologies, SLCT algorithms cannot deal with the great amount of labeled data now available. In contrast to the SLCT, in the Multi-Label Classification Task (MLCT), each sample is associated with more than one label. For example, a natural scene can correspond to mountain, tree and sunset simultaneously, and a movie may belong to several genres. The MLCT has been extensively applied to many areas such as text classification and image annotation (Boutell et al., 2004, Chen et al., 2013).

The aim of the MLCT is to learn a model that provides a subset of labels for a given instance whose labels are unknown. MLCT methods can be broadly broken down into two families (Sorower, 2010): Problem Transformation (PT) and Algorithm Adaptation (AAda). The PT methods, such as binary relevance (BR) (Tsoumakas and Katakis, 2007), random k-label sets (Tsoumakas et al., 2011) and classifier chains (Read et al., 2009), map the MLCT into one or more SLCTs. In doing so, these methods ignore the correlations between labels. On the other hand, the AAda methods modify traditional classification learning algorithms in order to handle the MLCT directly. As an example, ML-KNN, a multi-label lazy learning method based on the traditional k-nearest neighbor algorithm, was proposed by Zhang and Zhou (2007). ML-KNN employs the maximum a posteriori probability derived from the k-nearest neighbor samples for label prediction, as sketched below. Zhang and Zhou (2006) utilized the back-propagation algorithm for the MLCT and formulated a strategy that attaches importance to the labels associated with a sample by ranking them higher than the labels that are irrelevant to that sample.
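To make the ML-KNN decision rule concrete, the following minimal sketch illustrates it for a single label. This is our own hedged illustration, not the authors' code: the smoothing parameter s and the use of scikit-learn's NearestNeighbors are our assumptions, and for brevity the neighbor counts on the training set include each point itself, whereas the original algorithm uses leave-one-out counts.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def ml_knn_single_label(X_train, y, x_query, k=10, s=1.0):
    """MAP rule of ML-KNN for one binary label vector y (a sketch)."""
    n = X_train.shape[0]
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)

    # Smoothed prior probabilities P(H1) (label present) and P(H0).
    p_h1 = (s + y.sum()) / (2 * s + n)
    p_h0 = 1.0 - p_h1

    # For every training point, count how many of its k neighbors carry the label.
    _, idx = nn.kneighbors(X_train)  # includes the point itself; acceptable in a sketch
    counts = y[idx].sum(axis=1).astype(int)

    # Frequency of each neighbor count among positive / negative instances.
    c1 = np.bincount(counts[y == 1], minlength=k + 2)
    c0 = np.bincount(counts[y == 0], minlength=k + 2)

    # Neighbor count of the query, then smoothed likelihoods P(c | H1), P(c | H0).
    _, q_idx = nn.kneighbors(x_query.reshape(1, -1))
    c_q = int(y[q_idx[0]].sum())
    p_c_h1 = (s + c1[c_q]) / (s * (k + 1) + c1.sum())
    p_c_h0 = (s + c0[c_q]) / (s * (k + 1) + c0.sum())

    # MAP decision: predict the label if the positive posterior dominates.
    return p_h1 * p_c_h1 > p_h0 * p_c_h0
```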

Over the past decades, several neural network methods have been suggested for the MLCT. Two major categories of such methods are the radial basis function-based and the extreme learning machine-based neural networks. Radial basis function (RBF) neural networks, one of the most ubiquitous kinds of feed-forward neural networks, are employed in a wide range of cases, such as classification, regression analysis, and time series estimation (Schwenker et al., 2001). RBF neural networks are also distinguished from other traditional neural networks on account of the universal approximation property (Hornik et al., 1989), which states that a continuous function on a real compact set can be approximated by a feed-forward network with one hidden layer under particular circumstances. In fact, the connections between the inputs and outputs can be established well, and an RBF network enjoys a high capacity for global approximation.

Another family of neural network methods for the MLCT is based on the extreme learning machine (ELM), an efficient algorithm for single-hidden layer feed-forward neural networks (SLFNs) that achieves a good level of performance in some important machine learning applications, like regression and classification (Huang et al., 2004, Huang et al., 2006). It has been demonstrated that ELM provides a platform for SLFNs that benefits immensely from (1) a training process of great speed, and (2) a high potential for favorable generalization (Huang et al., 2015). Here, generalization refers to the ability of a machine learning model to adapt to new data. The basic philosophy of ELM is to determine the parameters of the hidden nodes in a random way, so that these parameters do not depend on the training data, while the output weight coefficients are computed subject to the associated constraints (Huang et al., 2015). Moreover, ELM aims to reduce both the training error and the norm of the weight coefficients to the smallest possible amount (Huang et al., 2012). This minimization strategy results in a great level of performance for ELM in comparison with the traditional back-propagation algorithm (Bartlett, 1998). Furthermore, ELM employs various feature maps, for example the sigmoid function and Gaussian kernels, within its framework in order to establish a unified pattern for learning regression and classification problems (Huang et al., 2012). In fact, ELM can be a highly successful method for dealing with classification problems on high-dimensional datasets (Kasun et al., 2016).
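The minimal sketch below shows the basic ELM recipe just described: the hidden-layer parameters are drawn at random, and only the output weights are computed in closed form, here in a ridge-regularized variant that simultaneously penalizes the training error and the norm of the weights. It is our own illustration under standard assumptions, not the implementation used in this paper; the sigmoid feature map and the parameter C are choices, not requirements.

```python
import numpy as np

def elm_train(X, T, n_hidden=100, C=1.0, rng=None):
    """Train a basic ELM: random hidden layer + regularized least-squares output.

    X: (n_samples, n_features) inputs; T: (n_samples, n_outputs) targets.
    C balances the training error against the norm of the output weights.
    """
    rng = np.random.default_rng(rng)
    # Hidden-node parameters are random and independent of the training data.
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # sigmoid feature map

    # Output weights: minimize ||H beta - T||^2 + (1/C) ||beta||^2 in closed form.
    beta = np.linalg.solve(H.T @ H + np.eye(n_hidden) / C, H.T @ T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

Because only the linear output layer is learned, training reduces to one regularized least-squares solve, which is the source of ELM's training speed.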

As a result of these considerable achievements of RBF neural networks and the ELM algorithm, researchers have recently been encouraged to conduct a series of experiments on the application of RBF and ELM to the MLCT. For instance, Zhang (2009) introduced a neural network for the MLCT, called ML-RBF, obtained by extending the standard RBF neural network. ML-RBF performs the k-means clustering algorithm in order to determine the centers of the RBFs and the number of hidden layer nodes in the network. Kasun et al. (2013) proposed an extreme learning machine based autoencoder (ELM-AE), in which the output is set equal to the input, and singular values are applied for feature learning. In order to increase the performance of ML-RBF, Zhang et al. (2016b) proposed a multi-layer RBF-based model for the MLCT (ML-ELM-RBF), a deep network derived from ML-RBF and the weight uncertainty ELM-AE (WuELM-AE). Xu et al. (2019) proposed an algorithm for the MLCT by applying the affinity propagation clustering algorithm to ML-RBF together with the Laplacian extreme learning machine (Lap-ELM) method. The fundamental notion of Lap-ELM, introduced by Zhang et al. (2016a), is to enhance the efficiency of ELM by exploiting the local manifold structure information of the data space.

Manifold learning is a popular application of geometry in the field of machine learning (Ma and Fu, 2011). The fundamental assumption of a manifold learning technique is that the data lies on a low dimensional manifold embedded in a higher dimensional space (Izenman, 2012). A good many manifold learning methods have been proposed, e.g., Principal Component Analysis (PCA) (Jolliffe, 1986), Isomap (Tenenbaum et al., 2000) and Locally Linear Embedding (LLE) (Roweis and Saul, 2000). Traditional manifold learning techniques have focused on the data manifold structure. For example, Cai et al. (2010) presented a graph regularized non-negative matrix factorization (GNMF) method that employs the geometric information of the data space. Recently, Cai and Zhu (2018) proposed a multi-label feature selection method via feature manifold learning and sparsity regularization, which uses the feature manifold structure to enhance the performance of feature selection. Moreover, the duality between data and feature manifold structures has recently been considered in some manifold learning algorithms. Gu and Zhou (2009) proposed a dual regularized co-clustering (DRCC) method based on manifold learning, which considers the duality between data points and features. A Dual-GNMF (DGNMF) method, which captures the geometric information of both the feature manifold and the data manifold simultaneously, was introduced by Shang et al. (2012). Considering the multi-label learning literature, it is evident that no other work has studied the geometric information of both the feature and the data space simultaneously. In fact, by analyzing the manifold structure of the feature and data spaces, we can discover substantial geometric information with which to increase the performance of a learning algorithm.
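Since LLE is the building block used throughout this paper, the following sketch shows the standard LLE weight computation of Roweis and Saul (2000): each point is reconstructed from its k nearest neighbors with weights summing to one, and these weights encode the local geometry that an LLE-based graph regularizer preserves. This is our own hedged illustration, not the authors' implementation; the regularization constant reg is an assumption for numerical stability.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def lle_weights(X, k=5, reg=1e-3):
    """LLE reconstruction weights: row i holds the weights that best
    reconstruct X[i] from its k nearest neighbors (weights sum to one)."""
    n = X.shape[0]
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    idx = idx[:, 1:]  # drop the point itself

    W = np.zeros((n, n))
    for i in range(n):
        Z = X[idx[i]] - X[i]                 # neighbors centered on the point
        G = Z @ Z.T                          # local Gram matrix
        G += reg * np.trace(G) * np.eye(k)   # regularize for stability
        w = np.linalg.solve(G, np.ones(k))
        W[i, idx[i]] = w / w.sum()           # enforce the sum-to-one constraint
    return W
```

Applied to the data matrix X this yields a data-side graph; applied to its transpose, it yields a feature-side graph, which is how one can obtain the two graphs that dual-manifold learning combines.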

Motivated by these recent applications of manifold learning to machine learning problems, we target the MLCT with a formulation that integrates the data manifold structure, the feature manifold structure and the concept of sparsity regularization. The first framework is a regularized MLCT via feature manifold learning (RMLFM): in order to capture the geometric information of the feature manifold, a feature manifold graph based on LLE is constructed. The second framework is a regularized MLCT via dual-manifold learning (RMLDM): to capture the geometric information of both the data and the feature manifold structures simultaneously, a data manifold graph and a feature manifold graph, both based on LLE, are constructed. The major highlights of this paper are:

  • 1.

    Two novel frameworks for the MLCT, named regularized MLCT via feature manifold learning (RMLFM) and regularized MLCT via dual-manifold learning (RMLDM), are proposed. RMLFM applies the feature manifold regularization term to preserve the local structure of features, and RMLDM simultaneously considers both feature manifold and data manifold regularization in order to preserve the local structure of data and features.

  • 2.

    Two iterative algorithms by using the global conjugate gradient method are designed to solve the objective functions of the proposed methods RMLFM and RMLDM.

  • 3.

    Inspired by the advantages of the RBF-based neural networks (i.e., the universal approximation property) and the ELM-based neural networks (i.e., a fast training process and a high potential for favorable generalization), two frameworks based on ML-RBF and six frameworks based on ML-ELM are introduced, in which the models suggested for RMLFM and RMLDM are used. Indeed, the idea of feature manifold learning or dual-manifold learning, together with the l2,1-norm regularization, is incorporated in these frameworks.

  • 4.

    To the best of our knowledge, our proposed approach is the only existing method in which LLE is used to assess both the data and the feature geometric information in the MLCT.

The remainder of this paper is structured as follows. Section 2 briefly reviews some key theoretical notions and related works. The details of the proposed RMLFM and RMLDM methods are presented in Section 3 and Section 4, respectively. Under the RMLFM and RMLDM frameworks, two variants for ML-RBF and six variants for ML-ELM are presented in Section 5 (a novel RBF-based framework for the MLCT) and Section 6 (a novel framework based on multi-layer ELM for the MLCT), respectively. In Section 7, some experimental studies are performed to demonstrate the validity and efficiency of the proposed methods. In Section 8, the application of our proposed multi-label methods to the image classification problem is examined. Eventually, Section 9 draws the conclusions.


Related works

In the following, some notation used in this paper is introduced. Furthermore, the theoretical background of ELM, ELM-AE, ML-RBF and LLE is briefly reviewed.

Regularized multi-label learning method via feature manifold learning

In this section, a novel regularized MLCT via feature manifold learning (RMLFM) is introduced in detail. The framework of RMLFM comprises two main stages: (1) the regularized MLCT; (2) preservation of the local feature structure.
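Although the exact objective is developed in the body of the paper, a hedged sketch of one plausible instantiation helps fix ideas. Assume (our assumptions, in the spirit of Cai and Zhu (2018)) a least-squares loss ||XW − Y||_F², a feature-graph term tr(WᵀM_f W) with M_f = (I − S_f)ᵀ(I − S_f) built from the LLE feature weights S_f (e.g., lle_weights(X.T) from the earlier sketch), and an l2,1-norm sparsity term on W. The standard iteratively reweighted scheme below handles the non-smooth l2,1 term; it uses a direct solver where the paper employs a global conjugate gradient method.

```python
import numpy as np

def rmlfm_sketch(X, Y, S_f, alpha=1.0, beta=1.0, n_iter=30, eps=1e-8):
    """Sketch: min_W ||XW - Y||_F^2 + alpha*tr(W^T M_f W) + beta*||W||_{2,1}."""
    d = X.shape[1]
    I = np.eye(d)
    M_f = (I - S_f).T @ (I - S_f)   # feature-graph term from LLE weights S_f
    XtX, XtY = X.T @ X, X.T @ Y
    W = np.linalg.solve(XtX + alpha * M_f + beta * I, XtY)  # ridge-like init
    for _ in range(n_iter):
        # Reweighting for the l2,1 term: D_ii = 1 / (2 ||w_i||_2 + eps).
        D = np.diag(1.0 / (2.0 * np.sqrt((W ** 2).sum(axis=1)) + eps))
        # Solve the smoothed normal equations; at scale a (global) conjugate
        # gradient method would replace this direct solve.
        W = np.linalg.solve(XtX + alpha * M_f + beta * D, XtY)
    return W
```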

Regularized multi-label learning method via dual-manifold learning

As explained in Section 3, RMLFM is built by incorporating the idea of feature manifold regularization into the least squares regression model, in which a feature graph is constructed by applying the LLE method to the feature space. Moreover, to ensure the sparsity of the regression coefficient matrix, the l2,1-norm is used. However, the RMLFM method does not exploit information about the data manifold. To overcome this problem, we propose to extend the RMLFM method by using the concept of dual-manifold learning, in which a data manifold graph based on LLE is constructed as well.
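Under the same assumptions as the RMLFM sketch above (again our hedged reading, not the paper's exact formulation), the dual-manifold extension plausibly adds a data-side term built from the LLE weight matrix S_d of the data graph:

\[
\min_{W}\ \|XW-Y\|_F^2+\alpha\,\operatorname{tr}\!\big(W^\top M_f W\big)+\gamma\,\operatorname{tr}\!\big((XW)^\top M_d\,(XW)\big)+\beta\,\|W\|_{2,1},
\qquad M_d=(I-S_d)^\top(I-S_d).
\]

The data-side term asks the predicted label scores XW to respect the local LLE geometry of the samples, while the feature-side term plays the same role for the rows of W.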

A novel RBF-based framework for the MLCT

In this section, two variants for the MLCT based on RBF, called regularized ML-RBF via feature manifold learning (RML-RBF-FM) and regularized ML-RBF via dual-manifold learning (RML-RBF-DM), are proposed. The framework for RML-RBF-FM and RML-RBF-DM is given in Algorithm 4. Both of these methods can be regarded as extensions of ML-RBF. In fact,

  • 1.

    in RML-RBF-FM, the l2,1-norm regularizer and the feature graph regularizer are incorporated into the objective function of ML-RBF given by (6), and the

A novel framework based on multi-layer ELM for the MLCT

The framework of the multi-layer ELM (shown in Fig. 2) is composed of two major learning parts: the unsupervised part and the supervised part. In the unsupervised part, a useful feature representation is learned by stacking ELM-AEs. In the supervised part, the classification task is performed using ELM.
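To illustrate this two-part pipeline, the sketch below stacks ELM autoencoders, each learning output weights that reconstruct its own input, and then reuses the transposed reconstruction weights to map the data layer by layer before a final supervised ELM readout. This is our own hedged illustration of the generic multi-layer ELM idea of Kasun et al. (2013), not the specific variants proposed in this paper; the tanh activation and layer sizes are assumptions.

```python
import numpy as np

def elm_ae_layer(H_in, n_hidden, C=1.0, rng=None):
    """One ELM-AE layer: random projection, then output weights that
    reconstruct the layer's own input (input = target)."""
    rng = np.random.default_rng(rng)
    W = rng.standard_normal((H_in.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(H_in @ W + b)
    # beta reconstructs the input from the random features.
    beta = np.linalg.solve(H.T @ H + np.eye(n_hidden) / C, H.T @ H_in)
    # The transposed reconstruction weights serve as the learned feature map.
    return np.tanh(H_in @ beta.T)

def ml_elm_sketch(X, T, hidden_sizes=(200, 100), C=1.0, rng=0):
    """Unsupervised stacking of ELM-AEs, then a supervised ELM readout."""
    H = X
    for i, n_h in enumerate(hidden_sizes):
        H = elm_ae_layer(H, n_h, C=C, rng=rng + i)
    # Supervised part: regularized least squares on the last representation.
    beta_out = np.linalg.solve(H.T @ H + np.eye(H.shape[1]) / C, H.T @ T)
    return H @ beta_out
```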

In this section, taking into account feature manifold learning and dual-manifold learning, and following the idea of ELM-AE, we propose to consider three different cases for calculating

Experimental results

In this section, an in-depth study is conducted on fourteen datasets to assess the effectiveness of the presented methods. Moreover, we compare their performance with four state-of-the-art methods: ML-ELM-RBF, ML-RBF, ML-KNN and RELM.

Application to the image classification

In recent years, the use of multi-label learning approaches has been particularly prevalent in a number of real-world domains, such as text classification, image classification and bioinformatics (Boutell et al., 2004, Chen et al., 2013, Lu and Weng, 2007, Druzhkov and Kustikova, 2016). For instance, the image classification (IC) problem, one of the most fundamental issues in machine learning, centers on the design of a proper procedure for assigning a set of labels to images by

Conclusions

In this paper, two novel MLCT methods called regularized MLCT via feature manifold learning (RMLFM) and regularized MLCT via dual-manifold learning (RMLDM) have been established. RMLFM incorporates the idea of feature manifold regularization into the least squares regression model. Moreover, different from previous MLCT methods, in which only the local feature structure is preserved, RMLDM considers both the feature and the data manifold regularizations in order to preserve the local data and feature structures.

CRediT authorship contribution statement

Mohammad Rezaei-Ravari: Conceptualization, Formal analysis, Methodology, Writing - original draft, Software. Mahdi Eftekhari: Conceptualization, Formal analysis, Methodology, Writing - review & editing, Supervision, Project administration. Farid Saberi-Movahed: Conceptualization, Methodology, Formal analysis, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  • Shang, F., et al., 2012. Graph dual regularization non-negative matrix factorization for co-clustering. Pattern Recognit.
  • Xu, X., et al., 2019. Multi-label learning method based on ML-RBF and Laplacian ELM. Neurocomputing.
  • Zhang, N., et al., 2016. Denoising Laplacian multi-layer extreme learning machine. Neurocomputing.
  • Zhang, N., et al., 2016. Multi layer ELM-RBF for multi-label learning. Appl. Soft Comput.
  • Zhang, M.-L., et al., 2007. ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognit.
  • Bartlett, P.L., 1998. The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Trans. Inform. Theory.
  • Cai, D., et al., 2010. Graph regularized nonnegative matrix factorization for data representation. IEEE Trans. Pattern Anal. Mach. Intell.
  • Cai, Z., et al., 2018. Multi-label feature selection via feature manifold learning and sparsity regularization. Int. J. Mach. Learn. Cybern.
  • Chang, C.-C., et al., 2011. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST).
  • Chua, T.-S., Tang, J., Hong, R., Li, H., Luo, Z., Zheng, Y., 2009. NUS-WIDE: A real-world web image database from...
  • Demšar, J., 2006. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res.
  • Deng, W.-Y., et al., 2010. Research on extreme learning of neural networks. Chinese J. Comput.
  • Druzhkov, P., et al., 2016. A survey of deep learning methods and software tools for image classification and object detection. Pattern Recognit. Image Anal.
  • Elisseeff, A., et al. A kernel method for multi-labelled classification.
  • Gu, Q., et al., 2009. Co-clustering on manifolds.
  • Heyouni, M., et al., 2005. Matrix Krylov subspace methods for linear systems with multiple right-hand sides. Numer. Algorithms.