Multi-label feature selection via manifold regularization and dependence maximization
Introduction
In traditional supervised single-label learning, each instance belongs to only one class label. However, in many real applications it is more appropriate for an instance to be associated with multiple class labels, since the instance can carry multiple semantic meanings at the same time [1,2]. In such cases, multi-label learning is needed to assign multiple labels to each instance. The idea of multi-label learning was first used in the field of text categorization, where a document is related to several topics simultaneously [3,4]. Since then, it has attracted much interest and has been applied to a wide range of applications, such as automatic image annotation [5], music emotion classification [6] and bioinformatics [7].
Similar to single-label learning, multi-label learning also faces the curse of dimensionality, since multi-label data sets usually contain instances with a large number of features [8,9]. Many of these features are irrelevant or redundant. They not only increase computational costs in time and space, but also lead to poor classification performance. It is therefore necessary to remove them through dimensionality reduction. Feature extraction and feature selection are the two main approaches to dimensionality reduction. The former maps the original feature space to a lower-dimensional subspace [10], [11], [12], while the latter directly selects a small feature subset from the whole feature set [13], [14], [15], [16]. The key difference is that feature extraction creates new features that are not interpretable, whereas feature selection preserves the physical meanings of the original features. Hence, feature selection has attracted increasing attention and has shown its effectiveness in improving the performance of multi-label learning algorithms.
Recently, multi-label feature selection methods based on the sparse regression model have proven effective [17], [18], [19], [20], [21]. These methods usually consider a least squares regression model with a sparse regularization term and additional constraint conditions, and score the importance of each feature based on the regression coefficients. However, most of them assume a linear relationship between the data space and the label space, so that the class labels are linearly regressed on the original data. Unfortunately, this assumption often does not hold.
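To make the sparse-regression scoring idea concrete, the following is a minimal sketch (our own illustration, not the paper's model) that solves a least squares regression from features to labels with an l2,1-norm penalty by proximal gradient descent, and ranks features by the row norms of the coefficient matrix:

```python
import numpy as np

def l21_feature_scores(X, Y, lam=0.5, n_iter=300):
    """Proximal gradient for min_W ||XW - Y||_F^2 + lam * ||W||_{2,1}.

    Returns one importance score per feature: the l2 norm of the
    corresponding row of W (rows of irrelevant features shrink to zero).
    """
    n, d = X.shape
    W = np.zeros((d, Y.shape[1]))
    step = 1.0 / (2 * np.linalg.norm(X, 2) ** 2)        # 1 / Lipschitz constant
    for _ in range(n_iter):
        G = W - step * 2 * X.T @ (X @ W - Y)            # gradient step on the LS term
        norms = np.linalg.norm(G, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1 - step * lam / np.maximum(norms, 1e-12))
        W = G * shrink                                   # row-wise soft thresholding
    return np.linalg.norm(W, axis=1)

# Synthetic check: only the first 3 of 20 features generate the labels.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 20))
W_true = np.zeros((20, 4))
W_true[:3] = 2 * rng.standard_normal((3, 4))
Y = X @ W_true + 0.1 * rng.standard_normal((200, 4))
scores = l21_feature_scores(X, Y)
print(sorted(np.argsort(scores)[::-1][:3].tolist()))    # the 3 informative features
```

The l2,1 norm couples all labels per feature, so a feature is kept or discarded as a whole; this is the standard device behind the regression-based selectors cited above.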
According to spectral regression (SR) [22], which combines spectral graph analysis and sparse regression for subspace learning, each sample is regressed to its own manifold structure. This idea has been adopted in several manifold-based feature selection methods [14, 19, 23, 24]. In this paper, our proposed method, named multi-label feature selection via manifold regularization and dependence maximization (MRDM), is also based on the SR model, but with two added constraints: a dependence constraint between the low-dimensional embedding and the associated class labels, and a structure constraint between the embedding and the original data. In MRDM, the Hilbert-Schmidt Independence Criterion (HSIC) is applied as the measurement of dependence due to its simplicity and neat theoretical properties [25], and the graph Laplacian is computed to characterize the nonlinear geometric structure of the data. The most representative features are selected through sparsity regularization together with the above dependence and structure constraints. The main contributions of our work are summarized as follows:
- Presenting a new multi-label feature selection method that efficiently combines manifold regularization and dependence maximization.
- Introducing an HSIC-based measurement to evaluate the dependence between the manifold space and the label space.
- Developing an iterative optimization method with good convergence to solve the objective function of MRDM.
- Conducting extensive experiments on various multi-label data sets to demonstrate the superiority of the proposed method.
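The HSIC measure adopted above admits a simple empirical estimator. The sketch below (our own illustration; linear kernels and the biased estimator, not necessarily the paper's exact choices) computes HSIC between an embedding and a label matrix, and shows that dependent pairs score higher than independent ones:

```python
import numpy as np

def hsic(X, Y):
    """Biased empirical HSIC between row-aligned samples X (n x d) and Y (n x c).

    With linear kernels K = X X^T and L = Y Y^T, and centering matrix
    H = I - (1/n) 11^T, the estimate is trace(K H L H) / (n - 1)^2.
    """
    n = X.shape[0]
    K = X @ X.T                               # kernel on the embedding space
    L = Y @ Y.T                               # kernel on the label space
    H = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
Y_dep = X @ rng.standard_normal((5, 3))       # labels generated from X
Y_ind = rng.standard_normal((50, 3))          # labels independent of X
print(hsic(X, Y_dep) > hsic(X, Y_ind))        # prints True: dependence raises HSIC
```

Maximizing this quantity over the embedding is what ties the learned manifold representation back to the class labels in the MRDM objective.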
The rest of this paper is organized as follows. A brief review of related work is given in Section 2, and the details of the proposed method are presented in Section 3. In Section 4, the experimental results are analyzed. Finally, Section 5 concludes and discusses issues for future work.
Related work
Feature selection is considered an effective tool to alleviate the "curse of dimensionality" and to improve the classification performance of multi-label learning. Like single-label feature selection methods, multi-label feature selection methods can be grouped into three categories [8]: filter methods, wrapper methods and embedded methods. Among them, wrapper methods apply certain search strategies to obtain feature subsets and evaluate them by using the classifier that will
Proposed approach
In this section, we propose a new multi-label feature selection method via manifold regularization and dependence maximization, named MRDM. First, we briefly summarize the symbols used in the paper. Then a detailed description of the proposed algorithm is presented. Finally, we introduce an effective solution to the optimization problem of the algorithm.
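MRDM characterizes the nonlinear geometric structure of the data with a graph Laplacian. As background, here is a minimal sketch of building one from a k-nearest-neighbor graph with a heat (Gaussian) kernel; this is a common construction, offered as an assumption rather than the paper's exact recipe:

```python
import numpy as np

def knn_graph_laplacian(X, k=5, sigma=1.0):
    """Unnormalized graph Laplacian L = D - S from a k-NN similarity graph.

    S holds heat-kernel weights on Euclidean distances between neighbors;
    the graph is symmetrized so L is symmetric positive semi-definite.
    """
    n = X.shape[0]
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    S = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(D2[i])[1:k + 1]                  # k nearest neighbors, skip self
        S[i, idx] = np.exp(-D2[i, idx] / (2 * sigma ** 2))
    S = np.maximum(S, S.T)                                # symmetrize the graph
    return np.diag(S.sum(axis=1)) - S                     # L = D - S

X = np.random.default_rng(2).standard_normal((30, 4))
L = knn_graph_laplacian(X)
print(np.allclose(L @ np.ones(30), 0))    # prints True: constants lie in the null space
```

The quadratic form trace(F^T L F) then penalizes embeddings F that vary sharply between neighboring samples, which is the manifold-regularization term used by SR-style methods.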
Experiments
In this section, we compare our method MRDM with eight state-of-the-art multi-label feature selection methods and a baseline method (without feature selection) on eight commonly used multi-label data sets.
Results and discussion
The experimental results of MRDM and the comparing algorithms on the eight data sets, in terms of the five evaluation metrics, are given in Figs. 1–8. The horizontal axis indicates the number of selected features and the vertical axis indicates the classification performance on each evaluation criterion. From the figures, we can see that the performances of all the feature selection methods are generally improved with the
Conclusion
In this paper, we propose a novel multi-label feature selection method via manifold regularization and dependence maximization (MRDM). In some sparse feature selection methods, the data space is directly mapped into the label space through a coefficient matrix, which is inappropriate due to the lack of linear relationship between data space and label space in most cases. Inspired by the spectral regression, we propose to replace the label space with a low-dimensional manifold embedding. This
Declaration of Competing Interest
None.
References (45)
- et al., Multi-label text categorization using l21-norm minimization extreme learning machine, Neurocomputing (2017)
- et al., Image multi-label annotation based on supervised nonnegative matrix factorization with new matching measurement, Neurocomputing (2017)
- et al., Multi-label maximum entropy model for social emotion classification over short text, Neurocomputing (2016)
- et al., A systematic review of multi-label feature selection and a new method based on label construction, Neurocomputing (2016)
- et al., Mutual information-based feature selection for multi-label classification, Neurocomputing (2013)
- et al., Multi-label feature selection with shared common mode, Pattern Recognit. (2020)
- et al., Manifold regularized discriminative feature selection for multi-label learning, Pattern Recognit. (2019)
- et al., Feature selection for multi-label naive Bayes classification, Inf. Sci. (2009)
- et al., Feature selection for multi-label classification using multivariate mutual information, Pattern Recognit. Lett. (2013)
- et al., Multi-label feature selection based on max-dependency and min-redundancy, Neurocomputing (2015)
- Mutual information based multi-label feature selection via constrained convex optimization, Neurocomputing
- Manifold-based constraint Laplacian score for multi-label feature selection, Pattern Recognit. Lett.
- ML-KNN: a lazy learning approach to multi-label learning, Pattern Recognit.
- Mining multi-label data, in: Data Mining and Knowledge Discovery Handbook
- A review on multi-label learning algorithms, IEEE Trans. Knowl. Data Eng.
- BoosTexter: a boosting-based system for text categorization, Mach. Learn.
- iATC-mISF: a multi-label classifier for predicting the classes of anatomical therapeutic chemicals, Bioinformatics
- Recent advances in feature selection and its applications, Knowl. Inf. Syst.
- Multi-label informed latent semantic indexing
- Multilabel dimensionality reduction via dependence maximization, ACM Trans. Knowl. Discov. Data
- Multi-label linear discriminant analysis
- Laplacian score for feature selection
Rui Huang received her Ph.D. degree in the School of Electronic and Information from Northwestern Polytechnical University in 2006. Currently, she is an associate professor at the School of Communication and Information Engineering, Shanghai University, China. Her research areas are artificial intelligence and machine learning.
Zhejun Wu received his B.E. degree from Shanghai University in 2018. He is currently working toward an M.S. degree in the School of Communication and Information Engineering, Shanghai University, China. His research focuses on multi-label feature selection.