Information Sciences, Volume 566, August 2021, Pages 178-194

Adaptive discriminant analysis for semi-supervised feature selection

https://doi.org/10.1016/j.ins.2021.02.035

Abstract

As semi-supervised feature selection is becoming much more popular among researchers, many related methods have been proposed in recent years. However, many of these methods first compute a similarity matrix prior to feature selection, and the matrix is then kept fixed during the subsequent feature selection process. Clearly, a similarity matrix generated from the original dataset is susceptible to noise features. In this paper, we propose a novel adaptive discriminant analysis for semi-supervised feature selection, named SADA. Instead of computing a similarity matrix first, SADA simultaneously learns an adaptive similarity matrix S and a projection matrix W with an iterative process. Moreover, we introduce the $\ell_{2,p}$-norm to control the sparsity of S by adjusting p. Experimental results show that S becomes sparser as p decreases. The experimental results on synthetic datasets and nine benchmark datasets demonstrate the superiority of SADA in comparison with six semi-supervised feature selection methods.

Introduction

Feature selection is very important for high-dimensional data analysis because it can remove irrelevant features with only slight performance deterioration [1]. With the rapid increase of data size, obtaining labeled data is often costly [2]. Therefore, to free us from laborious and tedious data labeling work, only a small set of data samples is expected to be marked with ground truth. At the same time, it is desirable to exploit unlabeled samples during training to ensure the effectiveness of the learned models. Research topics related to this problem, such as image annotation and categorization, have become hot spots in many machine learning fields [3], [4]. It is thus desirable to develop feature selection methods that can exploit both labeled and unlabeled data, and the study of “semi-supervised feature selection” has accordingly gained increasing attention [5], [6], [7].

Due to the advantages of semi-supervised feature selection, related methods have sprung up in recent years. However, many of these methods share the shortcoming of measuring features with a ranking criterion without considering the models [8], [9], [10], [11]. Ren et al. proposed a wrapper-type forward semi-supervised feature selection framework [12] that exploits both labeled and unlabeled data for supervised sequential forward feature selection (SFFS). Xu et al. introduced a discriminative semi-supervised feature selection method based on the idea of manifold regularization, making use of the classification margin and the geometry of the probability distribution to select the features. However, because of its computational complexity of $O(n^{2.5}/\varepsilon)$, where $n$ is the number of objects and $\varepsilon$ is a fairly small stopping criterion, their method is time-consuming [13]. To choose the “best so far” feature subset from streaming features, Wu et al. proposed a novel feature selection method called online streaming feature selection (OSFS) [14]. However, domain knowledge is required in OSFS, and thus Eskandari et al. chose to use rough sets (RS) to optimize the former model (OS-NRRSAR-SA) [15]. Recently, Zhou et al. further improved the OS-NRRSAR-SA model with an adapted neighborhood rough set [16]. Chen et al. have also used rough sets to perform feature selection on imbalanced data [17].

Embedded semi-supervised methods are superior to other feature selection methods in many ways because feature selection becomes part of the model training process. Chen et al. proposed a semi-supervised feature selection method, RLSR [5], in which rescaled linear square regression is introduced to extend least-squares regression for feature selection. Yuan et al. improved RLSR by introducing an $\varepsilon$-dragging technique to enlarge the distances between different classes [18]. In addition to the methods mentioned above, other methods such as semi-supervised feature selection via spline regression [19], ensemble feature selection [20], [21], [22], and parallel feature selection [23], [24] have also been developed.

For most feature selection methods, a pair-wise similarity matrix is constructed from the original data and then kept fixed during the subsequent feature selection process. Researchers have designed several metrics to quantify the similarity between features based on powerful tools such as mutual information and graphs [25], [26]. However, as pointed out in [27], such a similarity matrix may lose the inner class structure of multimodal data, in which the samples of some classes form several separate clusters, and it often misleads feature selection methods into recovering the wrong local structure, since it is easily affected by noise features.

Sparsity commonly exists in real-world data; thus, sparse learning has become a key component of feature selection. Shi et al. considered the superiority of the $\ell_{2,p}$-norm as well as its non-Lipschitz continuity, claimed the effectiveness of the $\ell_{2,1-2}$-norm, and applied CCCP and ADMM to solve the resulting non-convex problem [28]. Zhang et al. drew the conclusion that, for the $\ell_{2,p}$-norm, a smaller $p$ leads to higher performance. They also discussed the situation where $p \to 0$ and proposed two algorithms to optimize the discrete feature selection problem [29].

Recently, Chen et al. proposed the LAP framework for both labeled and unlabeled data [30]. Instead of computing a fixed similarity matrix prior to performing feature selection, LAP learns an adaptive similarity matrix S and a projection matrix W simultaneously with an iterative method. Building on this idea, we extend LAP to semi-supervised feature selection tasks by proposing a semi-supervised adaptive discriminant analysis (SADA). As an extension of LAP, SADA can learn a better similarity matrix by weakening the effect of noise features on the similarity computation, and it can deal with multimodal data [27] by investigating the local structure of the data. The main contributions of our work include (a schematic sketch of the alternating scheme follows the list):

  • 1.

    We rewrite $\|\mathbf{W}^{\top}(\mathbf{x}_i - \mathbf{x}_j)\|_2$ as $\|\mathbf{W}^{\top}(\mathbf{x}_i - \mathbf{x}_j)\|_2^p$ by introducing the $\ell_{2,p}$-norm to control the sparsity of S by adjusting $p$, which can be used to adaptively preserve locality. Experimental results show that S becomes sparser as $p$ decreases.

  • 2.

    We take both labeled and unlabeled data into account to better preserve the locality in the semi-supervised scenario.

  • 3.

    Comprehensive experiments on nine benchmark datasets show the superior performance of the proposed approach in comparison with six semi-supervised feature selection methods.
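Section 4 develops the precise objective; purely to make the alternating scheme above concrete, the following is a minimal numpy sketch. It is our own simplification, not the authors' exact algorithm: `sada_sketch` is a hypothetical name, the label information and the discriminant term are omitted for brevity, and the similarity update is a generic IRLS-style reweighting in which the exponent mimics the role of $p$.

```python
import numpy as np

def sada_sketch(X, p=0.5, k=2, n_iter=10, eps=1e-8):
    """Schematic alternating scheme (our simplification, not the paper's
    exact updates). X is a (d, n) data matrix; returns a projection
    W (d, k) and an adaptive similarity matrix S (n, n)."""
    d, n = X.shape
    W = np.linalg.qr(np.random.randn(d, k))[0]     # random orthonormal init
    for _ in range(n_iter):
        # 1) pairwise distances in the projected space: ||W^T(x_i - x_j)||_2
        Z = W.T @ X                                # (k, n)
        D = np.linalg.norm(Z[:, :, None] - Z[:, None, :], axis=0)
        # 2) IRLS-style similarity: small projected distance -> large weight;
        #    the exponent (p - 2)/2 plays the role of p, and smaller p
        #    concentrates each row of S on fewer neighbours (sparser S)
        S = (D ** 2 + eps) ** ((p - 2) / 2)
        np.fill_diagonal(S, 0)
        S /= S.sum(axis=1, keepdims=True)          # row-normalize
        # 3) update W from the graph Laplacian of S (locality preserving)
        A = (S + S.T) / 2                          # symmetrize learned graph
        L_g = np.diag(A.sum(axis=1)) - A
        _, vecs = np.linalg.eigh(X @ L_g @ X.T)
        W = vecs[:, :k]                            # smallest eigenvectors
    return W, S
```

In this simplified form, $p$ close to 2 spreads each row of S almost uniformly over all samples, whereas a small $p$ concentrates its mass on the nearest neighbours, which matches the sparsity behaviour claimed in contribution 1.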

The rest of this paper is organized as follows. Section 2 presents the notations, and Section 3 surveys the existing semi-supervised feature selection methods. The semi-supervised feature selection method, SADA, is proposed in Section 4. We present the experimental results and analysis in Section 5. The conclusions and directions for future work are provided in Section 6.

Section snippets

Notations and definitions

We now summarize the notation and the definitions of the norms used in this paper. Matrices are written as boldface uppercase letters and vectors as boldface lowercase letters. For a matrix $\mathbf{M} = (m_{ij})$, its $i$-th row is denoted by $\mathbf{m}^i$ and its $j$-th column by $\mathbf{m}_j$. The Frobenius norm of a matrix $\mathbf{M} \in \mathbb{R}^{n \times m}$ is defined as $\|\mathbf{M}\|_F = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{m} m_{ij}^2}$. The $\ell_{2,p}$-norm of a matrix $\mathbf{M} \in \mathbb{R}^{n \times m}$ is defined as $\|\mathbf{M}\|_{2,p} = \big( \sum_{i=1}^{n} \big( \sum_{j=1}^{m} m_{ij}^2 \big)^{p/2} \big)^{1/p}$.
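As a quick sanity check of these two definitions, here is a minimal numpy sketch (the function names are ours):

```python
import numpy as np

def frobenius_norm(M):
    """Frobenius norm: square root of the sum of squared entries."""
    return np.sqrt((M ** 2).sum())

def l2p_norm(M, p):
    """l_{2,p}-norm: p-norm of the vector of row-wise l2 norms."""
    row_norms = np.sqrt((M ** 2).sum(axis=1))   # ||m^i||_2 for each row i
    return (row_norms ** p).sum() ** (1.0 / p)

M = np.array([[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]])
print(frobenius_norm(M))   # sqrt(9 + 16 + 1) = sqrt(26)
print(l2p_norm(M, 1.0))    # 5 + 0 + 1 = 6, i.e. the l_{2,1}-norm
print(l2p_norm(M, 0.5))    # (5**0.5 + 1**0.5)**2 ~= 10.47
```

Note that $p = 1$ recovers the familiar $\ell_{2,1}$-norm and that an all-zero row contributes nothing, which is why the $\ell_{2,p}$-norm with small $p$ is commonly used to promote row sparsity.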

Semi-supervised feature selection

The early semi-supervised feature selection methods are filter-based and score the features with a ranking criterion regardless of the model [8], [9], [10], [11]. For example, Zhao et al. proposed a semi-supervised feature selection algorithm named sSelect based on spectral analysis [8]. Consider a dataset $\mathbf{X} \in \mathbb{R}^{d \times n}$ consisting of two subsets: a set of $l$ labeled objects $\mathbf{X}_L = (\mathbf{x}_1, \ldots, \mathbf{x}_l)$ that are associated with class labels $\mathbf{y}_L \in \mathbb{R}^{l}$, and a set of $u = n - l$ unlabeled objects $\mathbf{X}_U = (\mathbf{x}_{l+1}, \ldots, \mathbf{x}_{l+u})$ for which the …
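In code, this semi-supervised setup is simply a column split of the data matrix; a minimal illustration under assumed sizes (all names and numbers here are ours, not the paper's):

```python
import numpy as np

# X: (d, n) data matrix whose first l columns carry ground-truth labels
d, n, l = 20, 100, 10
X = np.random.randn(d, n)
y_L = np.random.randint(0, 3, size=l)   # labels for the first l objects
X_L, X_U = X[:, :l], X[:, l:]           # labeled / unlabeled parts, u = n - l
```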

Proposed method

In this section, we propose the new feature selection method, which operates in a semi-supervised manner.

Experiments on synthetic datasets

We generated a synthetic dataset D1 to test the projection ability of the proposed method for feature selection. D1 consists of 12 dimensions, where the data in the first two dimensions are distributed in three Gaussian clusters, while the data in the other dimensions are uniformly distributed noise features. Fig. 2a shows the dataset in the first two dimensions, in which two small Gaussian clusters are buried in one class. In this experiment, our goal is to find a good projection direction that …
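The snippet does not give the exact cluster parameters, so the following generator is only an assumed reconstruction of a D1-like dataset: three 2-D Gaussian clusters, two of which share one class, padded with ten uniform noise dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per = 100
# first two dimensions: three Gaussian clusters (positions are illustrative)
centers = [(-3, 0), (3, 2), (3, -2)]
informative = np.vstack([
    rng.normal(loc=c, scale=0.5, size=(n_per, 2)) for c in centers
])
# remaining 10 dimensions: uniformly distributed noise features
noise = rng.uniform(-5, 5, size=(3 * n_per, 10))
D1 = np.hstack([informative, noise])     # (300, 12) synthetic dataset
labels = np.repeat([0, 1, 1], n_per)     # two Gaussian clusters in one class
```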

Conclusions

In this paper, we have proposed a novel semi-supervised feature selection method, named SADA, that performs feature selection and implicit adaptive local structure learning simultaneously. The new method simultaneously learns a projection matrix W and an implicit adaptive similarity matrix S from both labeled and unlabeled data. In the new objective function, the $\ell_{2,p}$-norm is imposed on the pair-wise projected distances, and experimental results show that the learned implicit similarity matrix S …

CRediT authorship contribution statement

Weichan Zhong: Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing - original draft. Xiaojun Chen: Conceptualization, Methodology, Validation, Investigation, Writing - original draft. Feiping Nie: Conceptualization, Project administration, Writing - original draft. Joshua Zhexue Huang: Project administration, Writing - original draft.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research was supported by the Major Project of the New Generation of Artificial Intelligence (No. 2018AAA0102900), NSFC under Grant No. 61773268, the Natural Science Foundation of SZU (Grant No. 000346), and the Shenzhen Research Foundation for Basic Research, China (No. JCYJ20180305124149387).

References (32)

  • S.H. Huang, Supervised feature selection: a tutorial, Artif. Intell. Res. (2015)
  • Y. Luo et al., Vector-valued multi-view semi-supervised learning for multi-label image classification
  • C. Tang et al., Adaptive hypergraph embedded semi-supervised multi-label image annotation, IEEE Trans. Multimedia (2019)
  • X. Chen, G. Yuan, F. Nie, J.Z. Huang, Semi-supervised feature selection via rescaled linear regression, in: …
  • J. Li, X. Liang, P. Li, W. Zhang, Q. Du, H. Yuan, Two-dimensional semi-supervised feature selection, in: 2020 10th …
  • Z. Zhao et al., Semi-supervised feature selection via spectral analysis, in: …


    Weichan Zhong is a Master's degree student at the College of Computer Science and Software, Shenzhen University, Shenzhen, China. Her current research interests include clustering and feature selection.

    Xiaojun Chen (M’16) received a Ph.D. degree from the Harbin Institute of Technology, Harbin, China, in 2011. He is currently an Associate Professor of the College of Computer Science and Software, Shenzhen University, Shenzhen, China. His current research interests include subspace clustering, topic model, feature selection, and massive data mining.

    Feiping Nie received a Ph.D. degree in Computer Science from Tsinghua University, China, in 2009. His research interests are machine learning and its applications, such as pattern recognition, data mining, computer vision, image processing, and information retrieval. He has published more than 100 papers in top journals and conferences including TPAMI, IJCV, TIP, TNNLS/TNN, TKDE, TKDD, Bioinformatics, ICML, NIPS, KDD, IJCAI, and AAAI. His papers have been cited more than 5000 times (Google Scholar). He is now serving as an Associate Editor or PC member for several prestigious journals and conferences in the related fields.

    Joshua Zhexue Huang received a Ph.D. degree from the Royal Institute of Technology, Stockholm, Sweden. He is currently a Professor with the College of Computer Science and Software, Shenzhen University, Shenzhen, China, a Professor and a Chief Scientist of the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Beijing, China, and an Honorary Professor with the Department of Mathematics, The University of Hong Kong, Hong Kong. His current research interests include data mining, machine learning, and clustering algorithms.
