Knowledge-Based Systems

Volume 209, 17 December 2020, 106429

Similarity-based constraint score for feature selection

https://doi.org/10.1016/j.knosys.2020.106429

Highlights

  • A new constraint score is proposed for feature selection.

  • It can be used in the context of both supervised and semi-supervised learning.

  • It is based on similarity matrices to evaluate the relevance of a subset of features.

  • It evaluates a subset of features at once and can identify redundant features.

  • It outperforms other state-of-the-art constraint scores.

Abstract

To avoid the curse of dimensionality resulting from a large number of features, the most relevant features should be selected. Several scores involving must-link and cannot-link constraints have been proposed to estimate the relevance of features. However, these constraint scores evaluate features one by one and ignore any correlation between them. In addition, they compute distances in the original high-dimensional feature space to evaluate the similarity between samples, so they can themselves be corrupted by the curse of dimensionality. To deal with these drawbacks, we propose a new constraint score based on a similarity matrix that is computed in the selected feature subspace and that makes it possible to evaluate the relevance of a feature subset as a whole. Experiments on benchmark databases demonstrate the improvement brought by the proposed constraint score in the context of both supervised and semi-supervised learning.

Introduction

In machine learning and pattern recognition applications, such as data mining and image analysis, datasets are often characterized by a large number of features. The processing of such high-dimensional data requires large memory storage and high computational time, and may lead to poor learning performance [1], [2]. To address these drawbacks, the dimensionality of data is often reduced by selecting relevant features. Typically, feature selection methods can be categorized into three types: filter, wrapper, and embedded methods [2], [3]. Filter methods evaluate features independently of the classification algorithm, while wrapper methods exploit a classification algorithm to evaluate the relevance of features. Embedded methods embed feature selection into the learning algorithm. As filter methods do not depend on any classification scheme, we focus on these methods [2], [3].

According to the availability of prototypes (i.e., labeled data samples that represent classes), feature selection methods can also be divided into unsupervised, supervised, and semi-supervised approaches [1], [2], [3]. Supervised feature selection only uses prototypes to measure the correlation of each feature with the class labels, while unsupervised feature selection analyzes unlabeled data samples to evaluate the ability of features to preserve the intrinsic data structure [1]. Semi-supervised feature selection takes both prototypes and unlabeled data samples into account to evaluate the relevance of features.

In supervised and semi-supervised learning frameworks, besides class labels of prototypes, the available information can be also expressed by must-link and cannot-link constraints. A must-link constraint specifies that two data samples belong to the same class, while a cannot-link constraint specifies that two data samples belong to different classes [4]. Pairwise constraints can be provided by the user or easily generated from a small number of prototypes.
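To make the constraint generation step concrete, the following sketch (our illustration; the function name is hypothetical) builds must-link and cannot-link pairs from a small set of labeled prototypes: every pair of prototypes sharing a label yields a must-link constraint, and every pair with different labels yields a cannot-link constraint.

```python
import itertools

def generate_constraints(prototype_labels):
    """Build must-link and cannot-link pairs from labeled prototypes.

    prototype_labels: dict mapping sample index -> class label.
    Returns two lists of sample index pairs (i, j).
    """
    must_link, cannot_link = [], []
    for (i, yi), (j, yj) in itertools.combinations(prototype_labels.items(), 2):
        if yi == yj:
            must_link.append((i, j))    # same class: must-link
        else:
            cannot_link.append((i, j))  # different classes: cannot-link
    return must_link, cannot_link

# Example: four prototypes belonging to two classes
M, C = generate_constraints({0: "a", 3: "a", 7: "b", 9: "b"})
# M = [(0, 3), (7, 9)];  C = [(0, 7), (0, 9), (3, 7), (3, 9)]
```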

Must-link and cannot-link constraints are used to estimate the relevance of features via score functions, called constraint scores [1], [2]. These scores incorporate the constraints into a similarity matrix that encodes the similarity between data samples. Zhang et al. [5] proposed two supervised constraint scores that use only pairwise constraints to evaluate the relevance of features. Zhao et al. [6] defined a semi-supervised constraint score that analyzes both pairwise constraints and unlabeled data samples for feature selection. Kalakech et al. [1] combined an unsupervised score computed from unlabeled data samples with a supervised score computed from the pairwise constraints; the resulting score is expected to be less sensitive to constraint changes. Benabdeslem et al. proposed two semi-supervised constraint scores that assess the ability of a feature to preserve the local properties of unlabeled data samples while respecting pairwise constraints [7], [8]. More recently, Yang et al. introduced a new semi-supervised constraint score that takes advantage of the local geometrical structure of unlabeled data samples as well as of constraints deduced from prototypes [9], [10].
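As an illustration of a per-feature constraint score, the sketch below follows the spirit of the first score of Zhang et al. [5], which rates the r-th feature by the ratio of its squared differences over must-link pairs to those over cannot-link pairs (our NumPy rendering under that assumption; lower values indicate more relevant features).

```python
import numpy as np

def constraint_score_per_feature(X, must_link, cannot_link, eps=1e-12):
    """Score each feature individually (lower = more relevant).

    X: (n_samples, n_features) data matrix.
    must_link, cannot_link: lists of (i, j) sample index pairs.
    """
    scores = np.empty(X.shape[1])
    for r in range(X.shape[1]):
        f = X[:, r]
        ml = sum((f[i] - f[j]) ** 2 for i, j in must_link)
        cl = sum((f[i] - f[j]) ** 2 for i, j in cannot_link)
        scores[r] = ml / (cl + eps)  # eps avoids division by zero
    return scores

# Features are then ranked by ascending score and the best ones are kept.
```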

The above-mentioned constraint scores belong to the filter approach: they evaluate features one by one [2]. The score of a feature subset is estimated as the sum of the individual feature scores, so the evaluation of a feature subspace ignores any correlation between features. Thus, learning algorithms that operate in a subspace of individually relevant features do not necessarily provide favorable results [8]. In addition, the constraint scores proposed in the literature are based on the analysis of a similarity matrix. Because this similarity matrix is computed in the original feature space, state-of-the-art feature scores can also be corrupted by the curse of dimensionality.

In this paper, we propose a new constraint score that evaluates the relevance of features in the context of both supervised and semi-supervised learning. Our score assesses the ability of features to respect the available set of pairwise constraints. Unlike existing constraint scores, which evaluate each feature individually, our score evaluates a subset of several features simultaneously. The proposed score is then used as a criterion by a sequential forward selection scheme to identify the most relevant subset of features [11].
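The exact definition of our score is given in Section 4. Purely to illustrate how a subset-level criterion plugs into sequential forward selection [11], the following generic sketch greedily adds, at each step, the feature whose inclusion most improves the subset score (the name subset_score is a placeholder for any such criterion, not the authors' implementation).

```python
def sequential_forward_selection(n_features, subset_score, n_selected):
    """Greedy forward selection driven by a subset-level score.

    n_features: total number of candidate features.
    subset_score: callable taking a list of feature indices (higher = better).
    n_selected: number of features to retain.
    """
    remaining = list(range(n_features))
    selected = []
    while len(selected) < n_selected and remaining:
        best_feat, best_val = None, None
        for f in remaining:
            val = subset_score(selected + [f])
            if best_val is None or val > best_val:
                best_feat, best_val = f, val
        selected.append(best_feat)   # keep the feature that helps the subset most
        remaining.remove(best_feat)
    return selected
```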

The performance of the constraint scores is measured by the classification accuracy obtained on test data, commonly with the nearest neighbor classifier. Previous studies use the entire training dataset, with its true class labels, as prototypes for the classifier, whereas only a few prototypes are analyzed by the constraint scores. To reproduce conditions closer to real-life applications, in this paper we propose to use only the available information. In the supervised context, only the prototypes involved in pairwise constraint generation are analyzed by the classifier. In the semi-supervised context, we follow the strategy proposed by Kalakech et al. [12], which first applies the constrained K-means algorithm [4] to classify the unlabeled training data samples and then uses the classified samples as prototypes for classifying the test data. Instead of the constrained K-means algorithm, however, we use constrained spectral clustering, which is based on the same similarity matrix concept as the constraint scores.
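A minimal sketch of the supervised evaluation protocol (our code, using scikit-learn's nearest neighbor classifier; the function name is ours) is given below: only the prototypes involved in constraint generation are used as training samples, and classification is performed in the selected feature subspace. In the semi-supervised setting, the prototypes would be replaced by the training samples labeled by the constrained clustering step.

```python
from sklearn.neighbors import KNeighborsClassifier

def nn_accuracy(X_proto, y_proto, X_test, y_test, selected):
    """1-NN accuracy in the selected feature subspace, training only
    on the labeled prototypes (supervised protocol)."""
    clf = KNeighborsClassifier(n_neighbors=1)
    clf.fit(X_proto[:, selected], y_proto)
    return clf.score(X_test[:, selected], y_test)
```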

The remainder of this paper is organized as follows. Section 2 provides brief definitions related to spectral graph theory and pairwise constraint generation. Section 3 reviews the state of the art on constraint scores. Our proposed constraint score and the feature selection procedure are presented in Section 4. Experimental results achieved with benchmark databases, related to both supervised and semi-supervised feature selection, are provided and discussed in Section 5.

Section snippets

Preliminaries

Constraint scores are based on the concepts of spectral graph theory and pairwise constraints. In this section, we briefly give some notations and definitions related to these two concepts.

Constraint scores

The performance achieved by learning algorithms such as classification or clustering depends on a similarity measure, which is based on the Euclidean distance in the original d-dimensional feature space. Because these features are not always relevant, many authors select the best ones by means of constraint scores that combine the concepts of spectral graph theory and pairwise constraints.
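For reference, the similarity between two samples is commonly expressed through a Gaussian kernel on their Euclidean distance, W_ij = exp(-||x_i - x_j||^2 / σ^2). A minimal NumPy sketch of such a similarity matrix, computed here in the full original feature space, follows (the precise kernel and neighborhood definition may differ from one score to another).

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    """Pairwise similarity W[i, j] = exp(-||x_i - x_j||^2 / sigma^2),
    computed in the full original feature space."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / sigma ** 2)
```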

Proposed constrained feature selection

Existing constraint scores estimate the relevance of each feature independently of the others. Because these scores do not take into account the correlation between features, we propose a new constraint score that estimates the relevance of a subset of features at once.
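To convey the idea without anticipating the exact formula, the simplified sketch below (our own illustrative variant, not the authors' score) computes the Gaussian similarity matrix restricted to a candidate feature subset and favors subsets whose must-link pairs are more similar than their cannot-link pairs.

```python
import numpy as np

def subset_constraint_score(X, subset, must_link, cannot_link,
                            sigma=1.0, eps=1e-12):
    """Illustrative subset-level criterion: similarity is computed in the
    selected feature subspace, then the average must-link similarity is
    compared with the average cannot-link similarity (higher = better)."""
    Xs = X[:, subset]
    sq_dists = ((Xs[:, None, :] - Xs[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / sigma ** 2)
    ml = np.mean([W[i, j] for i, j in must_link])
    cl = np.mean([W[i, j] for i, j in cannot_link])
    return ml / (cl + eps)
```

Combined with the forward selection routine sketched in the introduction, such a criterion evaluates whole subsets rather than summing individual feature scores.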

Experiments on benchmark databases

We evaluate and compare our proposed constraint score with several constraint scores and several well-known feature selection methods on datasets originating from benchmark databases. We first examine the supervised feature scores (εS, C1 and C2). Then, we assess the performance attained by the semi-supervised feature scores (εSS, C3, C4, C5, C6 and C7). Because the data are scaled between 0 and 1, the scaling parameter σ used to compute the similarity matrices is set to 1. Feature selection

Conclusion and future work

In this paper, we presented a new constraint score for feature selection in the context of both supervised and semi-supervised learning. This score evaluates a subset of features at once, whereas state-of-the-art constraint scores evaluate only one feature at a time. This makes it possible to identify redundant features and to avoid the problem of correlation between features. Because our score evaluates the similarity between data samples in the examined feature subspace, selected features can be

CRediT authorship contribution statement

Abderezak Salmi: Conceptualization, Methodology, Software, Validation, Writing - original draft, Writing - review & editing. Kamal Hammouche: Conceptualization, Methodology, Validation, Writing - original draft, Writing - review & editing, Supervision. Ludovic Macaire: Conceptualization, Methodology, Validation, Writing - original draft, Writing - review & editing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (23)

  • Siedlecki, W., et al., On automatic feature selection, Int. J. Pattern Recognit. Artif. Intell. (1988)