Knowledge-Based Systems

Volume 209, 17 December 2020, 106429

Similarity-based constraint score for feature selection

https://doi.org/10.1016/j.knosys.2020.106429

Highlights

  • A new constraint score is proposed for feature selection.

  • It can be used in the context of both supervised and semi-supervised learning.

  • It is based on similarity matrices to evaluate the relevance of a subset of features.

  • It evaluates a subset of features at once and can identify redundant features.

  • It outperforms other state-of-the-art constraint scores.

Abstract

To avoid the curse of dimensionality resulting from a large number of features, the most relevant features should be selected. Several scores involving must-link and cannot-link constraints have been proposed to estimate the relevance of features. However, these constraint scores evaluate features one by one and ignore any correlation between them. In addition, they compute distances in the original high-dimensional feature space to evaluate the similarity between samples, so they can themselves be corrupted by the curse of dimensionality. To deal with these drawbacks, we propose a new constraint score based on a similarity matrix that is computed in the selected feature subspace and that makes it possible to evaluate the relevance of a feature subset as a whole. Experiments on benchmark databases demonstrate the improvement brought by the proposed constraint score in the context of both supervised and semi-supervised learning.

Introduction

In machine learning and pattern recognition applications, such as data mining and image analysis, datasets are often characterized by a large number of features. The processing of such high-dimensional data requires large memory storage and high computational time, and may lead to poor learning performance [1], [2]. To address these drawbacks, the dimensionality of data is often reduced by selecting relevant features. Typically, feature selection methods can be categorized into three types: filter, wrapper, and embedded methods [2], [3]. Filter methods evaluate features independently of the classification algorithm, while wrapper methods exploit a classification algorithm to evaluate the relevance of features. Embedded methods embed feature selection into the learning algorithm. As filter methods do not depend on any classification scheme, we focus on these methods [2], [3].

According to the availability of prototypes (i.e., labeled data samples that represent classes), feature selection methods can also be divided into unsupervised, supervised, and semi-supervised approaches [1], [2], [3]. Supervised feature selection only uses prototypes to measure the correlation of each feature with the class labels, while unsupervised feature selection analyzes unlabeled data samples to evaluate the ability of features to preserve the intrinsic data structure [1]. Semi-supervised feature selection takes both prototypes and unlabeled data samples into account to evaluate the relevance of features.

In supervised and semi-supervised learning frameworks, besides class labels of prototypes, the available information can be also expressed by must-link and cannot-link constraints. A must-link constraint specifies that two data samples belong to the same class, while a cannot-link constraint specifies that two data samples belong to different classes [4]. Pairwise constraints can be provided by the user or easily generated from a small number of prototypes.
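To make the constraint generation step concrete, the following sketch (our illustration; the function name is hypothetical) builds must-link and cannot-link pairs from a small set of labeled prototypes: every pair of prototypes sharing a label yields a must-link constraint, and every pair with different labels yields a cannot-link constraint.

```python
import itertools

def generate_constraints(prototype_labels):
    """Build must-link and cannot-link pairs from labeled prototypes.

    prototype_labels: dict mapping sample index -> class label.
    Returns two lists of sample index pairs (i, j).
    """
    must_link, cannot_link = [], []
    for (i, yi), (j, yj) in itertools.combinations(prototype_labels.items(), 2):
        if yi == yj:
            must_link.append((i, j))    # same class: must-link
        else:
            cannot_link.append((i, j))  # different classes: cannot-link
    return must_link, cannot_link

# Example: four prototypes belonging to two classes
M, C = generate_constraints({0: "a", 3: "a", 7: "b", 9: "b"})
# M = [(0, 3), (7, 9)];  C = [(0, 7), (0, 9), (3, 7), (3, 9)]
```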

Must-link and cannot-link constraints are used to estimate the relevance of features via score functions, called constraint scores [1], [2]. These scores incorporate the constraints into a similarity matrix that encodes the similarity between data samples. Zhang et al. [5] proposed two supervised constraint scores that use only pairwise constraints to evaluate the relevance of features. Zhao et al. [6] defined a semi-supervised constraint score that analyzes both pairwise constraints and unlabeled data samples for feature selection. Kalakech et al. [1] combined an unsupervised score computed from unlabeled data samples with a supervised score computed from the pairwise constraints; the resulting score is expected to be less sensitive to constraint changes. Benabdeslem et al. proposed two semi-supervised constraint scores that assess the ability of a feature to preserve the local properties of unlabeled data samples while respecting pairwise constraints [7], [8]. More recently, Yang et al. introduced a new semi-supervised constraint score that takes advantage of the local geometrical structure of unlabeled data samples as well as of constraints deduced from prototypes [9], [10].
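As an illustration of a per-feature constraint score, the sketch below follows the spirit of the first score of Zhang et al. [5], which rates the r-th feature by the ratio of its squared differences over must-link pairs to those over cannot-link pairs (our NumPy rendering under that assumption; lower values indicate more relevant features).

```python
import numpy as np

def constraint_score_per_feature(X, must_link, cannot_link, eps=1e-12):
    """Score each feature individually (lower = more relevant).

    X: (n_samples, n_features) data matrix.
    must_link, cannot_link: lists of (i, j) sample index pairs.
    """
    scores = np.empty(X.shape[1])
    for r in range(X.shape[1]):
        f = X[:, r]
        ml = sum((f[i] - f[j]) ** 2 for i, j in must_link)
        cl = sum((f[i] - f[j]) ** 2 for i, j in cannot_link)
        scores[r] = ml / (cl + eps)  # eps avoids division by zero
    return scores

# Features are then ranked by ascending score and the best ones are kept.
```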

The above-mentioned constraint scores belong to the filter approach: they evaluate features one by one [2]. The score of a feature subset is estimated as the sum of the individual feature scores, so the evaluation of a feature subspace ignores any correlation between features. Thus, learning algorithms that operate in a subspace of individually relevant features do not necessarily provide favorable results [8]. In addition, the constraint scores proposed in the literature are based on the analysis of a similarity matrix. Because this similarity matrix is computed in the original feature space, state-of-the-art feature scores can also be corrupted by the curse of dimensionality.

In this paper, we propose a new constraint score that evaluates the relevance of features in the context of both supervised and semi-supervised learning. Our score assesses the ability of features to respect the available set of pairwise constraints. Unlike existing constraint scores, which evaluate each feature individually, our score evaluates a subset of several features simultaneously. The proposed score is then used as a criterion by a sequential forward selection scheme to identify the most relevant subset of features [11].
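The exact definition of our score is given in Section 4. Purely to illustrate how a subset-level criterion plugs into sequential forward selection [11], the following generic sketch greedily adds, at each step, the feature whose inclusion most improves the subset score (the name subset_score is a placeholder for any such criterion, not the authors' implementation).

```python
def sequential_forward_selection(n_features, subset_score, n_selected):
    """Greedy forward selection driven by a subset-level score.

    n_features: total number of candidate features.
    subset_score: callable taking a list of feature indices (higher = better).
    n_selected: number of features to retain.
    """
    remaining = list(range(n_features))
    selected = []
    while len(selected) < n_selected and remaining:
        best_feat, best_val = None, None
        for f in remaining:
            val = subset_score(selected + [f])
            if best_val is None or val > best_val:
                best_feat, best_val = f, val
        selected.append(best_feat)   # keep the feature that helps the subset most
        remaining.remove(best_feat)
    return selected
```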

The performance of the constraint scores is measured by the classification accuracy obtained on test data, commonly with the nearest neighbor classifier. Previous studies use the entire training dataset, with its true class labels, as prototypes for the classifier, whereas only a few prototypes are analyzed by the constraint scores. To reproduce conditions closer to real-life applications, in this paper we propose to use only the available information. In the supervised context, only the prototypes involved in pairwise constraint generation are analyzed by the classifier. In the semi-supervised context, we follow the strategy proposed by Kalakech et al. [12], which first applies the constrained K-means algorithm [4] to classify the unlabeled training data samples and then uses the classified samples as prototypes for classifying the test data. Instead of the constrained K-means algorithm, however, we use constrained spectral clustering, which is based on the same similarity matrix concept as the constraint scores.
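A minimal sketch of the supervised evaluation protocol (our code, using scikit-learn's nearest neighbor classifier; the function name is ours) is given below: only the prototypes involved in constraint generation are used as training samples, and classification is performed in the selected feature subspace. In the semi-supervised setting, the prototypes would be replaced by the training samples labeled by the constrained clustering step.

```python
from sklearn.neighbors import KNeighborsClassifier

def nn_accuracy(X_proto, y_proto, X_test, y_test, selected):
    """1-NN accuracy in the selected feature subspace, training only
    on the labeled prototypes (supervised protocol)."""
    clf = KNeighborsClassifier(n_neighbors=1)
    clf.fit(X_proto[:, selected], y_proto)
    return clf.score(X_test[:, selected], y_test)
```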

The remainder of this paper is organized as follows. Section 2 provides brief definitions related to spectral graph theory and pairwise constraint generation. Section 3 reviews the state of the art on constraint scores. Our proposed constraint score and the feature selection procedure are presented in Section 4. Experimental results achieved with benchmark databases, related to both supervised and semi-supervised feature selection, are provided and discussed in Section 5.

Section snippets

Preliminaries

Constraint scores are based on the concepts of spectral graph theory and pairwise constraints. In this section, we briefly give some notations and definitions related to these two concepts.

Constraint scores

The performance achieved by learning algorithms such as classification or clustering depends on a similarity measure, which is based on the Euclidean distance in the original d-dimensional feature space. Because these features are not always relevant, many authors select the best ones by means of constraint scores that combine the concepts of spectral graph theory and pairwise constraints.
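For reference, the similarity between two samples is commonly expressed through a Gaussian kernel on their Euclidean distance, W_ij = exp(-||x_i - x_j||^2 / σ^2). A minimal NumPy sketch of such a similarity matrix, computed here in the full original feature space, follows (the precise kernel and neighborhood definition may differ from one score to another).

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    """Pairwise similarity W[i, j] = exp(-||x_i - x_j||^2 / sigma^2),
    computed in the full original feature space."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / sigma ** 2)
```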

Proposed constrained feature selection

Existing constraint scores estimate the relevance of each feature independently of the others. Because these scores do not take into account the correlation between features, we propose a new constraint score that estimates the relevance of a subset of features at once.
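To convey the idea without anticipating the exact formula, the simplified sketch below (our own illustrative variant, not the authors' score) computes the Gaussian similarity matrix restricted to a candidate feature subset and favors subsets whose must-link pairs are more similar than their cannot-link pairs.

```python
import numpy as np

def subset_constraint_score(X, subset, must_link, cannot_link,
                            sigma=1.0, eps=1e-12):
    """Illustrative subset-level criterion: similarity is computed in the
    selected feature subspace, then the average must-link similarity is
    compared with the average cannot-link similarity (higher = better)."""
    Xs = X[:, subset]
    sq_dists = ((Xs[:, None, :] - Xs[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / sigma ** 2)
    ml = np.mean([W[i, j] for i, j in must_link])
    cl = np.mean([W[i, j] for i, j in cannot_link])
    return ml / (cl + eps)
```

Combined with the forward selection routine sketched in the introduction, such a criterion evaluates whole subsets rather than summing individual feature scores.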

Experiments on benchmark databases

We evaluate and compare our proposed constraint score with several constraint scores and several well-known feature selection methods on datasets originating from benchmark databases. We first examine the supervised feature scores (εS, C1 and C2). Then, we assess the performance attained by the semi-supervised feature scores (εSS, C3, C4, C5, C6 and C7). Because the data are scaled between 0 and 1, the scaling parameter σ used to compute the similarity matrices is set to 1. Feature selection

Conclusion and future work

In this paper, we presented a new constraint score for feature selection in the context of both supervised and semi-supervised learning. This score evaluates a subset of features at once, whereas state-of-the-art constraint scores evaluate only one feature at a time. This makes it possible to identify redundant features and to avoid the problem of correlation between features. Because our score evaluates the similarity between data samples in the examined feature subspace, selected features can be

CRediT authorship contribution statement

Abderezak Salmi: Conceptualization, Methodology, Software, Validation, Writing - original draft, Writing - review & editing. Kamal Hammouche: Conceptualization, Methodology, Validation, Writing - original draft, Writing - review & editing, Supervision. Ludovic Macaire: Conceptualization, Methodology, Validation, Writing - original draft, Writing - review & editing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (23)

  • Siedlecki, W., et al., On automatic feature selection, Int. J. Pattern Recognit. Artif. Intell. (1988)