Rough set-based feature selection for weakly labeled data
Introduction
Weakly supervised learning [69] refers to machine learning tasks in which training instances are not required to be associated with a precise target label; instead, the annotations may be imprecise or partial. Such tasks may arise as a consequence of data pre-processing operations such as anonymization [15], [49] or censoring [17], may be due to imprecise measurements or expert opinions, or may be intended to limit data annotation costs [45]. Examples of weakly supervised learning tasks include semi-supervised learning, but also more general tasks such as learning from soft labels [8], [12], [13], [48] (in which partial labels are represented through belief functions), which, in turn, encompasses both learning from fuzzy labels [14], [28] (in which partial labels are represented through possibility distributions) and superset learning [29], [40], [44]. In this latter setting, which will be the focus of this article, each instance x is annotated with a set S of candidate labels that are deemed (equally) possible. In other words, we know that the label of x is an element of S, but nothing more. For example, an image could be tagged with a set of three candidate animals, suggesting that the animal shown in the picture is one of the three, though it is not known exactly which.
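To make the superset representation concrete, the following toy snippet (our own illustration; the feature values and label names are hypothetical, not taken from the article) shows how a superset-labeled dataset and a precise instantiation of it can be represented:

```python
# Toy illustration of superset-labeled data (hypothetical values).
# Each instance is a pair (features, S), where S is a *set* of candidate
# labels; exactly one element of S is the unknown true label.
dataset = [
    ((0.2, 1.3), {"dog", "wolf", "fox"}),  # ambiguous image
    ((0.9, 0.4), {"cat"}),                 # precisely labeled instance
    ((0.5, 0.8), {"dog", "fox"}),
]

# A precise "instantiation" of the data picks one label per instance
# from its candidate set:
instantiation = ["wolf", "cat", "dog"]

# Any admissible instantiation must be consistent with the supersets:
for (features, S), label in zip(dataset, instantiation):
    assert label in S
```

Note that a singleton candidate set, as for the second instance, recovers ordinary precise supervision as a special case.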
In recent years, the superset learning task has been widely investigated, both from the classification perspective [19], [30], [64], [66] and from a theoretical standpoint [39]. The latter result is particularly relevant, as it shows that, as in the standard PAC learning model, superset learnability is characterized by combinatorial dimensions (e.g., the Vapnik-Chervonenkis or Natarajan dimension) which, in general, depend on the dimensionality (i.e., the number of features) of the learning problem. The availability of effective feature selection [24] or dimensionality reduction algorithms is therefore of critical importance for controlling model capacity and, hence, ensuring proper generalization. Nevertheless, this task has received little attention so far [61].
In this article, which is an extension of our previous article [6], we study the application of rough set theory in the setting of superset learning. In particular, adhering to the generalized risk minimization principle [28], we consider the problem of feature reduction as a means of data disambiguation, i.e., of identifying the most plausible precise instantiation of the imprecise training data. Compared to our previous work, we provide a finer characterization of the theoretical properties of, and relations among, the proposed definitions of reduct through Theorems 3.4, 3.5, and 3.7, which resolve questions previously left as open problems. In Section 4, which is newly added, we also discuss two computational experiments that study the empirical performance of the proposed reduct definitions, in comparison with the state-of-the-art method for dimensionality reduction in superset learning.
Section snippets
Background
In this section, we recall basic notions of rough set theory (RST) and belief function theory, which will be used in the main part of the article.
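As a minimal illustration of the belief function side of this background (our own sketch; the toy mass function is hypothetical), the belief and plausibility of a set can be computed from a mass function over focal sets as follows:

```python
# Sketch: belief and plausibility of a set A under a mass function m.
# m maps focal sets (frozensets) to masses summing to 1.

def belief(m, A):
    # Bel(A) = sum of masses of focal sets contained in A
    return sum(mass for B, mass in m.items() if B <= A)

def plausibility(m, A):
    # Pl(A) = sum of masses of focal sets intersecting A
    return sum(mass for B, mass in m.items() if B & A)

# Hypothetical mass function over the frame {dog, wolf, fox}:
m = {
    frozenset({"dog"}): 0.5,
    frozenset({"dog", "wolf"}): 0.3,
    frozenset({"dog", "wolf", "fox"}): 0.2,
}
A = frozenset({"dog", "wolf"})
# Bel(A) = 0.5 + 0.3 = 0.8, while Pl(A) = 1.0, since every focal
# set intersects A; in general Bel(A) <= Pl(A).
```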
Superset decision tables and reducts
In this section, we extend some key concepts of rough set theory to the setting of superset learning.
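As one possible illustration of what such an extension may look like (a sketch under our own assumptions; the article's actual definitions of superset reducts may differ in detail), a brute-force search for minimal attribute subsets that keep the table consistent, in the sense that objects indiscernible on the subset retain a common candidate label, can be written as:

```python
from itertools import combinations

def indiscernible(table, B):
    """Group object indices by their values on attribute subset B."""
    groups = {}
    for i, (x, S) in enumerate(table):
        key = tuple(x[a] for a in B)
        groups.setdefault(key, []).append(i)
    return groups.values()

def consistent(table, B):
    """B is consistent if, within every indiscernibility class of B,
    the candidate label sets share at least one common label."""
    for group in indiscernible(table, B):
        if not set.intersection(*(table[i][1] for i in group)):
            return False
    return True

def reducts(table, attrs):
    """Enumerate minimal consistent attribute subsets (brute force)."""
    found = []
    for r in range(1, len(attrs) + 1):
        for B in combinations(attrs, r):
            # keep B only if consistent and not a superset of a known reduct
            if consistent(table, B) and not any(set(R) <= set(B) for R in found):
                found.append(B)
    return found
```

The brute-force enumeration is exponential in the number of attributes and serves only to clarify the definition; practical reduct computation relies on heuristics.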
Experiments
In this section, we present a series of experimental studies meant to evaluate the different definitions of reduct in superset learning put forward in this paper, as well as the performance of the proposed algorithms relative to the state of the art in superset dimensionality reduction (the DELIN algorithm; see Section 2). More specifically, our experiments are aimed at studying the following aspects:
- Reduct approximation: the ability of the different types of reducts to recover the true reducts
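One simple way to quantify such recovery (our own illustration, not necessarily the measure used in the article) is the Jaccard similarity between a computed reduct and the true reduct, viewed as attribute sets:

```python
def jaccard(found, true):
    """Jaccard similarity between a computed reduct and the true reduct,
    both given as collections of attributes (1.0 = perfect recovery)."""
    found, true = set(found), set(true)
    if not found and not true:
        return 1.0  # both empty: treat as perfect agreement
    return len(found & true) / len(found | true)
```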
Conclusion
Addressing the problem of superset learning in the context of rough set theory, as we did in this paper, appears to be interesting and beneficial for both sides:
- RST provides natural tools for data disambiguation, which is at the core of methods for superset learning, most notably the notion of a reduct. Here, the basic idea is that the plausibility of an instantiation of the data is in direct correspondence with the (information-theoretic) complexity it implies for the dependency
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (69)
- et al., Orthopartitions and soft clustering: soft mutual information measures for clustering validation, Knowl.-Based Syst. (2019)
- et al., Belief rule mining using the evidential reasoning rule for medical diagnosis, Int. J. Approx. Reason. (2021)
- et al., Learning from partially supervised data using mixture models and belief functions, Pattern Recognit. (2009)
- et al., Handling possibilistic labels in pattern classification using evidential reasoning, Fuzzy Sets Syst. (2001)
- et al., Properties of measures of information in evidence and possibility theories, Fuzzy Sets Syst. (1987)
- Learning from imprecise and fuzzy observations: data disambiguation through generalized loss minimization, Int. J. Approx. Reason. (2014)
- et al., A new definition of entropy of belief functions in the Dempster-Shafer theory, Int. J. Approx. Reason. (2018)
- et al., On properties of a new decomposable entropy of Dempster-Shafer belief functions, Int. J. Approx. Reason. (2020)
- et al., Online active learning of decision trees with evidential data, Pattern Recognit. (2016)
- Reasoning with belief functions: an analysis of compatibility, Int. J. Approx. Reason.
- A survey on semi-supervised feature selection methods, Pattern Recognit.
- Information content of an evidence, Int. J. Man-Mach. Stud.
- The transferable belief model, Artif. Intell.
- Dimensionality reduction based on rough set theory: a review, Appl. Soft Comput.
- Interpretations of belief functions in the theory of rough sets, Inf. Sci.
- Belief function of Pythagorean fuzzy rough approximation space and its applications, Int. J. Approx. Reason.
- Relationships between relation-based rough sets and belief structures, Int. J. Approx. Reason.
- Combining nonspecificity measures in Dempster-Shafer theory of evidence, Int. J. Gen. Syst.
- Completing a total uncertainty measure in the Dempster-Shafer theory, Int. J. Gen. Syst.
- On the prediction loss of the lasso in the partially labeled setting, Electron. J. Stat.
- Rough sets in machine learning: a review
- Feature reduction in superset learning using rough sets and evidence theory
- Learning from partial labels, J. Mach. Learn. Res.
- Upper and lower probabilities induced by a multivalued mapping
- A k-nearest neighbor classification rule based on Dempster-Shafer theory, IEEE Trans. Syst. Man Cybern.
- Maximum likelihood estimation from uncertain data in the belief function framework, IEEE Trans. Knowl. Data Eng.
- Bounds for cell entries in contingency tables given marginal totals and decomposable graphs, Proc. Natl. Acad. Sci. USA
- Censored data and the bootstrap, J. Am. Stat. Assoc.
- Leveraging latent label distributions for partial label learning
- Partial label learning with self-guided retraining
- UCI machine learning repository
- Conceptual scaling
Cited by (24)
- Nature of decision valuations in elimination of redundant attributes, International Journal of Approximate Reasoning (2024)
- Partially-defined equivalence relations: Relationship with orthopartitions and connection to rough sets, Information Sciences (2024)
- Semi-supervised feature selection based on fuzzy related family, Information Sciences (2024)