Rough set-based feature selection for weakly labeled data
International Journal of Approximate Reasoning (IF 3.2), Pub Date: 2021-06-18, DOI: 10.1016/j.ijar.2021.06.005
Andrea Campagner , Davide Ciucci , Eyke Hüllermeier

Supervised learning is an important branch of machine learning (ML), which requires a complete annotation (labeling) of the involved training data. This assumption is relaxed in the settings of weakly supervised learning, where labels are allowed to be imprecise or partial. In this article, we study the setting of superset learning, in which instances are assumed to be labeled with a set of possible annotations containing the correct one. We tackle the problem of learning from such data in the context of rough set theory (RST). More specifically, we consider the problem of RST-based feature reduction as a suitable means for data disambiguation, i.e., for the purpose of figuring out the most plausible precise instantiation of the imprecise training data. To this end, we define appropriate generalizations of decision tables and reducts, using tools from generalized information theory and belief function theory. Moreover, we analyze the computational complexity and theoretical properties of the associated computational problems. Finally, we present results of a series of experiments, in which we analyze the proposed concepts empirically and compare our methods with a state-of-the-art dimensionality reduction algorithm, reporting a statistically significant improvement in predictive accuracy.
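To make the superset-learning setting concrete, the following is a minimal Python sketch (illustrative only, not the authors' method): each object carries a set of candidate labels containing the true one, and a feature subset is deemed "consistent" if objects indiscernible on those features still share at least one candidate label, so that some precise instantiation of the imprecise labels is conflict-free. The toy data and this naive consistency criterion are assumptions made for illustration; the paper instead defines generalized reducts via generalized information theory and belief functions.

from itertools import combinations

# Hypothetical superset-labeled decision table: each object has feature
# values and a *set* of candidate labels, one of which is the true label.
objects = [
    ((0, 1, 0), {"a"}),
    ((0, 1, 1), {"a", "b"}),
    ((1, 0, 1), {"b"}),
    ((1, 0, 0), {"b", "c"}),
    ((1, 1, 0), {"c"}),
]
n_features = 3

def consistent(feature_subset):
    """Naive consistency: objects indiscernible on the chosen features must
    share at least one candidate label, so a conflict-free precise labeling
    (disambiguation) of the imprecise data exists."""
    groups = {}
    for values, labels in objects:
        key = tuple(values[i] for i in feature_subset)
        groups.setdefault(key, []).append(labels)
    return all(set.intersection(*candidate_sets) for candidate_sets in groups.values())

def minimal_reducts():
    """Brute-force enumeration of minimal consistent feature subsets."""
    reducts = []
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            # Keep a subset only if it is consistent and no smaller
            # already-found reduct is contained in it.
            if consistent(subset) and not any(set(r) <= set(subset) for r in reducts):
                reducts.append(subset)
    return reducts

if __name__ == "__main__":
    print("Minimal consistent feature subsets:", minimal_reducts())

On this toy table the enumeration returns the subsets {0, 1} and {0, 2}: either pair of features suffices to separate objects whose candidate label sets cannot be reconciled, which is the intuition behind using reducts as a disambiguation device.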




Updated: 2021-06-30