ML-MDLText: An efficient and lightweight multilabel text classifier with incremental learning, Applied Soft Computing



Applied Soft Computing (IF 7.2), Pub Date: 2020-09-05, DOI: 10.1016/j.asoc.2020.106699
Marciele M. Bittencourt, Renato M. Silva, Tiago A. Almeida

Single-label text classification has been studied extensively in recent decades, with most attention given to offline learning scenarios, where all of the training data is available in advance. However, real-world text classification problems often involve multilabel instances and textual patterns that change frequently. In this context, methods must predict a subset of target labels rather than a single one, and ideally they should update their model incrementally, using limited time and memory, so that they scale and adapt to changes in data patterns. In this study, we present a text classification method based on the minimum description length principle that can be applied to multilabel classification without transforming the classification problem. It also takes advantage of dependency information among labels and naturally supports online learning. We evaluated its performance on fifteen datasets from different application domains and compared it with traditional benchmark classifiers in three online learning scenarios. Even without problem transformation, the results obtained by the proposed method were very competitive with existing state-of-the-art online learning methods and with methods that transform multilabel problems into several single-label ones.
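To make the ideas of description-length-based prediction and incremental updating more concrete, here is a minimal Python sketch. It is not the ML-MDLText algorithm from the paper: the class name `ToyMDLMultilabel`, the additive smoothing parameter `alpha`, and the decision rule are all assumptions made for illustration. The rule assigns a label when the document's tokens are cheaper to encode (in bits) under token counts from documents carrying that label than under counts from the remaining documents. This per-label comparison is closer to binary relevance than to the transformation-free approach the abstract describes, and it ignores label dependencies.

```python
import math
from collections import Counter, defaultdict


class ToyMDLMultilabel:
    """Toy online multilabel text classifier driven by description lengths.

    Illustration only; this is NOT the ML-MDLText method from the paper.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                 # additive smoothing (assumed value)
        self.vocab = set()                 # all tokens seen so far
        self.total = Counter()             # token counts over all documents
        self.total_n = 0                   # total number of tokens seen
        self.pos = defaultdict(Counter)    # token counts per label
        self.pos_n = defaultdict(int)      # number of tokens seen per label

    def partial_fit(self, tokens, labels):
        """Update the counts with one document; no retraining is needed."""
        self.vocab.update(tokens)
        self.total.update(tokens)
        self.total_n += len(tokens)
        for label in labels:
            self.pos[label].update(tokens)
            self.pos_n[label] += len(tokens)

    def predict(self, tokens):
        """Return every label whose 'present' model encodes the tokens
        in fewer bits than its 'absent' model."""
        v = len(self.vocab) + 1            # one extra slot for unseen tokens
        predicted = set()
        for label, pos in self.pos.items():
            n_pos = self.pos_n[label]
            n_neg = self.total_n - n_pos   # tokens from docs without the label
            dl_pos = dl_neg = 0.0
            for t in tokens:
                p_pos = (pos[t] + self.alpha) / (n_pos + self.alpha * v)
                neg_count = self.total[t] - pos[t]
                p_neg = (neg_count + self.alpha) / (n_neg + self.alpha * v)
                dl_pos -= math.log2(p_pos)  # bits needed under the label model
                dl_neg -= math.log2(p_neg)  # bits needed under the rest
            if dl_pos < dl_neg:
                predicted.add(label)
        return predicted


if __name__ == "__main__":
    clf = ToyMDLMultilabel()
    clf.partial_fit(["goal", "match", "team"], {"sports"})
    clf.partial_fit(["vote", "election", "party"], {"politics"})
    clf.partial_fit(["stadium", "funding", "vote", "team"], {"sports", "politics"})
    print(sorted(clf.predict(["goal", "team"])))   # ['sports']
    print(sorted(clf.predict(["team", "vote"])))   # ['politics', 'sports']
```

Because the model consists only of counts, each new document updates it in time proportional to the document length, and a label that first appears late in the stream needs no special handling: its "absent" counts are derived by subtracting its own counts from the running totals.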


Updated: 2020-09-07
