Active learning for hierarchical multi-label classification
Data Mining and Knowledge Discovery ( IF 2.8 ) Pub Date : 2020-07-17 , DOI: 10.1007/s10618-020-00704-w
Felipe Kenji Nakano , Ricardo Cerri , Celine Vens

Due to technological advances, a massive amount of data is produced daily. This presents challenges for application areas in which data must be labelled by a domain specialist or by expensive procedures before it is useful for supervised machine learning. Active learning methods can be used to select the data points that will provide the most information once labelled. Active learning (AL) is a subfield of machine learning concerned with building models from fewer but more representative instances. Even though AL has been extensively studied, it has not been thoroughly investigated in hierarchical multi-label classification, a learning task in which multiple class labels can be assigned to an instance and these labels are hierarchically structured. In this work, we provide a public framework containing baseline and state-of-the-art algorithms suitable for this task. We also propose a new algorithm, Hierarchical Query-By-Committee (H-QBC), which is validated on datasets from different domains. Our results show that H-QBC provides predictive performance superior to that of its competitors, while being computationally efficient and parameter-free.
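
The general idea behind query-by-committee selection can be illustrated with a minimal sketch. This is a generic version for multi-label predictions, not H-QBC itself, whose details the abstract does not give. The function names `committee_disagreement` and `select_queries`, the variance-based disagreement measure, and the toy probabilities are all assumptions made for illustration.

```python
import numpy as np


def committee_disagreement(probs):
    """Score each unlabelled instance by how much the committee disagrees.

    probs has shape (n_members, n_instances, n_labels) and holds each
    committee member's predicted probability for every label.
    The measure used here (variance across members, averaged over labels)
    is an assumed, generic choice, not the one used by H-QBC.
    """
    probs = np.asarray(probs, dtype=float)
    if probs.ndim != 3:
        raise ValueError("probs must have shape (members, instances, labels)")
    # Variance across members for each (instance, label), then mean over labels.
    return probs.var(axis=0).mean(axis=1)


def select_queries(probs, k):
    """Return the indices of the k instances with the highest disagreement."""
    scores = committee_disagreement(probs)
    # Stable sort on negated scores: ties keep the lower index first.
    order = np.argsort(-scores, kind="stable")
    return order[:k]


# Toy example (made-up numbers): 3 members, 4 unlabelled instances, 2 labels.
toy_probs = np.array([
    [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9], [0.2, 0.8]],
    [[0.9, 0.1], [0.1, 0.9], [0.1, 0.9], [0.8, 0.2]],
    [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.5, 0.5]],
])
toy_scores = committee_disagreement(toy_probs)
queries = select_queries(toy_probs, k=2)
print(queries)  # instances the committee disagrees on most
```

In a full active learning loop, the selected instances would be sent to the annotator, added to the labelled set, and the committee retrained before the next round. A hierarchical variant would additionally need to account for the label hierarchy, which this sketch ignores.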

Updated: 2020-07-17