Bonsai: diverse and shallow trees for extreme multi-label classification
Machine Learning (IF 7.5), Pub Date: 2020-08-23, DOI: 10.1007/s10994-020-05888-2
Sujay Khandagale, Han Xiao, Rohit Babbar

Extreme multi-label classification (XMC) refers to supervised multi-label learning involving hundreds of thousands or even millions of labels. In this paper, we develop a suite of algorithms, called Bonsai, which generalizes the notion of label representation in XMC and partitions the labels in the representation space to learn shallow trees. We present three concrete realizations of this label representation space: (i) the input space, spanned by the input features, (ii) the output space, spanned by label vectors based on their co-occurrence with other labels, and (iii) the joint space, obtained by combining the input and output representations. Furthermore, the constraint-free multi-way partitions learnt iteratively in these spaces lead to shallow trees. By combining the effect of shallow trees and generalized label representation, Bonsai achieves the best of both worlds: fast training, comparable to state-of-the-art tree-based methods in XMC, and much better prediction accuracy than those methods, particularly on tail labels. On the benchmark Amazon-3M dataset with 3 million labels, Bonsai outperforms a state-of-the-art one-vs-rest method in terms of prediction accuracy while being approximately 200 times faster to train. The code for Bonsai is available at https://github.com/xmc-aalto/bonsai.
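The two ingredients named in the abstract, a label representation space and unconstrained multi-way partitioning of labels in that space, can be illustrated with a small pure-Python sketch. The abstract does not give the exact formulas, so everything below is an assumption: the input-space vector of a label is taken as the normalized sum of the feature vectors of the training points carrying that label; the output-space vector is the label's normalized indicator column over training points (so cosine similarity grows with co-occurrence); the joint vector is their normalized concatenation; and the multi-way partition is stood in for by plain spherical k-means with `k` children per node, stopping at a hypothetical `max_leaf` size. All function names are hypothetical, and this is not the code in the xmc-aalto/bonsai repository.

```python
import math
import random
from typing import Dict, List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def input_space_repr(X: List[Vector], Y: List[List[int]], num_labels: int) -> List[Vector]:
    # Assumed input-space representation: sum of feature vectors of points tagged with the label.
    reps = [[0.0] * len(X[0]) for _ in range(num_labels)]
    for x, labels in zip(X, Y):
        for l in labels:
            for j, xj in enumerate(x):
                reps[l][j] += xj
    return [normalize(r) for r in reps]


def output_space_repr(Y: List[List[int]], num_points: int, num_labels: int) -> List[Vector]:
    # Assumed output-space representation: which training points carry the label.
    # Two labels that co-occur on many points get a high cosine similarity.
    reps = [[0.0] * num_points for _ in range(num_labels)]
    for i, labels in enumerate(Y):
        for l in labels:
            reps[l][i] = 1.0
    return [normalize(r) for r in reps]


def joint_repr(inp: List[Vector], out: List[Vector]) -> List[Vector]:
    # Assumed joint representation: concatenate both views, then renormalize.
    return [normalize(u + v) for u, v in zip(inp, out)]


def spherical_kmeans(points: List[Vector], k: int, iters: int, rng: random.Random) -> List[int]:
    # Cosine-similarity k-means with no cluster-size (balance) constraint.
    centroids = [list(points[i]) for i in rng.sample(range(len(points)), k)]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [max(range(k), key=lambda c: dot(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # empty clusters keep their previous centroid
                centroids[c] = normalize([sum(col) for col in zip(*members)])
    return assign


def build_tree(label_ids: List[int], reps: List[Vector], k: int, max_leaf: int,
               rng: random.Random, iters: int = 10) -> Dict[str, list]:
    # Recursively split the label set into up to k children until it is small enough.
    if len(label_ids) <= max_leaf:
        return {"labels": label_ids}
    kk = min(k, len(label_ids))
    assign = spherical_kmeans([reps[l] for l in label_ids], kk, iters, rng)
    groups = [[l for l, a in zip(label_ids, assign) if a == c] for c in range(kk)]
    groups = [g for g in groups if g]
    if len(groups) < 2:  # degenerate split (e.g. identical vectors): make this a leaf
        return {"labels": label_ids}
    return {"children": [build_tree(g, reps, k, max_leaf, rng, iters) for g in groups]}


def leaves(node: Dict[str, list]) -> List[List[int]]:
    if "labels" in node:
        return [node["labels"]]
    return [leaf for child in node["children"] for leaf in leaves(child)]


def depth(node: Dict[str, list]) -> int:
    if "labels" in node:
        return 1
    return 1 + max(depth(child) for child in node["children"])


if __name__ == "__main__":
    X = [[1, 0, 0, 0], [0.9, 0.1, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0.8, 0.2], [0, 0, 0, 1]]
    Y = [[0, 1], [0, 1], [2], [3, 4], [3, 4], [5]]
    joint = joint_repr(input_space_repr(X, Y, 6), output_space_repr(Y, len(X), 6))
    tree = build_tree(list(range(6)), joint, k=3, max_leaf=2, rng=random.Random(0))
    print("leaves:", leaves(tree), "depth:", depth(tree))
```

In this sketch, a larger `k` lets each node split the labels many ways at once, so a given label set reaches leaves of at most `max_leaf` labels in fewer levels; that is the intuition behind shallow, wide trees. Because no balance constraint is imposed, sibling clusters can differ in size, which lets labels group by similarity rather than by an enforced even split.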

Updated: 2020-08-23