Text Classification and Clustering with Annealing Soft Nearest Neighbor Loss
arXiv - CS - Neural and Evolutionary Computing Pub Date : 2021-07-23 , DOI: arxiv-2107.14597
Abien Fred Agarap

We define disentanglement as how far apart class-different data points are relative to the distances among class-similar data points. By maximizing disentanglement during representation learning, we obtain a transformed feature representation in which the class memberships of the data points are preserved. If the class memberships are preserved, we have a feature representation space in which a nearest neighbour classifier or a clustering algorithm performs well. We take advantage of this method to learn better natural language representations, and employ it on text classification and text clustering tasks. Through disentanglement, we obtain text representations with better-defined clusters and improved text classification performance. Our approach achieved a test classification accuracy of as high as 90.11% and a test clustering accuracy of 88% on the AG News dataset, outperforming our baseline models -- without any other training tricks or regularization.
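The loss named in the title is the soft nearest neighbor loss (Frosst et al., 2019): for each point, the negative log probability that its "soft" nearest neighbor (under a temperature-scaled distance kernel) belongs to the same class. Minimizing it pulls class-similar points together relative to class-different ones, which matches the abstract's notion of disentanglement; "annealing" refers to varying the temperature over training. The sketch below is a minimal NumPy illustration of this loss, not the authors' implementation; the annealing schedule shown in the comment is an assumption for illustration.

```python
import numpy as np

def soft_nearest_neighbor_loss(features, labels, temperature=1.0):
    """Soft nearest neighbor loss over a batch.

    For each point i, computes -log of the ratio between its summed
    similarity to same-class points and its summed similarity to all
    other points, then averages over the batch. Lower loss means
    same-class points sit closer together than different-class points,
    i.e. the representation is more disentangled in the abstract's sense.
    """
    # Pairwise squared Euclidean distances via broadcasting.
    diff = features[:, None, :] - features[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)

    # Temperature-scaled similarity kernel; exclude self-similarity.
    sim = np.exp(-sq_dist / temperature)
    np.fill_diagonal(sim, 0.0)

    # Numerator: similarity mass on same-class points.
    same_class = (labels[:, None] == labels[None, :]).astype(features.dtype)
    eps = 1e-9  # numerical guard for empty same-class neighborhoods
    numer = np.sum(sim * same_class, axis=1)
    denom = np.sum(sim, axis=1)
    return -np.mean(np.log(numer / (denom + eps) + eps))

# A simple annealing schedule (an assumption, not necessarily the paper's):
# e.g. temperature = initial_temperature / epoch, so the kernel sharpens
# as training progresses and the loss focuses on ever-closer neighbors.
```

In a full training loop this term would typically be added, with some weight, to the main task loss (e.g. cross-entropy for classification), with the temperature updated each epoch per the chosen annealing schedule.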

Updated: 2021-08-02