Learning to Weight for Text Classification
IEEE Transactions on Knowledge and Data Engineering ( IF 8.9 ) Pub Date : 2020-02-01 , DOI: 10.1109/tkde.2018.2883446
Alejandro Moreo , Andrea Esuli , Fabrizio Sebastiani

In information retrieval (IR) and related tasks, term weighting approaches typically consider the frequency of the term in the document and in the collection in order to compute a score reflecting the importance of the term for the document. In tasks characterized by the presence of training data (such as text classification) it seems logical that the term weighting function should take into account the distribution (as estimated from training data) of the term across the classes of interest. Although “supervised term weighting” approaches that use this intuition have been described before, they have failed to show consistent improvements. In this article, we analyze the possible reasons for this failure, and call consolidated assumptions into question. Following this criticism, we propose a novel supervised term weighting approach that, instead of relying on any predefined formula, learns a term weighting function optimized on the training set of interest; we dub this approach Learning to Weight (LTW). The experiments that we run on several well-known benchmarks, and using different learning methods, show that our method outperforms previous term weighting approaches in text classification.
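To make the contrast concrete, the sketch below compares the two families of weighting the abstract describes: an unsupervised tf·idf weight, which uses only document and collection frequencies, and a "supervised term weighting" baseline of the kind the paper critiques, where the collection factor is replaced by a class-based statistic (here, information gain). This is a minimal illustration on a hypothetical toy corpus, not the LTW method the paper proposes; all data and function names are invented for the example.

```python
import math
from collections import Counter

# Hypothetical toy corpus: (document tokens, class label).
docs = [
    (["cheap", "pills", "buy"], "spam"),
    (["buy", "cheap", "now"], "spam"),
    (["meeting", "agenda", "notes"], "ham"),
    (["project", "meeting", "plan"], "ham"),
]

def idf(term, docs):
    """Unsupervised collection statistic: inverse document frequency."""
    df = sum(term in toks for toks, _ in docs)
    return math.log(len(docs) / df) if df else 0.0

def info_gain(term, docs, positive):
    """Supervised statistic: reduction in uncertainty about membership
    in the `positive` class once the term's presence is known."""
    def H(p):  # binary entropy
        return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    n = len(docs)
    pos = sum(lab == positive for _, lab in docs)
    with_t = [lab for toks, lab in docs if term in toks]
    without = [lab for toks, lab in docs if term not in toks]
    h_after = 0.0
    for subset in (with_t, without):
        if subset:
            p = sum(lab == positive for lab in subset) / len(subset)
            h_after += len(subset) / n * H(p)
    return H(pos / n) - h_after

def weight(term, doc_tokens, docs, positive=None):
    """tf x idf (unsupervised) when positive=None,
    else tf x information gain (supervised)."""
    tf = Counter(doc_tokens)[term]
    factor = idf(term, docs) if positive is None else info_gain(term, docs, positive)
    return tf * factor
```

In this toy corpus "cheap" occurs in every spam document and no ham document, so its supervised weight (tf × IG = 1 × 1.0) exceeds its tf·idf weight (1 × log 2 ≈ 0.69). The paper's point is that instead of committing to any such predefined supervised formula, the weighting function itself should be learned from the training set.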
