tax2vec: Constructing Interpretable Features from Taxonomies for Short Text Classification
Computer Speech & Language (IF 3.1), Pub Date: 2020-04-29, DOI: 10.1016/j.csl.2020.101104
Blaž Škrlj, Matej Martinc, Jan Kralj, Nada Lavrač, Senja Pollak

The use of background knowledge is largely unexploited in text classification tasks. This paper explores word taxonomies as a means of constructing new semantic features, which may improve the performance and robustness of the learned classifiers. We propose tax2vec, a parallel algorithm for constructing taxonomy-based features, and demonstrate its use on six short text classification problems: prediction of gender, personality type, age, news topics, drug side effects and drug effectiveness. The constructed semantic features, in combination with fast linear classifiers and tested against strong baselines such as hierarchical attention neural networks, achieve comparable classification results on short text documents. The algorithm's performance is also tested in a few-shot learning setting, indicating that the inclusion of semantic features can improve performance in data-scarce situations. The capability of tax2vec to extract corpus-specific semantic keywords is also demonstrated. Finally, we investigate the semantic space of potential features, where we observe a similarity with the well-known Zipf's law.
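
To make the idea of taxonomy-based features concrete, the following is a minimal sketch and not the tax2vec implementation. It assumes a tiny hand-written taxonomy (`TOY_TAXONOMY`, a hypothetical child-to-parent mapping) and naive whitespace tokenization. The helper names `ancestors` and `taxonomy_features` are illustrative only. The abstract does not specify which taxonomy the paper uses or how it selects features, so none of that is modeled here.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent), invented for illustration.
TOY_TAXONOMY = {
    "aspirin": "drug",
    "ibuprofen": "drug",
    "drug": "substance",
    "headache": "symptom",
    "nausea": "symptom",
    "symptom": "condition",
}


def ancestors(word, taxonomy):
    """Return all ancestors of `word` in the taxonomy, nearest first."""
    chain = []
    seen = {word}
    while word in taxonomy:
        word = taxonomy[word]
        if word in seen:  # guard against accidental cycles
            break
        seen.add(word)
        chain.append(word)
    return chain


def taxonomy_features(document, taxonomy):
    """Count how often each ancestor term is implied by the document's tokens."""
    counts = Counter()
    for token in document.lower().split():
        counts.update(ancestors(token, taxonomy))
    return counts


docs = ["aspirin helped my headache", "ibuprofen caused nausea"]
features = [taxonomy_features(d, TOY_TAXONOMY) for d in docs]
```

In this toy setting the two documents share no surface words, yet both map to the same generalized terms (`drug`, `substance`, `symptom`, `condition`). Counts of this kind could be appended to ordinary bag-of-words features before training a linear classifier, and each added feature remains a readable term.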




Updated: 2020-04-29