Word-class embeddings for multiclass text classification
Data Mining and Knowledge Discovery (IF 4.8), Pub Date: 2021-02-19, DOI: 10.1007/s10618-020-00735-3
Alejandro Moreo, Andrea Esuli, Fabrizio Sebastiani

Pre-trained word embeddings encode general word semantics and lexical regularities of natural language, and have proven useful across many NLP tasks, including word sense disambiguation, machine translation, and sentiment analysis, to name a few. In supervised tasks such as multiclass text classification (the focus of this article) it seems appealing to enhance word representations with ad-hoc embeddings that encode task-specific information. We propose (supervised) word-class embeddings (WCEs), and show that, when concatenated to (unsupervised) pre-trained word embeddings, they substantially facilitate the training of deep-learning models in multiclass classification by topic. We show empirical evidence that WCEs yield a consistent improvement in multiclass classification accuracy, using six popular neural architectures and six widely used and publicly available datasets for multiclass text classification. One further advantage of this method is that it is conceptually simple and straightforward to implement. Our code that implements WCEs is publicly available at https://github.com/AlexMoreo/word-class-embeddings.
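As a rough illustration of the idea described in the abstract, the sketch below builds one possible word-class embedding matrix from a labeled term-frequency matrix and concatenates it to a pre-trained embedding matrix. This is a minimal sketch, not the authors' exact construction: the function name, the dense-matrix representation, and the choice of term-class counts with L2 normalization are illustrative assumptions; the paper itself studies specific word-class correlation measures, and the repository linked above contains the reference implementation.

```python
import numpy as np

def word_class_embeddings(X, y, n_classes):
    """Illustrative word-class embeddings: one |C|-dimensional vector per word.

    X : (n_docs, vocab_size) term-frequency matrix (dense here for brevity)
    y : (n_docs,) integer class labels in [0, n_classes)

    Each word's vector summarizes how its occurrences distribute over the
    classes; the correlation measure and normalization used here are
    assumptions made for this sketch.
    """
    Y = np.eye(n_classes)[y]                # (n_docs, n_classes) one-hot labels
    WC = X.T @ Y                            # (vocab, n_classes) term-class counts
    norms = np.linalg.norm(WC, axis=1, keepdims=True)
    return WC / np.maximum(norms, 1e-12)    # length-normalize each word profile

# Concatenating to pre-trained embeddings (E_pre of shape (vocab, d)) yields
# task-aware word representations of dimension d + n_classes:
# E = np.hstack([E_pre, word_class_embeddings(X_train, y_train, n_classes)])
```

The key design point is that the supervised part of each word vector is learned only from the training labels and simply appended to the unsupervised part, so any architecture that consumes an embedding matrix can use it unchanged.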



Updated: 2021-02-19