Decomposing word embedding with the capsule network
Knowledge-Based Systems (IF 8.8), Pub Date: 2020-11-27, DOI: 10.1016/j.knosys.2020.106611
Xin Liu, Qingcai Chen, Yan Liu, Joanna Siebert, Baotian Hu, Xiangping Wu, Buzhou Tang

Word sense disambiguation aims to identify the appropriate sense of an ambiguous word in a given context. Existing pre-trained language models and methods based on multiple embeddings per word do not sufficiently exploit the power of unsupervised word embeddings.

In this paper, we present a capsule network-based approach that takes advantage of the capsule network's potential for recognizing highly overlapping features and handling segmentation. We propose a capsule network-based method, called CapsDecE2S, that decomposes the unsupervised word embedding of an ambiguous word into a context-specific sense embedding. In this approach, the unsupervised ambiguous embedding is fed into a capsule network to produce multiple morpheme-like vectors, which are defined as the basic semantic units of meaning. Through attention operations, CapsDecE2S integrates the word's context to reconstruct these morpheme-like vectors into the context-specific sense embedding. To train CapsDecE2S, we propose a sense matching training method, which converts sense learning into a binary classification that explicitly learns the relation between senses through matching and non-matching labels. CapsDecE2S was experimentally evaluated on two sense learning tasks, i.e., word in context and word sense disambiguation. Results on two public corpora, Word-in-Context and English all-words Word Sense Disambiguation, show that the CapsDecE2S model achieves a new state of the art on both tasks. The source code can be downloaded from the GitHub page.
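The pipeline described above — decomposing one word embedding into morpheme-like capsule vectors, attending over them with the context, and scoring sense pairs as a binary match — can be sketched in a few lines. The abstract does not give implementation details, so everything below is an illustrative simplification: the per-capsule linear transforms, the dot-product attention, and the sigmoid matcher are assumptions, and real capsule networks typically add dynamic routing, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

def squash(v, eps=1e-8):
    # Capsule squashing nonlinearity: keeps direction, bounds length below 1.
    n2 = np.sum(v * v, axis=-1, keepdims=True)
    return (n2 / (1.0 + n2)) * v / np.sqrt(n2 + eps)

def decompose(word_emb, W):
    # Map one unsupervised word embedding into K morpheme-like capsule vectors.
    # W: (K, d, d) hypothetical per-capsule transforms (learned in the real model).
    return squash(np.einsum('kij,j->ki', W, word_emb))

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def contextual_sense(morphemes, context_emb):
    # Attention over the morpheme-like vectors, queried by the context embedding;
    # the weighted sum reconstructs a context-specific sense embedding.
    attn = softmax(morphemes @ context_emb)
    return attn @ morphemes

def match_prob(sense_a, sense_b):
    # Sense matching as binary classification: sigmoid over a similarity score.
    return 1.0 / (1.0 + np.exp(-sense_a @ sense_b))

d, K = 8, 4
W = rng.standard_normal((K, d, d)) * 0.1
word = rng.standard_normal(d)        # ambiguous word's unsupervised embedding
ctx1 = rng.standard_normal(d)        # embedding of context 1
ctx2 = rng.standard_normal(d)        # embedding of context 2

caps = decompose(word, W)            # (K, d) morpheme-like vectors
s1 = contextual_sense(caps, ctx1)    # sense of the word in context 1
s2 = contextual_sense(caps, ctx2)    # sense of the word in context 2
p = match_prob(s1, s2)               # probability the two senses match
```

In training, `p` would be pushed toward 1 for matching sense pairs and 0 for non-matching ones, which is the binary classification view of sense learning described in the abstract.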




Updated: 2020-12-02