Paraphrase thought: Sentence embedding module imitating human language recognition, Information Sciences


Information Sciences, Pub Date: 2020-07-02, DOI: 10.1016/j.ins.2020.05.129
Myeongjun Jang, Pilsung Kang

Sentence embedding is an important research topic in natural language processing. It is essential to generate a good embedding vector that fully reflects the semantic meaning of a sentence in order to achieve an enhanced performance for various natural language processing tasks, such as machine translation and document classification. Thus far, various sentence embedding models have been proposed, and their feasibility has been demonstrated through good performances on tasks following embedding, such as sentiment analysis and sentence classification. However, because the performances of sentence classification and sentiment analysis can be enhanced by using a simple sentence representation method, it is not sufficient to claim that these models fully reflect the meanings of sentences based on good performances for such tasks. In this paper, inspired by human language recognition, we propose the following concept of semantic coherence, which should be satisfied for a good sentence embedding method: similar sentences should be located close to each other in the embedding space. Then, we propose the Paraphrase-Thought (P-thought) model to pursue semantic coherence as much as possible. Experimental results on three paraphrase identification datasets (MS COCO, STS benchmark, SICK) show that the P-thought models outperform the benchmarked sentence embedding methods.
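
The semantic coherence criterion stated in the abstract (similar sentences should lie close together in the embedding space) can be illustrated with a small check. The sketch below is not the P-thought model: it assumes cosine similarity as the closeness measure, and the three-dimensional vectors, the example sentences, and the helper name `is_coherent` are all made up for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical embeddings: invented toy values, not the output of any real model.
embeddings = {
    "A man is riding a horse.": [0.9, 0.1, 0.2],
    "A person rides a horse.": [0.8, 0.2, 0.3],
    "The stock market fell today.": [0.1, 0.9, 0.1],
}


def is_coherent(anchor, paraphrase, unrelated, emb):
    """True if the paraphrase is closer to the anchor than the unrelated sentence is."""
    sim_paraphrase = cosine_similarity(emb[anchor], emb[paraphrase])
    sim_unrelated = cosine_similarity(emb[anchor], emb[unrelated])
    return sim_paraphrase > sim_unrelated


coherent = is_coherent(
    "A man is riding a horse.",
    "A person rides a horse.",
    "The stock market fell today.",
    embeddings,
)
print(coherent)
```

With these toy vectors the paraphrase pair scores higher than the unrelated pair, which is the behaviour the criterion asks of a sentence embedding method. An evaluation on real data would replace the dictionary with vectors produced by the embedding model under test.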


Updated: 2020-07-02
