Measuring Text Similarity Based on Structure and Word Embedding, Cognitive Systems Research



Cognitive Systems Research (IF 3.9) Pub Date: 2020-10-01, DOI: 10.1016/j.cogsys.2020.04.002
Mamdouh Farouk

Abstract: Finding the similarity between natural language sentences is crucial for many applications in Natural Language Processing (NLP), which need an accurate sentence similarity calculation. Many approaches depend on word-to-word similarity to measure sentence similarity. This paper proposes a new approach that improves the accuracy of the sentence similarity calculation by combining different similarity measures. In addition to the traditional word-to-word similarity measure, the proposed approach exploits sentence semantic structure: a discourse representation structure (DRS), which is a semantic representation of natural language sentences, is generated and used to calculate structure similarity. Furthermore, word order similarity is measured to account for the order of words in sentences. Experiments show that exploiting structural information achieves good results. Moreover, the proposed method outperforms current approaches on a standard benchmark dataset, achieving a Pearson correlation of 0.8813 with human similarity judgments.
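The abstract names three components (word-to-word similarity, DRS-based structure similarity, and word order similarity) but does not give their formulas or how they are weighted. The following is a minimal Python sketch of how such a combination could look, not the paper's implementation. Everything in it is an assumption made for illustration: the function names, the toy two-dimensional word vectors, the best-match averaging used for word similarity, the simplified word order measure (1 minus the normalized difference of word-position vectors, with 0 for absent words), and the weights. The structure similarity is passed in as a precomputed number because producing a DRS requires a semantic parser that is outside the scope of this sketch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def word_similarity(s1, s2, vectors):
    """Assumed word-to-word measure: for each word, take its best cosine
    match in the other sentence, average, and symmetrize."""
    def directed(a, b):
        best = [max(cosine(vectors[w], vectors[x]) for x in b) for w in a]
        return sum(best) / len(best)
    return (directed(s1, s2) + directed(s2, s1)) / 2


def word_order_similarity(s1, s2):
    """Assumed word order measure: 1 - ||r1 - r2|| / ||r1 + r2||, where r_i
    holds the 1-based position of each joint word in sentence i (0 if absent)."""
    joint = list(dict.fromkeys(s1 + s2))  # unique words, first-seen order
    r1 = [s1.index(w) + 1 if w in s1 else 0 for w in joint]
    r2 = [s2.index(w) + 1 if w in s2 else 0 for w in joint]
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    total = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    return 1.0 - diff / total if total else 1.0


def combined_similarity(s1, s2, vectors, structure_sim, weights=(0.6, 0.3, 0.1)):
    """Weighted sum of the three measures. The weights are hypothetical;
    structure_sim stands in for a DRS-based score computed elsewhere."""
    w_word, w_struct, w_order = weights
    return (w_word * word_similarity(s1, s2, vectors)
            + w_struct * structure_sim
            + w_order * word_order_similarity(s1, s2))


# Toy example: same words, reversed order (vectors are made up).
toy_vectors = {"dog": [1.0, 0.0], "bites": [0.0, 1.0], "man": [0.5, 0.5]}
s1 = ["dog", "bites", "man"]
s2 = ["man", "bites", "dog"]
print(word_similarity(s1, s2, toy_vectors))
print(word_order_similarity(s1, s2))
print(combined_similarity(s1, s2, toy_vectors, structure_sim=0.5))
```

In this toy pair the two sentences contain the same words, so a purely word-to-word measure rates them as near identical, while the word order term lowers the combined score. That is the kind of distinction the abstract attributes to considering word order and structure.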


Updated: 2020-10-01
