SimiT: A Text Similarity Method Using Lexicon and Dependency Representations,New Generation Computing


SimiT: A Text Similarity Method Using Lexicon and Dependency Representations
New Generation Computing (IF 2.0), Pub Date: 2020-06-17, DOI: 10.1007/s00354-020-00099-8
Emrah Inan

Semantic textual similarity methods are becoming increasingly important in text mining research areas such as text retrieval and summarization. Existing text similarity methods often rely on shallow or syntactic representations of the text rather than on its semantic content and meaning. This paper focuses on computing the similarity between sentences without supervised learning, considering only their word-level coherence, which is calculated by a hybrid method that combines dependency parser embeddings and lexicon embeddings. The dependency parser embeddings capture the structural similarity between text pairs, while the hybrid method also takes into account the semantic information of the words in the sentences. In the evaluation, we compare our method with state-of-the-art semantic similarity measures on a well-known dataset. Our method outperforms most of the studies in the literature, and overall performance improves further when the similarity scores of both embedding models are combined.
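The abstract does not state how the two similarity scores are combined, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes each sentence is represented by the mean of its token vectors, that each embedding model yields a cosine similarity, and that the two scores are merged with a weighted average. The function names, the `alpha` weight, and the toy embedding values are all hypothetical.

```python
import math


def mean_vector(tokens, embeddings):
    """Average the vectors of the tokens that appear in `embeddings`."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None  # no known token, so no sentence vector
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 when a vector is missing or zero."""
    if u is None or v is None:
        return 0.0
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hybrid_similarity(tokens_a, tokens_b, lexicon_emb, dependency_emb, alpha=0.5):
    """Weighted average of the two per-model scores (alpha is an assumption)."""
    lex = cosine(mean_vector(tokens_a, lexicon_emb),
                 mean_vector(tokens_b, lexicon_emb))
    dep = cosine(mean_vector(tokens_a, dependency_emb),
                 mean_vector(tokens_b, dependency_emb))
    return alpha * lex + (1 - alpha) * dep


# Toy, made-up 2-dimensional embeddings purely for illustration.
lexicon_emb = {"cat": [1.0, 0.0], "dog": [0.9, 0.1],
               "sat": [0.0, 1.0], "ran": [0.1, 0.9]}
dependency_emb = {"cat": [1.0, 0.0], "dog": [1.0, 0.0],
                  "sat": [0.0, 1.0], "ran": [0.0, 1.0]}

score = hybrid_similarity(["cat", "sat"], ["dog", "ran"],
                          lexicon_emb, dependency_emb)
print(round(score, 3))
```

Setting `alpha` to 1.0 or 0.0 reduces the score to a single embedding model, which makes it easy to compare each model alone against the combination.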


Updated: 2020-06-17
