Learning Unsupervised Knowledge-Enhanced Representations to Reduce the Semantic Gap in Information Retrieval, ACM Transactions on Information Systems

ACM Transactions on Information Systems (IF 5.6), Pub Date: 2020-09-12, DOI: 10.1145/3417996
Maristella Agosti, Stefano Marchesin, Gianmaria Silvello

The semantic mismatch between query and document terms—i.e., the semantic gap—is a long-standing problem in Information Retrieval (IR). Two main linguistic features related to the semantic gap that can be exploited to improve retrieval are synonymy and polysemy. Recent works integrate knowledge from curated external resources into the learning process of neural language models to reduce the effect of the semantic gap. However, these knowledge-enhanced language models have been used in IR mostly for re-ranking and not directly for document retrieval. We propose the Semantic-Aware Neural Framework for IR (SAFIR), an unsupervised knowledge-enhanced neural framework explicitly tailored for IR. SAFIR jointly learns word, concept, and document representations from scratch. The learned representations encode both polysemy and synonymy to address the semantic gap. SAFIR can be employed in any domain where external knowledge resources are available. We investigate its application in the medical domain where the semantic gap is prominent and there are many specialized and manually curated knowledge resources. The evaluation on shared test collections for medical literature retrieval shows the effectiveness of SAFIR in terms of retrieving and ranking relevant documents most affected by the semantic gap.

Last updated: 2020-09-12