Term position‐based language model for information retrieval
Journal of the Association for Information Science and Technology (IF 2.8), Pub Date: 2020-11-20, DOI: 10.1002/asi.24431
Arezki Hammache 1 , Mohand Boughanem 2

Term position is a feature widely and successfully used in IR and Web search engines to enhance retrieval effectiveness. It serves essentially two purposes: capturing the proximity of query terms, or boosting the weight of terms that appear in certain parts of a document. In this paper, we are interested in the second category. We propose two novel query‐independent techniques based on absolute term positions in a document, whose goal is to boost the weight of terms appearing at the beginning of a document. The first considers only the earliest occurrence of a term in a document; the second takes into account all positions of the term. We formalize each technique as a document model based on term position and then incorporate it into a basic language model (LM). Two smoothing techniques, Dirichlet and Jelinek‐Mercer, are considered in the basic LM. Experiments conducted on three TREC test collections show that our model, especially the version based on all term positions, achieves significant improvements over the baseline LMs, and it often also performs better than two state‐of‐the‐art baseline models: the chronological term rank model and the Markov random field model.
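
The abstract does not give the model's formulas, so the following is only a rough sketch of the general idea, not the authors' formulation. The two smoothing functions follow the standard Jelinek‐Mercer and Dirichlet definitions. Everything position-related is an assumption made for illustration: the linear decay in `position_weight`, the normalization in `position_model`, and the interpolation weight `alpha` in `score` are all hypothetical choices.

```python
import math
from collections import Counter


def jelinek_mercer(tf, doc_len, p_coll, lam=0.5):
    """Standard JM smoothing: (1 - lam) * tf/|D| + lam * P(w|C)."""
    p_ml = tf / doc_len if doc_len else 0.0
    return (1 - lam) * p_ml + lam * p_coll


def dirichlet(tf, doc_len, p_coll, mu=2000.0):
    """Standard Dirichlet smoothing: (tf + mu * P(w|C)) / (|D| + mu)."""
    return (tf + mu * p_coll) / (doc_len + mu)


def position_weight(pos, doc_len):
    """Hypothetical decay (an assumption): 1.0 at the first token,
    decreasing linearly toward 0.5 at the end of the document."""
    return 1.0 - 0.5 * pos / doc_len


def position_model(doc_tokens, mode="all"):
    """Hypothetical position-based document model P_pos(w|D).

    mode="first": each term contributes only the weight of its earliest position.
    mode="all":   each term contributes the summed weights of all its positions.
    """
    n = len(doc_tokens)
    mass = {}
    for i, term in enumerate(doc_tokens):
        w = position_weight(i, n)
        if mode == "first":
            mass.setdefault(term, w)  # later occurrences are ignored
        else:
            mass[term] = mass.get(term, 0.0) + w
    total = sum(mass.values())
    return {t: m / total for t, m in mass.items()} if total else {}


def score(query, doc_tokens, coll_prob, mode="all", alpha=0.3, mu=2000.0):
    """Query log-likelihood under a mix of a Dirichlet-smoothed LM and
    the position model. The linear mix with alpha is an assumption."""
    tf = Counter(doc_tokens)
    p_pos = position_model(doc_tokens, mode)
    n = len(doc_tokens)
    total = 0.0
    for q in query:
        p_lm = dirichlet(tf[q], n, coll_prob.get(q, 1e-9), mu)
        total += math.log((1 - alpha) * p_lm + alpha * p_pos.get(q, 0.0))
    return total


# Toy data: same terms and length, but "retrieval" appears early vs. late.
coll_prob = {"retrieval": 0.01, "a": 0.3, "b": 0.3, "c": 0.3}
doc_early = ["retrieval", "a", "b", "c"]
doc_late = ["a", "b", "c", "retrieval"]
early_score = score(["retrieval"], doc_early, coll_prob)
late_score = score(["retrieval"], doc_late, coll_prob)
```

With this toy data the two documents receive identical scores from the plain smoothed LM, so any difference between `early_score` and `late_score` comes entirely from the position component, which favors the document where the query term occurs near the beginning.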

Updated: 2020-11-20