Overcoming low-utility facets for complex answer retrieval
Information Retrieval Journal (IF 2.5), Pub Date: 2018-10-24, DOI: 10.1007/s10791-018-9343-0
Sean MacAvaney, Andrew Yates, Arman Cohan, Luca Soldaini, Kai Hui, Nazli Goharian, Ophir Frieder

Many questions cannot be answered simply; their answers must include numerous nuanced details and context. Complex Answer Retrieval (CAR) is the retrieval of answers to such questions. These questions can be constructed from a topic entity (e.g., ‘cheese’) and a facet (e.g., ‘health effects’). While topic matching has been thoroughly explored, we observe that some facets use general language that is unlikely to appear verbatim in answers, and thus exhibit low utility. In this work, we present an approach to CAR that identifies and addresses low-utility facets. First, we propose two estimators of facet utility: the hierarchical structure of CAR queries, and facet frequency information from training data. Then, to improve retrieval performance on low-utility headings, we include entity similarity scores using embeddings trained from a CAR knowledge graph, which captures the context of facets. We show that our methods are effective by applying them to two leading neural ranking techniques and evaluating them on the TREC CAR dataset. We find that our approach performs significantly better than the unmodified neural rankers and other leading CAR techniques, yielding state-of-the-art results. We also provide a detailed analysis of our results, and verify that low-utility facets are indeed difficult to match and that our approach improves performance on these difficult queries.
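
As a rough illustration of the second estimator (facet frequency from training data), the sketch below counts how many training queries each facet heading appears in and flags frequent headings as candidates for low utility. The abstract does not give the actual estimator, so the function names, the toy queries, the threshold, and the premise that frequently recurring headings are the general, low-utility ones are all assumptions made for illustration.

```python
from collections import Counter


def facet_frequencies(training_queries):
    """Count the number of training queries in which each facet heading occurs.

    training_queries: iterable of (topic, [facet, ...]) pairs (hypothetical format).
    """
    counts = Counter()
    for _topic, facets in training_queries:
        # Lower-case and de-duplicate so a heading counts once per query.
        for facet in {f.lower() for f in facets}:
            counts[facet] += 1
    return counts


def is_low_utility(facet, counts, threshold=2):
    """Assumed rule: a heading seen in at least `threshold` queries is low-utility."""
    return counts[facet.lower()] >= threshold


# Toy training data, invented for this example.
training_queries = [
    ("Cheese", ["Health effects", "Production"]),
    ("Coffee", ["Health effects", "History"]),
    ("Tea", ["History", "Cultivation"]),
]
counts = facet_frequencies(training_queries)
```

With this toy data, `is_low_utility("Health effects", counts)` is true because the heading recurs across topics, while `is_low_utility("Production", counts)` is false. A ranker could use such a flag to decide when to lean on additional signals, such as the entity similarity scores mentioned above, rather than on verbatim facet matching.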

Updated: 2018-10-24