Towards Question-based High-recall Information Retrieval
ACM Transactions on Information Systems (IF 5.6), Pub Date: 2020-05-22, DOI: 10.1145/3388640
Jie Zou, Evangelos Kanoulas

While continuous active learning algorithms have proven effective in finding most of the relevant documents in a collection, the cost of locating the last few remains high for applications such as Technology-assisted Reviews (TAR). To locate these last few but significant documents efficiently, Zou et al. [2018] proposed a novel interactive algorithm based on constructing questions about the presence or absence of entities in the missing relevant documents. The underlying hypothesis is that entities play a central role in documents carrying key information, and that users are able to answer questions about the presence or absence of an entity in the missing relevant documents. Based on this, a Sequential Bayesian Search-based approach was devised that selects the optimal sequence of questions to ask. In this work, we extend Zou et al. [2018] by (a) investigating the noise tolerance of the proposed algorithm; (b) proposing an alternative objective function to optimize, which accounts for "erroneous" user answers; (c) proposing a method that sequentially decides the best point at which to stop asking the user questions; and (d) conducting a small user study to validate some of the assumptions made by Zou et al. [2018]. Furthermore, all experiments are extended to demonstrate the effectiveness of the proposed algorithms not only in the phase of abstract appraisal (i.e., finding the abstracts of potentially relevant documents in a collection) but also in finding the documents to be included in the review (i.e., finding the subset of those relevant abstracts for which the article remains relevant). The experimental results demonstrate that, compared to state-of-the-art methods, the proposed algorithms can greatly improve performance, requiring fewer irrelevant documents to be reviewed before the last relevant ones are found, even when the answers are noisy. Further, they show that our algorithm learns to stop asking questions at the right time.
Last, we conduct a small user study involving an expert reviewer. The user study validates some of the assumptions made in this work regarding the user's willingness to answer the system's questions and the extent of that willingness, as well as the user's ability to answer these questions.
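The abstract describes the question-asking loop only at a high level. The following is a minimal, hypothetical sketch of that kind of loop, not the authors' implementation. It assumes a posterior over candidate documents, a known set of entities per document, a generalized-binary-search-style selection rule (ask about the entity whose presence splits the posterior mass most evenly), a symmetric noise model in which each yes/no answer is wrong with probability `noise`, and a simple confidence-threshold stopping rule. The paper's own noise-aware objective and stopping method may differ; all function names, parameters, and values below are illustrative assumptions.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Set, Tuple

Posterior = Dict[str, float]           # document id -> probability
EntityIndex = Dict[str, FrozenSet[str]]  # document id -> entities it contains


def normalize(weights: Posterior) -> Posterior:
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}


def select_entity(posterior: Posterior, doc_entities: EntityIndex,
                  asked: Set[str]) -> Optional[str]:
    """Return the unasked entity whose presence splits the posterior mass most
    evenly (assumed GBS-style rule), or None if no entity separates candidates."""
    total = sum(posterior.values())
    candidates = set().union(*doc_entities.values()) - asked
    best, best_gap = None, float("inf")
    for entity in sorted(candidates):  # sorted for deterministic tie-breaking
        mass_yes = sum(p for d, p in posterior.items() if entity in doc_entities[d])
        if not 0.0 < mass_yes < total:
            continue  # present in all or none of the candidates: uninformative
        gap = abs(mass_yes - (total - mass_yes))
        if gap < best_gap:
            best, best_gap = entity, gap
    return best


def update(posterior: Posterior, doc_entities: EntityIndex, entity: str,
           present: bool, noise: float) -> Posterior:
    """Bayes update assuming each answer is flipped with probability `noise`
    (0 < noise < 0.5 in this sketch, so no document's weight drops to zero)."""
    weights = {}
    for d, p in posterior.items():
        agrees = (entity in doc_entities[d]) == present
        weights[d] = p * ((1.0 - noise) if agrees else noise)
    return normalize(weights)


def question_search(prior: Posterior, doc_entities: EntityIndex,
                    answer: Callable[[str], bool], noise: float = 0.1,
                    threshold: float = 0.9,
                    max_questions: int = 20) -> Tuple[str, Posterior, List[str]]:
    """Ask entity questions until one document is likely enough, no question
    is informative, or the question budget runs out (hypothetical stop rule)."""
    posterior = normalize(prior)
    questions: List[str] = []
    for _ in range(max_questions):
        if max(posterior.values()) >= threshold:
            break  # confident enough: stop asking
        entity = select_entity(posterior, doc_entities, set(questions))
        if entity is None:
            break  # no remaining question can separate the candidates
        questions.append(entity)
        posterior = update(posterior, doc_entities, entity, answer(entity), noise)
    best = max(posterior, key=posterior.get)
    return best, posterior, questions


if __name__ == "__main__":
    # Toy collection: a simulated user answers truthfully about document "d3".
    docs = {
        "d1": frozenset({"aspirin", "stroke"}),
        "d2": frozenset({"aspirin", "infarction"}),
        "d3": frozenset({"statin", "stroke"}),
        "d4": frozenset({"statin", "infarction"}),
    }
    best, posterior, questions = question_search(
        {d: 1.0 for d in docs}, docs, lambda e: e in docs["d3"])
    print(best, questions, round(posterior[best], 3))
```

The soft update keeps every candidate alive after a contradicting answer, so a single wrong answer lowers a document's probability instead of eliminating it. With only four entities, as in the toy example, there is no redundancy to recover from such an error; larger entity vocabularies give the loop more questions with which to correct course.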

Updated: 2020-05-22