A New Strategy to Seed Selection for the High Recall Task, IEEE Latin America Transactions


IEEE Latin America Transactions ( IF 1.3 ) Pub Date : 2021-07-12 , DOI: 10.1109/tla.2021.9480153
Matheus Vinícius Todescato ₁ , Jean Hilger ₂ , Guilherme Dal Bianco ₁


High Recall Information Retrieval (HIRE) aims at identifying all (or nearly all) relevant documents given a query. HIRE is used, for example, in the systematic literature review task, where the goal is to identify all relevant scientific articles. The documents that HIRE selects as relevant determine how much effort the user must spend to find the target information. One goal of HIRE is therefore to return only relevant documents, so that the user is not overburdened with non-relevant information. Traditionally, supervised machine learning algorithms (e.g., SVM) are used as HIRE's core to produce a ranking of relevant documents. However, such algorithms depend on an initial training set (the seed) to start the learning process. In this work, we propose a new approach to producing the initial seed for HIRE, focused on reducing the user effort. Our approach combines active learning with a ranking strategy to select only the informative examples. The experiments show that our approach reduces the labeling effort by up to 18% while keeping recall competitive.
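The abstract does not spell out the selection procedure, so the following is only a minimal sketch of the general idea of pairing a ranking step with an uncertainty-based active learning step; it is not the authors' method. The function names, the term-overlap score, and the parameters `top_k` and `uncertain_k` are all assumptions made for illustration, and the overlap score stands in for the relevance estimate a real classifier such as an SVM would provide.

```python
from typing import List, Sequence


def overlap_scores(query: str, docs: Sequence[str]) -> List[float]:
    """Fraction of query terms found in each document (hypothetical relevance proxy)."""
    terms = set(query.lower().split())
    if not terms:
        return [0.0 for _ in docs]
    return [len(terms & set(d.lower().split())) / len(terms) for d in docs]


def rank_by_query_overlap(query: str, docs: Sequence[str]) -> List[int]:
    """Return document indices ordered from highest to lowest overlap score."""
    scores = overlap_scores(query, docs)
    # sorted() is stable, so documents with equal scores keep their original order.
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


def select_seed(query: str, docs: Sequence[str],
                top_k: int = 1, uncertain_k: int = 1) -> List[int]:
    """Pick documents to send to the user for labeling as the initial seed.

    Ranking step: take the top_k highest-scoring documents.
    Uncertainty step: from the rest, take the uncertain_k documents whose
    score is closest to 0.5, i.e. the ones the proxy is least sure about.
    """
    scores = overlap_scores(query, docs)
    ranking = rank_by_query_overlap(query, docs)
    seed = ranking[:top_k]
    remaining = ranking[top_k:]
    uncertain = sorted(remaining, key=lambda i: abs(scores[i] - 0.5))[:uncertain_k]
    return seed + uncertain
```

In a fuller pipeline the labeled seed would train the ranking classifier, and the uncertainty step would then use that classifier's scores instead of the term-overlap proxy.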


Updated: 2021-07-13
