Overview of the TREC 2019 deep learning track

arXiv - CS - Information Retrieval. Pub Date: 2020-03-17, DOI: arxiv-2003.07820
Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Ellen M. Voorhees

The Deep Learning Track is a new track for TREC 2019, with the goal of studying ad hoc ranking in a large data regime. It is the first track with large human-labeled training sets, introducing two sets corresponding to two tasks, each with rigorous TREC-style blind evaluation and reusable test sets. The document retrieval task has a corpus of 3.2 million documents with 367 thousand training queries, for which we generate a reusable test set of 43 queries. The passage retrieval task has a corpus of 8.8 million passages with 503 thousand training queries, for which we generate a reusable test set of 43 queries. This year 15 groups submitted a total of 75 runs, using various combinations of deep learning, transfer learning and traditional IR ranking methods. Deep learning runs significantly outperformed traditional IR runs. Possible explanations for this result are that we introduced large training data and we included deep models trained on such data in our judging pools, whereas some past studies did not have such training data or pooling.


Updated: 2020-03-19
