WTR: A Test Collection for Web Table Retrieval

arXiv - CS - Information Retrieval. Pub Date: 2021-05-05, DOI: arxiv-2105.02354
Zhiyu Chen, Shuo Zhang, Brian D. Davison

We describe the development, characteristics, and availability of a test collection for the task of Web table retrieval, which draws on large-scale Web Table Corpora extracted from the Common Crawl. Because a Web table usually comes with rich context information, such as the page title and surrounding paragraphs, we provide relevance judgments not only for query-table pairs but also for pairs of a query and a table's context, which previous test collections ignore. To facilitate future research with this benchmark, we detail how the dataset is pre-processed and report baseline results from both traditional and recently proposed table retrieval methods. Our experimental results show that proper use of context labels can benefit previous table retrieval methods.

Updated: 2021-05-07
