TURL: Table Understanding through Representation Learning
arXiv - CS - Computation and Language. Pub Date: 2020-06-26, DOI: arxiv-2006.14806
Xiang Deng, Huan Sun, Alyssa Lees, You Wu, Cong Yu

Relational tables on the Web store a vast amount of knowledge. Owing to the wealth of such tables, there has been tremendous progress on a variety of tasks in the area of table understanding. However, existing work generally relies on heavily engineered, task-specific features and model architectures. In this paper, we present TURL, a novel framework that introduces the pre-training/fine-tuning paradigm to relational Web tables. During pre-training, our framework learns deep contextualized representations on relational tables in an unsupervised manner. Its universal model design with pre-trained representations can be applied to a wide range of tasks with minimal task-specific fine-tuning. Specifically, we propose a structure-aware Transformer encoder to model the row-column structure of relational tables, and present a new Masked Entity Recovery (MER) objective for pre-training to capture the semantics and knowledge in large-scale unlabeled data. We systematically evaluate TURL with a benchmark consisting of 6 different tasks for table understanding (e.g., relation extraction, cell filling). We show that TURL generalizes well to all tasks and substantially outperforms existing methods in almost all instances.
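To make the structure-aware encoding idea concrete, here is a minimal illustrative sketch, not the authors' implementation: it builds the kind of row-column visibility mask a structure-aware Transformer encoder can use so that each cell attends only to cells sharing its row or column. The table shape, the flat cell indexing, and the helper name `visibility_mask` are assumptions for illustration.

```python
# Minimal sketch (assumed, not TURL's actual code) of a row-column
# visibility mask for attention over the cells of a relational table.
import numpy as np

def visibility_mask(rows: int, cols: int) -> np.ndarray:
    """Return an (rows*cols, rows*cols) boolean mask where entry (i, j)
    is True iff flattened cells i and j share a row or a column."""
    n = rows * cols
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        ri, ci = divmod(i, cols)          # row/column of cell i
        for j in range(n):
            rj, cj = divmod(j, cols)      # row/column of cell j
            mask[i, j] = (ri == rj) or (ci == cj)
    return mask

# Example: a 3x4 table. In a Transformer layer the mask would typically
# be applied to the attention logits, e.g. logits[~mask] = -inf.
m = visibility_mask(3, 4)
print(m.shape)    # (12, 12)
print(m[0, :4])   # cell (0,0) sees its whole row  -> [True True True True]
print(m[0, 4])    # cell (1,0) shares column 0     -> True
print(m[0, 5])    # cell (1,1) shares neither      -> False
```

In a full encoder this mask would be broadcast across attention heads and extended so that cells also attend to the table caption and column headers; the MER pre-training objective would then mask a fraction of entity cells and train the model to recover them, analogously to masked language modeling.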

Last updated: 2020-06-29