DESCGEN: A Distantly Supervised Dataset for Generating Abstractive Entity Descriptions
arXiv - CS - Computation and Language Pub Date : 2021-06-09 , DOI: arxiv-2106.05365
Weijia Shi, Mandar Joshi, Luke Zettlemoyer

Short textual descriptions of entities provide summaries of their key attributes and have been shown to be useful sources of background knowledge for tasks such as entity linking and question answering. However, generating entity descriptions, especially for new and long-tail entities, can be challenging since relevant information is often scattered across multiple sources with varied content and style. We introduce DESCGEN: given mentions spread over multiple documents, the goal is to generate an entity summary description. DESCGEN consists of 37K entity descriptions from Wikipedia and Fandom, each paired with nine evidence documents on average. The documents were collected using a combination of entity linking and hyperlinks to the Wikipedia and Fandom entity pages, which together provide high-quality distant supervision. The resulting summaries are more abstractive than those found in existing datasets and provide a better proxy for the challenge of describing new and emerging entities. We also propose a two-stage extract-then-generate baseline and show that there exists a large gap (19.9% in ROUGE-L) between state-of-the-art models and human performance, suggesting that the data will support significant future work.
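
The gap reported above is measured in ROUGE-L, which scores a generated description by its longest common subsequence (LCS) with a reference description. The sketch below is a minimal illustration of that idea, not the evaluation script behind the reported numbers: the function names are hypothetical, and lowercasing, whitespace tokenization, and the plain F1 combination of precision and recall are simplifying assumptions that the abstract does not specify.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the subsequence that ended before both tokens.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of the two neighbours.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L as an F1 score over LCS-based precision and recall.

    Assumption: lowercased, whitespace-split tokens; no stemming.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

A generated description that reproduces the reference exactly scores 1.0 and one that shares no tokens scores 0.0. Because only in-order token overlap is counted, abstractive descriptions that paraphrase the evidence tend to score lower than extractive ones.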

Updated: 2021-06-11