Schema2QA: High-Quality and Low-Cost Q&A Agents for the Structured Web,arXiv - CS - Computation and Language



Schema2QA: High-Quality and Low-Cost Q&A Agents for the Structured Web
arXiv - CS - Computation and Language. Pub Date: 2020-01-16, DOI: arxiv-2001.05609
Silei Xu, Giovanni Campagna, Jian Li and Monica S. Lam

Building a question-answering agent currently requires large annotated datasets, which are prohibitively expensive. This paper proposes Schema2QA, an open-source toolkit that can generate a Q&A system from a database schema augmented with a few annotations for each field. The key concept is to cover the space of possible compound queries on the database with a large number of in-domain questions, synthesized with the help of a corpus of generic query templates. The synthesized data and a small paraphrase set are used to train a novel neural network based on the BERT pretrained model. We use Schema2QA to generate Q&A systems for five Schema.org domains (restaurants, people, movies, books, and music) and obtain an overall accuracy between 64% and 75% on crowdsourced questions for these domains. Once annotations and paraphrases are obtained for a Schema.org schema, no additional manual effort is needed to create a Q&A agent for any website that uses the same schema. Furthermore, we demonstrate that learning can be transferred from the restaurant domain to the hotel domain, obtaining 64% accuracy on crowdsourced questions with no manual effort. Schema2QA achieves an accuracy of 60% on popular restaurant questions that can be answered using Schema.org. Its performance is comparable to that of Google Assistant, 7% lower than Siri, and 15% higher than Alexa. It outperforms all these assistants by at least 18% on more complex, long-tail questions.
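As a rough illustration of the synthesis idea described in the abstract, the sketch below expands a handful of generic templates over per-field annotations to produce (question, query) training pairs. Everything in it is an assumption made for illustration: the annotation phrases, the sample values, the two templates, the `synthesize` function, and the ad hoc query notation are hypothetical and are not the toolkit's actual API, template corpus, or query language.

```python
from itertools import combinations, product

# Hypothetical per-field annotations for a toy Restaurant table:
# field name -> (phrase template, a few sample values).
FIELD_ANNOTATIONS = {
    "servesCuisine": ("that serve {value} food", ["Italian", "Thai"]),
    "addressLocality": ("in {value}", ["Palo Alto", "Seattle"]),
}

# Hypothetical generic templates that do not mention any domain.
QUERY_TEMPLATES = [
    "show me {table} {filters}",
    "what are the {table} {filters}?",
]


def synthesize(table, table_phrase, annotations, templates, max_filters=2):
    """Return (question, query) pairs for every combination of up to
    max_filters fields, every sample value, and every template."""
    examples = []
    fields = sorted(annotations)  # fixed order keeps the output deterministic
    for n in range(1, max_filters + 1):
        for combo in combinations(fields, n):
            value_lists = [annotations[field][1] for field in combo]
            for values in product(*value_lists):
                phrases = [
                    annotations[field][0].format(value=value)
                    for field, value in zip(combo, values)
                ]
                conditions = [
                    f'{field} == "{value}"' for field, value in zip(combo, values)
                ]
                # Made-up query notation, used only to label the question.
                query = f"{table} filter " + " and ".join(conditions)
                for template in templates:
                    question = template.format(
                        table=table_phrase, filters=" ".join(phrases)
                    )
                    examples.append((question, query))
    return examples


if __name__ == "__main__":
    pairs = synthesize(
        "Restaurant", "restaurants", FIELD_ANNOTATIONS, QUERY_TEMPLATES
    )
    for question, query in pairs[:4]:
        print(question, "=>", query)
```

Even this toy setup shows why the number of synthesized examples grows quickly: each added field, value, or template multiplies the number of compound queries covered, with no per-question manual annotation.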


Updated: 2020-08-26
