DBTagger: Multi-Task Learning for Keyword Mapping in NLIDBs Using Bi-Directional Recurrent Neural Networks,arXiv - CS - Databases

arXiv - CS - Databases Pub Date : 2021-01-11 , DOI: arxiv-2101.04226
Arif Usta, Akifhan Karakayali, Özgür Ulusoy

Translating natural language queries (NLQs) into Structured Query Language (SQL) in natural language interfaces to relational databases (NLIDBs) is a challenging task that has recently been widely studied in the database community. Conventional rule-based systems use a series of solutions as a pipeline to handle each step of this task, namely stop word filtering, tokenization, stemming/lemmatization, parsing, tagging, and translation. Recent works have mostly focused on the translation step, overlooking the earlier steps by relying on ad hoc solutions. In the pipeline, one of the most critical and challenging problems is keyword mapping: constructing a mapping between tokens in the query and relational database elements (tables, attributes, values, etc.). We define keyword mapping as a sequence tagging problem and propose a novel supervised deep learning approach that uses the part-of-speech (POS) tags of NLQs. Our proposed approach, called DBTagger (DataBase Tagger), is an end-to-end, schema-independent solution, which makes it practical for various relational databases. We evaluate our approach on eight different datasets and report new state-of-the-art accuracy results, 92.4% on average. Our results also indicate that DBTagger is up to 10,000 times faster than its counterparts and scales to larger databases.
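To make the sequence-tagging framing of keyword mapping concrete, here is a minimal Python sketch. It is not the authors' implementation: the tag set (`TABLE`, `ATTR`, `VALUE`, `O`), the example query and its gold mapping, and the helper names `tag_sequence`, `decode`, and `token_accuracy` are all hypothetical. A trained tagger such as DBTagger would predict one tag per token (the abstract says it also uses the query's POS tags); here the gold tags are derived from a hand-written mapping instead. Token-level accuracy is shown only as one plausible way to score predictions, since the abstract does not define its accuracy metric.

```python
from typing import Dict, List

# Hypothetical tag set for illustration only; "O" marks tokens that map to
# no database element. DBTagger's actual tag inventory may differ.
TAGS = ("TABLE", "ATTR", "VALUE", "O")


def tag_sequence(tokens: List[str], mapping: Dict[str, str]) -> List[str]:
    """Convert a token-level keyword mapping into one tag per token.

    This is the supervised target a sequence tagger would learn to predict.
    Lookups are case-insensitive; unmapped tokens get "O".
    """
    return [mapping.get(tok.lower(), "O") for tok in tokens]


def decode(tokens: List[str], tags: List[str]) -> Dict[str, List[str]]:
    """Group tagged tokens by database-element type, dropping "O" tokens."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    result: Dict[str, List[str]] = {"TABLE": [], "ATTR": [], "VALUE": []}
    for tok, tag in zip(tokens, tags):
        if tag not in TAGS:
            raise ValueError(f"unknown tag: {tag!r}")
        if tag != "O":
            result[tag].append(tok)
    return result


def token_accuracy(predicted: List[str], gold: List[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty tag sequences of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Hypothetical NLQ over a hypothetical movie database.
query = "List the names of actors born in 1970".split()
gold_mapping = {"names": "ATTR", "actors": "TABLE", "born": "ATTR", "1970": "VALUE"}

gold_tags = tag_sequence(query, gold_mapping)
print(list(zip(query, gold_tags)))
print(decode(query, gold_tags))
# → {'TABLE': ['actors'], 'ATTR': ['names', 'born'], 'VALUE': ['1970']}
```

In a real system, a model's predicted tags would take the place of `gold_tags`. For example, a prediction that misses only `born` scores 7/8 = 0.875 under `token_accuracy`.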

Updated: 2021-01-13
