Less is more: Data-efficient complex question answering over knowledge bases
Journal of Web Semantics (IF 2.1), Pub Date: 2020-10-16, DOI: 10.1016/j.websem.2020.100612
Yuncheng Hua, Yuan-Fang Li, Guilin Qi, Wei Wu, Jingyao Zhang, Daiqing Qi

Question answering is an effective method for obtaining information from knowledge bases (KB). In this paper, we propose the Neural-Symbolic Complex Question Answering (NS-CQA) model, a data-efficient reinforcement learning framework for complex question answering that uses only a modest number of training samples. Our framework consists of a neural generator and a symbolic executor that, respectively, transform a natural-language question into a sequence of primitive actions and execute those actions over the knowledge base to compute the answer. We carefully formulate a set of primitive symbolic actions that allows us not only to simplify our neural network design but also to accelerate model convergence. To reduce the search space, we employ copy and masking mechanisms in our encoder–decoder architecture, which drastically shrink the decoder's output vocabulary and improve model generalizability. We equip our model with a memory buffer that stores promising high-reward programs. In addition, we propose an adaptive reward function. By comparing a generated trial with the trials stored in the memory buffer, we derive a curriculum-guided reward bonus with two components: proximity and novelty. To mitigate the sparse-reward problem, we combine the adaptive reward with this bonus, reshaping the sparse reward into dense feedback. We also encourage the model to generate new trials, so that it avoids imitating spurious trials while still remembering past high-reward trials, thereby improving data efficiency. Our NS-CQA model is evaluated on two datasets: CQA, a recent large-scale complex question answering dataset, and WebQuestionsSP, a multi-hop question answering dataset. On both datasets, our model outperforms the state-of-the-art models. Notably, on CQA, NS-CQA performs well on questions of higher complexity while using only approximately 1% of the total training samples.
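The memory buffer and curriculum-guided reward shaping described above can be sketched in a few lines. This is an illustrative approximation rather than the paper's exact formulation: the token-overlap (Jaccard) similarity, the buffer capacity, and the `alpha`/`beta` bonus weights are all assumptions made for the sketch.

```python
def jaccard(a, b):
    """Token-overlap similarity between two action sequences (an assumption;
    the paper's actual similarity measure may differ)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

class MemoryBuffer:
    """Keeps the highest-reward programs generated so far for a question."""
    def __init__(self, capacity=5):
        self.capacity = capacity
        self.trials = []  # list of (program, reward) pairs

    def add(self, program, reward):
        self.trials.append((program, reward))
        # Retain only the top-reward programs, up to the buffer capacity.
        self.trials.sort(key=lambda t: t[1], reverse=True)
        del self.trials[self.capacity:]

def shaped_reward(program, sparse_reward, buffer, alpha=0.1, beta=0.1):
    """Reshape a sparse terminal reward into denser feedback.

    proximity: closeness to a stored high-reward trial;
    novelty:   bonus for not merely copying a stored (possibly spurious) trial.
    """
    if not buffer.trials:
        return sparse_reward
    best_sim = max(jaccard(program, p) for p, _ in buffer.trials)
    proximity = best_sim
    is_copy = any(program == p for p, _ in buffer.trials)
    novelty = 0.0 if is_copy else 1.0 - best_sim
    return sparse_reward + alpha * proximity + beta * novelty
```

Under this sketch, a trial that partially overlaps a stored high-reward program earns a small dense bonus even when its terminal reward is zero, while an exact copy of a stored trial receives no novelty bonus, nudging the policy away from pure imitation.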




Updated: 2020-10-29