Automatic question generation based on sentence structure analysis using machine learning approach
Natural Language Engineering (IF 2.3), Pub Date: 2021-06-17, DOI: 10.1017/s1351324921000139
Miroslav Blšták, Viera Rozinajová

Automatic question generation is one of the most challenging tasks of Natural Language Processing. It requires “bidirectional” language processing: first, the system has to understand the input text (Natural Language Understanding), and then it has to generate questions, also in the form of text (Natural Language Generation). In this article, we introduce our framework for generating factual questions from unstructured English text. It combines traditional linguistic approaches based on sentence patterns with several machine learning methods. We first obtain lexical, syntactic and semantic information from an input text, and we then construct a hierarchical set of patterns for each sentence. A set of features is extracted from the patterns and then used for automated learning of new transformation rules. Our learning process is entirely data-driven because the transformation rules are obtained from a set of initial sentence–question pairs. The advantages of this approach are that new transformation rules are easy to add, which allows us to generate various types of questions, and that the system can be improved continuously through reinforcement learning. The framework also includes a question evaluation module which estimates the quality of generated questions. It serves as a filter for selecting the best questions and eliminating incorrect ones or duplicates. We have performed several experiments to evaluate the correctness of generated questions, and we have also compared our system with several state-of-the-art systems. Our results indicate that the generated questions are of higher quality than those produced by the state-of-the-art systems and are comparable to questions created by humans. We have also created and published an interface with all created data sets and evaluated questions, so it is possible to follow up on our work.
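
The idea of obtaining transformation rules from sentence–question pairs and then filtering the generated questions can be illustrated with a toy sketch. This is not the authors' implementation: the chunk labels (PERSON, VERB, OBJECT), the `Rule` class, and the functions `learn_rule`, `apply_rules` and `filter_questions` are all hypothetical names chosen for illustration, the labels are supplied by hand rather than by NLP tools, and the "learning" step is plain string substitution rather than the machine learning methods the abstract refers to.

```python
from dataclasses import dataclass

# A sentence is a list of (label, text) chunks. The labels are hand-written
# assumptions here; a real system would derive them from linguistic analysis.
Chunks = list[tuple[str, str]]


@dataclass(frozen=True)
class Rule:
    pattern: tuple[str, ...]  # sequence of chunk labels the rule applies to
    template: str             # question template with {LABEL} slots


def learn_rule(chunks: Chunks, question: str) -> Rule:
    """Derive a rule from one sentence-question pair by replacing every chunk
    text that occurs in the question with a slot named after its label."""
    template = question
    for label, text in chunks:
        template = template.replace(text, "{" + label + "}")
    return Rule(tuple(label for label, _ in chunks), template)


def apply_rules(chunks: Chunks, rules: list[Rule]) -> list[str]:
    """Generate a question from every rule whose pattern matches the sentence."""
    pattern = tuple(label for label, _ in chunks)
    slots = dict(chunks)  # label -> text, used to fill the template slots
    return [r.template.format(**slots) for r in rules if r.pattern == pattern]


def filter_questions(questions: list[str]) -> list[str]:
    """Toy stand-in for an evaluation module: keep only strings ending in '?'
    and drop case-insensitive duplicates, preserving first occurrences."""
    seen, kept = set(), []
    for q in questions:
        key = q.strip().lower()
        if q.endswith("?") and key not in seen:
            seen.add(key)
            kept.append(q)
    return kept


# One initial sentence-question pair yields one rule.
seed = [("PERSON", "Marie Curie"), ("VERB", "discovered"), ("OBJECT", "polonium")]
rule = learn_rule(seed, "Who discovered polonium?")

# The rule transfers to a new sentence with the same label pattern.
new_sentence = [("PERSON", "Alexander Fleming"), ("VERB", "discovered"),
                ("OBJECT", "penicillin")]
print(filter_questions(apply_rules(new_sentence, [rule])))
```

In this sketch a rule only fires when the label sequence matches exactly; the hierarchical patterns and learned features described in the abstract would presumably allow much looser matching.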




Updated: 2021-06-17