Enhancing lexical-based approach with external knowledge for Vietnamese multiple-choice machine reading comprehension
arXiv - CS - Computation and Language Pub Date : 2020-01-16 , DOI: arxiv-2001.05687
Kiet Van Nguyen, Khiem Vinh Tran, Son T. Luu, Anh Gia-Tuan Nguyen, Ngan Luu-Thuy Nguyen

Although Vietnamese is the 17th most widely spoken native language in the world, there has been little research on Vietnamese machine reading comprehension (MRC), the task of understanding a text and answering questions about it. One reason is the lack of high-quality benchmark datasets for this task. In this work, we construct a dataset of 2,783 multiple-choice question-answer pairs based on 417 Vietnamese texts that are commonly used to teach reading comprehension to elementary school pupils. In addition, we propose a lexical-based MRC method that uses semantic similarity measures and external knowledge sources to analyze questions and extract answers from the given text. We compare the proposed model with several lexical-based and neural network-based baseline models. Our method achieves an accuracy of 61.81%, which is 5.51% higher than the best baseline model. We also measure human performance on our dataset and find a large gap between machine and human performance, which indicates that significant progress can still be made on this task. The dataset is freely available on our website for research purposes.
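
To make the idea of a lexical-based approach to multiple-choice MRC concrete, here is a minimal sketch of a plain word-overlap scorer. It is not the authors' method: it leaves out the semantic similarity measures and external knowledge sources that the abstract describes, and the function names (`tokenize`, `overlap_score`, `choose_answer`) are hypothetical, chosen only for illustration.

```python
import re
import unicodedata


def tokenize(text):
    """Normalize to NFC, lowercase, and split on non-word characters.

    NFC keeps Vietnamese letters with diacritics as single code points,
    so the Unicode-aware \\W pattern does not split them apart.
    """
    text = unicodedata.normalize("NFC", text).lower()
    return [tok for tok in re.split(r"\W+", text) if tok]


def overlap_score(passage_tokens, question, option):
    """Fraction of the question-plus-option vocabulary found in the passage."""
    query = set(tokenize(question)) | set(tokenize(option))
    if not query:
        return 0.0
    return len(query & passage_tokens) / len(query)


def choose_answer(passage, question, options):
    """Return the index of the option with the highest overlap score.

    Ties are resolved in favor of the earliest option.
    """
    passage_tokens = set(tokenize(passage))
    scores = [overlap_score(passage_tokens, question, opt) for opt in options]
    return max(range(len(options)), key=scores.__getitem__)


if __name__ == "__main__":
    passage = "Lan rides her bicycle to school every morning."
    question = "How does Lan go to school?"
    options = ["by bicycle", "by bus", "on foot"]
    print(options[choose_answer(passage, question, options)])
```

A scorer like this only rewards exact word matches, which is the limitation that similarity measures and external knowledge are meant to address.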

Updated: 2020-11-03