WabiQA: A Wikipedia-Based Thai Question-Answering System
Information Processing & Management (IF 8.6), Pub Date: 2020-11-25, DOI: 10.1016/j.ipm.2020.102431
Thanapon Noraset, Lalita Lowphansirikul, Suppawong Tuarob

With vast amounts of information digitized and made available online, manually finding the answer to a question can be tedious. Although search engines help meet information needs, users still have to read through the retrieved articles themselves to locate the answer to a specific question. The ability to automatically understand users’ natural-language questions and find the correct answers could therefore prove crucial in information retrieval. Such automatic question-answering solutions have been studied extensively by the natural language processing (NLP) research community; however, most development efforts target questions and information sources written in high-resource languages such as English and Chinese. In this paper, we propose WabiQA, a novel system that automatically answers questions in the Thai language using Thai Wikipedia articles as its knowledge source. The proposed method first retrieves the Wikipedia article most likely to contain the answer. A bi-directional LSTM model then reads the article and locates candidate answers, which are ranked by confidence and returned to the user. WabiQA won first prize in the “Question-Answering Program from Thai Wikipedia” category of Thailand’s National Software Contest 2019, achieving an Accuracy@1 of 83.5%, an exact match (EM) score of 34.80%, and an F1 score of 45.96%, and outperforming the next-best competitors’ systems by 19.99, 24.26, and 33.10 percentage points on these three metrics, respectively. We also develop a prototype mobile application that aims to assist visually impaired Thai users through voice-to-speech technology and intelligent question-answer categorization.
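The abstract describes a two-stage retrieve-then-read pipeline: pick the most relevant article, then rank candidate answer spans by confidence. The sketch below illustrates that flow under assumptions the source does not state. Articles are taken as already word-segmented (Thai is written without spaces between words, so a real system would need a Thai tokenizer first), stage 1 uses plain TF-IDF cosine similarity as a swapped-in retriever, and the bi-directional LSTM reader is replaced by hypothetical per-token start/end scores, so only the span-ranking step of stage 2 is shown. All function names are hypothetical and not WabiQA's actual code.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build TF-IDF weight dicts for a list of pre-tokenized documents (assumed retriever)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF so that tokens appearing in every document keep a nonzero weight.
    idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}
    vecs = [{tok: cnt * idf[tok] for tok, cnt in Counter(doc).items()} for doc in docs]
    return vecs, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(question_tokens, articles):
    """Stage 1 (hypothetical): index of the article most similar to the question."""
    vecs, idf = tfidf_vectors(articles)
    # Question tokens unseen in any article get weight 0.
    qvec = {t: c * idf.get(t, 0.0) for t, c in Counter(question_tokens).items()}
    scores = [cosine(qvec, v) for v in vecs]
    return max(range(len(articles)), key=scores.__getitem__)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def rank_spans(start_logits, end_logits, max_len=5, top_k=3):
    """Stage 2 decoding only: rank spans (i, j) by P(start=i) * P(end=j),
    requiring j >= i and a span length of at most max_len tokens."""
    ps, pe = softmax(start_logits), softmax(end_logits)
    spans = []
    for i, p_i in enumerate(ps):
        for j in range(i, min(i + max_len, len(pe))):
            spans.append((p_i * pe[j], i, j))
    spans.sort(reverse=True)  # highest confidence first
    return spans[:top_k]


if __name__ == "__main__":
    # Toy, already-segmented "articles" (English placeholders for readability).
    articles = [
        ["bangkok", "is", "the", "capital", "of", "thailand"],
        ["chiang", "mai", "is", "a", "city", "in", "northern", "thailand"],
    ]
    question = ["what", "is", "the", "capital", "of", "thailand"]
    best = retrieve(question, articles)
    # Hypothetical reader scores, one per token of the retrieved article.
    start = [3.0, 0.1, 0.2, 0.0, 0.0, 0.0]
    end = [2.5, 0.1, 0.3, 0.0, 0.0, 0.0]
    for conf, i, j in rank_spans(start, end):
        print(f"{conf:.3f}", " ".join(articles[best][i:j + 1]))
```

Scoring a span as the product of independent start and end probabilities, with a cap on span length, is a common decoding choice in extractive question answering; the abstract says only that candidates are ranked by confidence, so WabiQA may score them differently.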
The findings of this research not only show that intelligent NLP applications for the Thai language can be developed using only existing Thai NLP tools, resources, and deep learning technologies, but also suggest that such techniques could be applied to many other intelligent NLP tasks for Thai and other low-resource languages, such as reading assessment, writing assistance, and entity linking.
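The EM and F1 figures reported in the abstract are standard extractive-QA metrics. The sketch below uses the common SQuAD-style definitions: EM checks whether the normalized prediction equals the normalized gold answer, and F1 measures token overlap between them. The contest's actual normalization and tokenization rules are not given in the abstract, so the lowercase-and-whitespace handling here is an assumption (Thai answers would first need word segmentation), and the function names are hypothetical. Accuracy@1 is not sketched because its exact definition is not stated.

```python
from collections import Counter


def normalize(text):
    """Hypothetical normalization: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction, gold):
    """Token-overlap F1 between a predicted and a gold answer string."""
    pred_toks = normalize(prediction).split()
    gold_toks = normalize(gold).split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)


def corpus_scores(predictions, golds):
    """Average EM and F1, as percentages, over paired predictions and gold answers."""
    n = len(golds)
    em = 100.0 * sum(exact_match(p, g) for p, g in zip(predictions, golds)) / n
    f1 = 100.0 * sum(f1_score(p, g) for p, g in zip(predictions, golds)) / n
    return em, f1
```

For example, the prediction "king rama ix" against the gold answer "rama ix" scores 0 on EM but receives partial credit on F1 (precision 2/3, recall 1, F1 0.8). This partial credit is why F1 is typically higher than EM, as in the 45.96% versus 34.80% reported above.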

Updated: 2020-11-25