SemBioNLQA: A semantic biomedical question answering system for retrieving exact and ideal answers to natural language questions.
Artificial Intelligence in Medicine (IF 6.1), Pub Date: 2019-11-28, DOI: 10.1016/j.artmed.2019.101767
Mourad Sarrouti 1 , Said Ouatik El Alaoui 2

Background and objective

Question answering (QA), the identification of short, accurate answers to users' questions written in natural language, is a longstanding problem that has been widely studied in the open domain over the last decades. However, it remains a real challenge in the biomedical domain, as most existing systems support a limited number of question and answer types and still require further effort to improve their precision on the questions they do support. Here, we present a semantic biomedical QA system named SemBioNLQA, which can handle yes/no, factoid, list, and summary natural language questions.

Methods

This paper describes the architecture and evaluation of SemBioNLQA, an end-to-end biomedical QA system that consists of question classification, document retrieval, passage retrieval, and answer extraction modules. It takes natural language questions as input and outputs both short, precise answers and summaries. The system, which deals with four types of questions, is based on (1) handcrafted lexico-syntactic patterns and a machine learning algorithm for question classification, (2) the PubMed search engine and UMLS similarity for document retrieval, (3) the BM25 model, stemmed words, and UMLS concepts for passage retrieval, and (4) the UMLS Metathesaurus, BioPortal synonyms, sentiment analysis, and a term frequency metric for answer extraction.
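
To make the passage retrieval step more concrete, the following is a minimal sketch of Okapi BM25 scoring over tokenized passages. It is not the authors' implementation: the abstract does not state which BM25 variant or parameter values SemBioNLQA uses, so the function name `bm25_scores`, the smoothed IDF form, and the defaults `k1=1.2` and `b=0.75` are assumptions. The tokens are assumed to be already stemmed words or UMLS concept identifiers.

```python
import math
from collections import Counter


def bm25_scores(query_terms, passages, k1=1.2, b=0.75):
    """Score each tokenized passage against the query with Okapi BM25.

    `passages` is a list of token lists (assumed to be stemmed words or
    concept IDs). `k1` and `b` are common defaults, not values taken
    from the paper.
    """
    if not passages:
        return []
    n = len(passages)
    avg_len = sum(len(p) for p in passages) / n
    # Document frequency: number of passages that contain each term.
    df = Counter(term for p in passages for term in set(p))
    scores = []
    for p in passages:
        tf = Counter(p)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue  # a term absent from the passage contributes nothing
            # Smoothed IDF, always non-negative.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturation with passage-length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(p) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


if __name__ == "__main__":
    # Hypothetical toy passages, for illustration only.
    passages = [
        ["brca1", "mutation", "breast", "cancer"],
        ["insulin", "regulates", "glucose"],
        ["breast", "cancer", "screening", "guidelines", "cancer"],
    ]
    for passage, score in zip(passages, bm25_scores(["breast", "cancer"], passages)):
        print(round(score, 3), " ".join(passage))
```

Passages would then be ranked by descending score, with the top-ranked ones passed on to answer extraction.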

Results and conclusion

Compared with current state-of-the-art biomedical QA systems, SemBioNLQA, a fully automated system, has the potential to deal with a large number of question and answer types. SemBioNLQA quickly addresses users' information needs by returning exact answers (e.g., “yes”, “no”, a biomedical entity name, etc.) and ideal answers (i.e., paragraph-sized summaries of relevant information) for yes/no, factoid and list questions, whereas it provides only ideal answers for summary questions. Moreover, experimental evaluations performed on biomedical questions and answers provided by the BioASQ challenge, especially in 2015, 2016 and 2017 (as part of our participation), show that SemBioNLQA performs well compared with the most recent state-of-the-art systems and offers a practical and competitive alternative to help information seekers find exact and ideal answers to their biomedical questions. The SemBioNLQA source code is publicly available at https://github.com/sarrouti/sembionlqa.
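
The relationship between question types and answer kinds described above can be sketched as a small lookup. This is an illustrative sketch only: the names `QuestionType` and `answer_kinds` are hypothetical and do not come from the SemBioNLQA source code.

```python
from enum import Enum


class QuestionType(Enum):
    """The four question types named in the abstract."""
    YES_NO = "yes/no"
    FACTOID = "factoid"
    LIST = "list"
    SUMMARY = "summary"


def answer_kinds(qtype):
    """Return the kinds of answer produced for a given question type.

    Summary questions receive only an ideal answer (a paragraph-sized
    summary); the other three types receive both an exact and an ideal
    answer.
    """
    if qtype is QuestionType.SUMMARY:
        return ("ideal",)
    return ("exact", "ideal")


if __name__ == "__main__":
    for qtype in QuestionType:
        print(qtype.value, "->", ", ".join(answer_kinds(qtype)))
```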




Updated: 2019-11-28