Drug Disease Relation Extraction from Biomedical Literature Using NLP and Machine Learning
Mobile Information Systems (IF 1.863), Pub Date: 2021-05-19, DOI: 10.1155/2021/9958410
Wahiba Ben Abdessalem Karaa 1,2, Eman H. Alkhammash 3, Aida Bchir 2
Extracting relations between medical concepts is very valuable in the medical domain. Scientists need to extract relevant information and semantic relations between medical concepts, including protein and protein, gene and protein, drug and drug, and drug and disease. These relations can be extracted from the biomedical literature available in various databases. This study examines the extraction of semantic relations that can occur between diseases and drugs. The findings will help specialists make sound decisions when administering a medication to a patient and will help them stay up to date in their field. The objective of this work is to identify features related to drugs and diseases in medical texts by applying Natural Language Processing (NLP) techniques and the UMLS ontology. A Support Vector Machine (SVM) classifier then uses these features to extract semantic relationships among text entities. The main contribution of this research is the combination of two strengths: a proposed NLP technique that takes advantage of the UMLS ontology to extract correct and adequate features (frequency, lexical, morphological, syntactic, and semantic features), and Support Vector Machines with a polynomial kernel function. These features are used to pinpoint the relations between drug and disease. The proposed approach was evaluated on a standard corpus extracted from MEDLINE. It considerably improves performance and outperforms similar works, especially for the most important relation, “cure,” where the F-score reaches 98.19%. The reported accuracy is better than that of all existing works for all relations.
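The pipeline described in the abstract (features extracted from text, then compared through a polynomial kernel inside an SVM) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `extract_features` and `polynomial_kernel`, the three toy features, the single-token entity assumption, and the hyperparameters `degree`, `gamma`, and `coef0` are all assumptions made for illustration, and no UMLS lookup is performed.

```python
from collections import Counter
from typing import Dict


def extract_features(sentence: str, drug: str, disease: str) -> Dict[str, float]:
    """Toy frequency/lexical features for one (drug, disease) pair.

    Hypothetical helper: assumes both entities are single tokens that
    occur in the sentence. The paper's real feature set is much richer.
    """
    tokens = sentence.lower().replace(",", " ").replace(".", " ").split()
    i, j = tokens.index(drug.lower()), tokens.index(disease.lower())
    lo, hi = min(i, j), max(i, j)

    # Frequency of each word appearing between the two entities.
    feats: Dict[str, float] = {
        f"between={word}": float(count)
        for word, count in Counter(tokens[lo + 1:hi]).items()
    }
    feats["token_distance"] = float(hi - lo)        # distance in tokens
    feats["drug_first"] = 1.0 if i < j else 0.0     # entity order
    return feats


def polynomial_kernel(
    x: Dict[str, float],
    y: Dict[str, float],
    degree: int = 2,      # assumed value, not taken from the paper
    gamma: float = 1.0,   # assumed value
    coef0: float = 1.0,   # assumed value
) -> float:
    """Polynomial kernel K(x, y) = (gamma * <x, y> + coef0) ** degree
    over sparse feature dictionaries."""
    dot = sum(value * y.get(key, 0.0) for key, value in x.items())
    return (gamma * dot + coef0) ** degree


a = extract_features("Aspirin is used to treat headache.", "aspirin", "headache")
b = extract_features("Headache was not relieved by aspirin.", "aspirin", "headache")
print(polynomial_kernel(a, b))
```

An SVM with a polynomial kernel never sees the raw text, only kernel values such as the one printed here; the quality of the relation classifier therefore depends on how well the extracted features separate relations such as “cure” from the others.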
Updated: 2021-05-19