Ensemble approach for identifying medical concepts with special attention to lexical scope

Mondal, Anupam; Das, Dipankar

doi:10.1007/s12046-021-01593-5

Published: 12 April 2021

Sādhanā, volume 46, article number 77 (2021)

Abstract

Health-care services increasingly depend on information extraction techniques. Performed manually, extraction is laborious and time-consuming because medical experts are rarely available. We were therefore motivated to develop an automated extraction system that identifies medical and non-medical concepts, which help to extract the key information from medical corpora. Non-medical terms matter as much as medical concepts for diagnosis purposes. Hence, we employ three approaches, namely unsupervised, supervised, and an ensemble combining the two, to identify both medical and non-medical terms (words/phrases). The unsupervised module consists of two phases: part-of-speech (POS) tagging followed by a search in a domain-specific lexicon, WordNet of Medical Event (WME 3.0). The supervised module is built on two machine learning classifiers, Naïve Bayes and Conditional Random Field (CRF), with features such as category, POS, and sentiment. Finally, we combine the important outcomes of the unsupervised and supervised modules into two versions of the ensemble module (Ensemble-I and Ensemble-II). All modules identify uni-gram, bi-gram, tri-gram, and longer medical concepts and separate them from the non-medical words or phrases in a context. To evaluate all modules of the concept identification system, we prepared an experimental dataset and split it into three parts: training, development, and test. We observed that the ensemble modules produce better output than the individual modules, and that Ensemble-I outperforms Ensemble-II in identifying medical concepts across all n-gram lengths. The result analysis shows F-measures of 0.91 and 0.94 for identifying medical concepts and non-medical words/phrases, respectively, using the ensemble modules.
The present research reports the initial steps towards building an automated concept identification framework in health-care. This system assists in designing various domain-specific applications such as annotation, categorization, and recommendation systems.
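
As an illustration only, the lexicon-search phase of the unsupervised module and the F-measure used for evaluation can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the toy lexicon stands in for WME 3.0, the POS-tagging phase is skipped, greedy longest-match lookup is an assumed matching strategy, and the names `TOY_LEXICON`, `tag_concepts`, and `f_measure` are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical toy lexicon standing in for WME 3.0; entries are illustrative only.
TOY_LEXICON: Set[Tuple[str, ...]] = {
    ("fever",),
    ("chest", "pain"),
    ("high", "blood", "pressure"),
}


def tag_concepts(
    tokens: List[str], lexicon: Set[Tuple[str, ...]], max_n: int = 4
) -> List[Tuple[str, str]]:
    """Label each span as 'medical' or 'non-medical' by greedy longest match.

    Assumption: the longest n-gram found in the lexicon wins; the abstract
    does not state the matching strategy, and POS tagging is omitted here.
    """
    spans: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate n-gram first, then shorter ones.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in lexicon:
                match_len = n
                break
        if match_len:
            spans.append((" ".join(tokens[i:i + match_len]), "medical"))
            i += match_len
        else:
            spans.append((tokens[i], "non-medical"))
            i += 1
    return spans


def f_measure(predicted: Set[str], gold: Set[str]) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sentence = "Patient reports chest pain and high blood pressure".split()
    for phrase, label in tag_concepts(sentence, TOY_LEXICON):
        print(f"{label:12s} {phrase}")
```

In this sketch a bi-gram such as "chest pain" and a tri-gram such as "high blood pressure" are each returned as a single medical concept, while the remaining tokens are labelled non-medical one at a time.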

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
Anupam Mondal & Dipankar Das

Corresponding author

Correspondence to Anupam Mondal.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Informed consent

Informed consent was not required as no humans or animals were involved.

Human and animal rights

This article does not contain any studies with human or animal subjects performed by any of the authors.

About this article

Cite this article

Mondal, A., Das, D. Ensemble approach for identifying medical concepts with special attention to lexical scope. Sādhanā 46, 77 (2021). https://doi.org/10.1007/s12046-021-01593-5

Received: 01 November 2019
Revised: 31 December 2020
Accepted: 24 February 2021
Published: 12 April 2021
DOI: https://doi.org/10.1007/s12046-021-01593-5
