Abstract
Health-care services are built on information extraction techniques. The extraction process is laborious and time-consuming owing to the limited availability of medical experts. This motivated us to develop an automated extraction system that identifies medical and non-medical concepts, which help to extract the key information from medical corpora. Non-medical concepts are as important as medical ones for diagnosis purposes. Hence, we employ three approaches, namely unsupervised, supervised, and an ensemble that combines them, to identify both medical and non-medical terms (words/phrases). The unsupervised module consists of two phases: part-of-speech (POS) tagging followed by a search in a domain-specific lexicon, namely WordNet of Medical Event (WME 3.0). The supervised module, in contrast, is built on two machine learning classifiers, Naïve Bayes and Conditional Random Field (CRF), with features such as category, POS, and sentiment, among others. Finally, we combine the important outcomes of the unsupervised and supervised modules to develop two versions of the ensemble module (Ensemble-I and Ensemble-II). All modules identify uni-gram, bi-gram, tri-gram, and longer medical concepts and separate them from the non-medical words or phrases in a context. To evaluate all modules of the concept identification system, we prepared an experimental dataset and split it into three parts: training, development, and test. We observed that the ensemble modules perform better than the individual modules, and that Ensemble-I outperforms Ensemble-II in identifying medical concepts across all n-gram lengths. The result analysis shows that F-measures of 0.91 and 0.94 were obtained for identifying medical concepts and non-medical words/phrases, respectively, using both ensemble modules.
The present research reports the initial steps towards building an automated concept identification framework for health-care. The system assists in designing various domain-specific applications such as annotation, categorization, and recommendation systems.
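The lexicon-search phase of the unsupervised module can be illustrated with a minimal sketch. It is not the authors' implementation: the POS-tagging phase is omitted, the toy lexicon is a hypothetical stand-in for WME 3.0, and the greedy longest-match strategy and the name `tag_concepts` are assumptions made only for illustration.

```python
from typing import List, Tuple

# Hypothetical toy lexicon standing in for WME 3.0; entries are illustrative only.
TOY_LEXICON = {
    ("fever",),
    ("chest", "pain"),
    ("high", "blood", "pressure"),
}
MAX_N = max(len(entry) for entry in TOY_LEXICON)


def tag_concepts(tokens: List[str]) -> List[Tuple[str, str]]:
    """Label each span as 'medical' or 'non-medical' by greedy longest-match lookup."""
    lowered = [t.lower() for t in tokens]
    spans = []
    i = 0
    while i < len(lowered):
        # Try the longest n-gram first so a tri-gram entry wins over shorter ones.
        for n in range(min(MAX_N, len(lowered) - i), 0, -1):
            if tuple(lowered[i:i + n]) in TOY_LEXICON:
                spans.append((" ".join(tokens[i:i + n]), "medical"))
                i += n
                break
        else:
            # No n-gram starting at this token is in the lexicon.
            spans.append((tokens[i], "non-medical"))
            i += 1
    return spans


if __name__ == "__main__":
    sentence = "Patient reports chest pain and high blood pressure"
    for span, label in tag_concepts(sentence.split()):
        print(f"{span}\t{label}")
```

In this sketch a bi-gram such as "chest pain" and a tri-gram such as "high blood pressure" are each returned as a single medical span, while the remaining tokens are labelled non-medical one at a time.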
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Informed consent
Informed consent was not required as no humans or animals were involved.
Human and animal rights
This article does not contain any studies with human or animal subjects performed by any of the authors.
Cite this article
Mondal, A., Das, D. Ensemble approach for identifying medical concepts with special attention to lexical scope. Sādhanā 46, 77 (2021). https://doi.org/10.1007/s12046-021-01593-5
DOI: https://doi.org/10.1007/s12046-021-01593-5