Ensemble approach for identifying medical concepts with special attention to lexical scope

Sādhanā

Abstract

Health-care services are implemented by deploying information extraction techniques. This extraction process is laborious and time-consuming because medical experts are often unavailable. We were therefore motivated to develop an automated extraction system that identifies medical and non-medical concepts, which help to extract the key information from medical corpora. Non-medical concepts are as important as their medical counterparts for diagnosis purposes. Hence, we employed three approaches, namely unsupervised, supervised, and their combined ensemble version, to identify both medical and non-medical terms (words/phrases). The unsupervised module consists of two phases: parts-of-speech (POS) tagging followed by a search in a domain-specific lexicon, namely WordNet of Medical Event (WME 3.0). The supervised module, on the other hand, is built on two machine learning classifiers, namely Naïve Bayes and Conditional Random Field (CRF), along with various features such as category, POS, and sentiment. Finally, we combined the important outcomes of the unsupervised and supervised modules and developed two versions of the ensemble module (Ensemble-I and Ensemble-II). All the modules identify uni-gram, bi-gram, tri-gram, and longer medical concepts and separate non-medical words or phrases in a context. To evaluate all modules of the concept identification system, we prepared an experimental dataset split into three parts: training, development, and test. We observed that the ensemble module provides better output than the individual modules, and that Ensemble-I outperforms Ensemble-II in identifying medical concepts consisting of all possible n-grams. The result analysis shows that F-measures of 0.91 and 0.94 were obtained for identifying medical concepts and non-medical words/phrases, respectively, using both of the ensemble modules.
The present research reports the initial steps towards building an automated concept identification framework in health-care. This system assists in designing various domain-specific applications such as annotation, categorization, and recommendation systems.
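
As a rough illustration of the lexicon-search phase and of the F-measure used for evaluation, the following minimal sketch may help. It is not the authors' implementation: the POS-tagging phase is omitted, the greedy longest-match strategy is an assumption, and `TOY_LEXICON`, `tag_concepts`, and `f_measure` are hypothetical names with made-up entries standing in for WME 3.0.

```python
from typing import List, Set, Tuple

# Hypothetical toy lexicon; it only stands in for WME 3.0, whose
# actual entries and format are not reproduced here.
TOY_LEXICON: Set[str] = {"fever", "chest pain", "high blood pressure"}


def tag_concepts(
    tokens: List[str], lexicon: Set[str], max_n: int = 4
) -> Tuple[List[str], List[str]]:
    """Greedy longest-match lookup of n-grams (n <= max_n) in a lexicon.

    Returns (medical concepts, non-medical words). The matching strategy
    is an assumption made for illustration only.
    """
    medical: List[str] = []
    non_medical: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            if phrase in lexicon:
                medical.append(phrase)
                i += n
                break
        else:
            # No n-gram starting at this token is in the lexicon.
            non_medical.append(tokens[i].lower())
            i += 1
    return medical, non_medical


def f_measure(predicted: List[str], gold: List[str]) -> float:
    """Balanced F-measure (harmonic mean of precision and recall) over sets."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sentence = "Patient reports chest pain and high blood pressure".split()
    concepts, others = tag_concepts(sentence, TOY_LEXICON)
    print(concepts, others)
```

Trying longer n-grams first keeps a multi-word concept such as "high blood pressure" from being split into shorter matches; whether the published system resolves overlaps this way is not stated in the abstract.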

Author information

Corresponding author

Correspondence to Anupam Mondal.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Informed consent

Informed consent was not required as no humans or animals were involved.

Human and animal rights

This article does not contain any studies with human or animal subjects performed by any of the authors.

About this article

Cite this article

Mondal, A., Das, D. Ensemble approach for identifying medical concepts with special attention to lexical scope. Sādhanā 46, 77 (2021). https://doi.org/10.1007/s12046-021-01593-5

  • DOI: https://doi.org/10.1007/s12046-021-01593-5
