Abstract
The amount of information in the biomedical scientific literature is growing exponentially, which makes it difficult to develop a smart medical system. Summarization techniques help users search for and understand the relevant information in medical documents efficiently. In this paper, an evolutionary-algorithm-based ensemble extractive summarization technique is devised as a smart medical application, following the idea of hybrid artificial intelligence applied to natural language processing. We consider the abstracts of the target article and of its cited articles as the base summaries, and apply a multi-objective evolutionary algorithm to generate the ensemble summary of the target article. Each sentence of the base summaries is represented by a concept vector of the medical terms it contains, extracted with the help of Unified Medical Language System (UMLS) tools, which are widely used in smart medical applications. These terms carry the key information of a sentence and are therefore useful for measuring the semantic similarity between sentences. The fitness functions of the evolutionary algorithm are defined mainly through the clustering coefficient and the sparsity index, two concepts from graph theory. After the algorithm converges, the best solution of the final population gives the ensemble summary. Next, the semantic similarity of each sentence of the target article to the ensemble summary is calculated, and the sentences most similar to the ensemble summary form the summary of the target article. The method is applied to articles from the PubMed MEDLINE database, and the experimental results are compared with several state-of-the-art methods from the biomedical domain.
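The graph-theoretic fitness functions mentioned above can be illustrated with a small sketch. It rests on assumptions that are not taken from the paper: each sentence is a set of UMLS concept IDs (for example as returned by MetaMap), two sentences are linked when the cosine similarity of their binary concept vectors exceeds an illustrative threshold of 0.2, the clustering coefficient is the mean local clustering coefficient of that sentence graph, and the sparsity index is the Gini index of the node-degree sequence. The helper names (`similarity_graph`, `objectives`, and so on) are hypothetical, and how the paper orients or weights the two objectives inside the evolutionary algorithm is not shown here.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity of two binary concept vectors, each given as a set of concept IDs."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def similarity_graph(sentences, threshold=0.2):
    """Adjacency sets: sentences i and j are linked if their similarity exceeds threshold (assumed value)."""
    adj = {i: set() for i in range(len(sentences))}
    for i, j in combinations(range(len(sentences)), 2):
        if cosine(sentences[i], sentences[j]) > threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def avg_clustering(adj):
    """Mean local clustering coefficient; nodes with fewer than two neighbours count as 0."""
    if not adj:
        return 0.0
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges among the neighbours of this node (closed triangles).
        links = sum(1 for u, w in combinations(nbrs, 2) if w in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def gini_sparsity(values):
    """Gini index of a non-negative vector: 0 for a uniform vector, close to 1 for a sparse one."""
    c = sorted(values)
    n, s = len(c), sum(c)
    if n == 0 or s == 0:
        return 0.0
    return 1.0 - 2.0 * sum((ck / s) * ((n - k + 0.5) / n) for k, ck in enumerate(c, start=1))

def objectives(candidate, threshold=0.2):
    """Return (clustering coefficient, Gini sparsity of degrees) for a candidate summary."""
    adj = similarity_graph(candidate, threshold)
    degrees = [len(nbrs) for nbrs in adj.values()]
    return avg_clustering(adj), gini_sparsity(degrees)

# Toy candidate summary: each sentence is a set of hypothetical concept IDs.
candidate = [{"C1", "C2", "C3"}, {"C1", "C2"}, {"C2", "C3"}, {"C9"}]
print(objectives(candidate))
```

In this toy candidate, sentences 0 to 2 form a triangle and sentence 3 is isolated, so the call returns a clustering coefficient of 0.75 and a degree Gini index of 0.25, up to floating-point rounding. In an evolutionary search, each chromosome would select a subset of base-summary sentences and be scored by such a pair of values.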
Experimental results and a comparative performance evaluation show that the method is competitive with some recently proposed summarization methods and outperforms others, which demonstrates the effectiveness of the proposed methodology. Several statistical tests were also performed, and they indicate that the results of the method are statistically significant.
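The final extraction step described in the abstract, scoring every sentence of the target article against the ensemble summary and keeping the most similar ones, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact similarity measure: sentences are lists of hypothetical UMLS concept IDs, the ensemble summary is pooled into one concept-frequency vector, similarity is cosine, and `k` is a hypothetical summary length. The function names are illustrative.

```python
import math
from collections import Counter

def cosine_counts(a, b):
    """Cosine similarity between two sparse concept-frequency vectors stored as Counters."""
    dot = sum(a[c] * b[c] for c in a if c in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def extract_summary(target_sentences, ensemble_summary, k):
    """Indices of the k target sentences most similar to the ensemble summary, in document order.

    Both arguments are lists of sentences, each sentence a list of concept IDs.
    """
    # Pool the ensemble summary into a single concept-frequency vector (an assumption).
    summary_vec = Counter(c for sent in ensemble_summary for c in sent)
    scores = [cosine_counts(Counter(sent), summary_vec) for sent in target_sentences]
    # Python's sort is stable, so tied sentences keep their document order.
    ranked = sorted(range(len(target_sentences)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

target = [["C1", "C2"], ["C7"], ["C2", "C3"], ["C8", "C9"]]
ensemble = [["C1", "C2"], ["C2", "C3"]]
print(extract_summary(target, ensemble, k=2))  # prints [0, 2]
```

Returning the selected indices in document order, rather than in score order, keeps the extracted summary readable as a sequence of the article's own sentences.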
Notes
PubMed MEDLINE dataset, https://www.nlm.nih.gov/databases/download/pubmed_medline.html
PubMed Central, https://www.ncbi.nlm.nih.gov/pmc/
BioASQ, http://www.bioasq.org/
PubMed open-access (OA) subset, https://www.ncbi.nlm.nih.gov/pmc/tools/ftp/
MEDLINE XML repository, https://www.nlm.nih.gov/databases/download/data_distrib_main.html
UMLS Metathesaurus MetaMap, https://www.nlm.nih.gov/research/umls/implementation_resources/metamap.html
Python 2.7.14 documentation, https://docs.python.org/2/index.html
PyGMO documentation, https://media.readthedocs.org/pdf/pygmo/newdocs/pygmo.pdf
SweSum: automatic text summarizer, http://swesum.nada.kth.se/index.html
Acknowledgements
This research did not receive any specific grant from funding agencies.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Mallick, C., Das, A.K., Nayak, J. et al. Evolutionary Algorithm based Ensemble Extractive Summarization for Developing Smart Medical System. Interdiscip Sci Comput Life Sci 13, 229–259 (2021). https://doi.org/10.1007/s12539-020-00412-5
DOI: https://doi.org/10.1007/s12539-020-00412-5