Study of automatic text summarization approaches in different languages

Artificial Intelligence Review

Abstract

A huge amount of information is now available from both online and offline sources. A single topic may be covered by hundreds of articles containing a vast amount of information, and manually extracting the useful parts is a difficult task. Automatic text summarization systems have been developed to solve this problem. Text summarization is the process of extracting useful information from large documents and compressing it into a short summary that preserves all the important content. This survey gives a broad overview of the work done in the field of automatic text summarization in different languages using various text summarization approaches. Its focus is the research on text summarization for Indian languages such as Hindi, Punjabi, Bengali, Malayalam, Kannada, Tamil, Marathi, Assamese, Konkani, Nepali, Odia, Sanskrit, Sindhi, Telugu and Gujarati, and for foreign languages such as Arabic, Chinese, Greek, Persian, Turkish, Spanish, Czech, Rome, Urdu, Bahasa Indonesia and many more. The paper supports researchers new to this area by giving a concise view of the feature extraction methods and classification techniques required for the different types of text summarization approaches applied to both Indian and non-Indian languages.

Author information

Corresponding author

Correspondence to Yogesh Kumar.

About this article

Cite this article

Kumar, Y., Kaur, K. & Kaur, S. Study of automatic text summarization approaches in different languages. Artif Intell Rev 54, 5897–5929 (2021). https://doi.org/10.1007/s10462-021-09964-4

  • DOI: https://doi.org/10.1007/s10462-021-09964-4
