Study of automatic text summarization approaches in different languages
Artificial Intelligence Review (IF 10.7), Pub Date: 2021-02-12, DOI: 10.1007/s10462-021-09964-4
Yogesh Kumar, Komalpreet Kaur, Sukhpreet Kaur

Today, huge amounts of information are available from both online and offline sources. A single topic may be covered by hundreds of articles containing vast amounts of information, and manually extracting the useful information from them is a difficult task. Automatic text summarization systems have been developed to solve this problem. Text summarization is the process of extracting useful information from large documents and compressing it into a short summary that preserves all the important content. This survey paper gives a broad overview of the work done on automatic text summarization in different languages using various summarization approaches. Its main focus is the research on text summarization for Indian languages such as Hindi, Punjabi, Bengali, Malayalam, Kannada, Tamil, Marathi, Assamese, Konkani, Nepali, Odia, Sanskrit, Sindhi, Telugu and Gujarati, and for foreign languages such as Arabic, Chinese, Greek, Persian, Turkish, Spanish, Czech, Romanian, Urdu, Bahasa Indonesia and many more. By giving a concise view of the feature extraction methods and classification techniques used in different types of text summarization approaches for both Indian and non-Indian languages, the paper provides knowledge and useful support to researchers who are new to this area.
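To make the definition above concrete (extract the useful content, compress it into a short summary), here is a minimal sketch of an extractive summarizer that scores each sentence by the average frequency of its content words and keeps the top-ranked sentences. It is not a method taken from the survey. The stopword list, the regex sentence splitter, and the function names `split_sentences`, `tokenize` and `summarize` are hypothetical choices for illustration, and the tokenizer assumes space-delimited, mostly Latin-script text; scripts such as Devanagari would need proper language-specific tokenization.

```python
import re
from collections import Counter

# Hypothetical, tiny English stopword list for illustration only;
# real systems use language-specific resources.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in",
             "on", "for", "it", "from", "into", "with"}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence: str) -> list[str]:
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"\w+", sentence.lower()) if w not in STOPWORDS]


def summarize(text: str, k: int = 2) -> list[str]:
    """Return the k highest-scoring sentences, in their original order.

    A sentence's score is the average document-wide frequency of its
    content words (a simple frequency-based extractive heuristic).
    """
    sentences = split_sentences(text)
    freqs = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score (highest first; Python's sort is
    # stable, so ties keep document order), take the top k, then restore
    # document order so the summary reads naturally.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    doc = ("Text summarization shortens documents. "
           "Summarization systems extract important sentences from documents. "
           "The weather was pleasant yesterday.")
    for line in summarize(doc, k=2):
        print(line)
```

On the sample document, the off-topic weather sentence shares no words with the rest and gets the lowest score, so it is dropped. Restoring document order after ranking is a common design choice in extractive systems because it keeps the summary coherent.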

Updated: 2021-02-12