Automatic Arabic Text Summarization Using Analogical Proportions
Cognitive Computation (IF 5.4), Pub Date: 2020-08-19, DOI: 10.1007/s12559-020-09748-y
Bilel Elayeb , Amina Chouigui , Myriam Bounhas , Oussama Ben Khiroun

Automatic text summarization is the process of generating or extracting a brief representation of an input text. The literature offers several extractive summarization algorithms tested on datasets in English and other languages; however, only a few extractive Arabic summarizers exist, owing to the lack of large Arabic collections. This paper proposes and assesses new extractive single-document summarization approaches based on analogical proportions, which are statements of the form “a is to b as c is to d”. The goal is to study the capability of analogical proportions to represent the relationship between documents and their corresponding summaries. For this purpose, we suggest two algorithms that quantify the relevance or irrelevance of a keyword extracted from the input text in order to build its summary. In the first algorithm, the analogical proportion representing this relationship is limited to checking, in a binary way, whether the keyword exists in a document or summary, without considering keyword frequency in the text; the analogical proportion of the second algorithm takes this frequency into account. We have assessed and compared these two algorithms with several language-independent summarizers (LexRank, TextRank, Luhn and LSA (Latent Semantic Analysis)) using our large corpus ANT (Arabic News Texts) and a small test collection, EASC (Essex Arabic Summaries Corpus), by computing the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) and BLEU (BiLingual Evaluation Understudy) metrics. The best results achieved are ROUGE-1 = 0.96 and BLEU-1 = 0.65, obtained on educational documents from the EASC collection and outperforming the best LexRank algorithm. The proposed algorithms are also compared with three other Arabic extractive summarizers on the EASC collection and achieve better results, with ROUGE-1 = 0.75 and BLEU-1 = 0.47 for the first algorithm, and ROUGE-1 = 0.74 and BLEU-1 = 0.49 for the second.
Experimental results demonstrate the value of analogical proportions for text summarization. In particular, the analogical summarizers significantly outperform three of the four language-independent summarizers in terms of BLEU-1 on the ANT collection, and no other summarizer significantly outperforms them on the EASC collection.
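To make the notion of “a is to b as c is to d” concrete, the following minimal Python sketch evaluates an analogical proportion on binary values (for instance, presence or absence of a keyword) and on values in [0, 1] (for instance, normalized frequencies). The abstract does not give the formulas used by the two proposed algorithms, so this is only an illustration: the Boolean definition is the standard logical one, the graded definition is one commonly cited multiple-valued extension, and the function names `boolean_proportion` and `graded_proportion` are hypothetical.

```python
from itertools import product


def boolean_proportion(a, b, c, d):
    """Return True when 'a is to b as c is to d' holds for 0/1 values.

    Standard logical reading: a differs from b exactly as c differs from d.
    """
    a, b, c, d = bool(a), bool(b), bool(c), bool(d)
    return ((a and not b) == (c and not d)) and ((not a and b) == (not c and d))


def graded_proportion(a, b, c, d):
    """Degree in [0, 1] to which a:b::c:d holds for values in [0, 1].

    Assumption: one commonly cited multiple-valued extension, shown for
    illustration only; it is not claimed to be the paper's formula.
    """
    same_direction = (a >= b and c >= d) or (a <= b and c <= d)
    if same_direction:
        return 1.0 - abs((a - b) - (c - d))
    return 1.0 - max(abs(a - b), abs(c - d))


# Enumerate the binary patterns for which the proportion holds.
valid_patterns = [p for p in product((0, 1), repeat=4) if boolean_proportion(*p)]
```

On binary inputs the graded version coincides with the Boolean one, which is why a frequency-aware variant can be seen as a generalization of a presence/absence variant.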
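As a rough guide to the reported scores, the sketch below computes unigram-level ROUGE-1 recall and BLEU-1 for a single candidate summary against a single reference. It is a simplified illustration under stated assumptions: summaries are already tokenized into lists of words, ROUGE-1 is taken as recall, and BLEU-1 is clipped unigram precision multiplied by the usual brevity penalty. The evaluation toolkit and settings actually used in the paper are not stated in the abstract, and the function names are hypothetical.

```python
import math
from collections import Counter


def unigram_overlap(candidate, reference):
    """Count shared unigrams, clipping each word to its reference count."""
    return sum((Counter(candidate) & Counter(reference)).values())


def rouge1_recall(candidate, reference):
    """Fraction of reference unigrams that also appear in the candidate."""
    if not reference:
        return 0.0
    return unigram_overlap(candidate, reference) / len(reference)


def bleu1(candidate, reference):
    """Clipped unigram precision times the brevity penalty."""
    if not candidate:
        return 0.0
    precision = unigram_overlap(candidate, reference) / len(candidate)
    if len(candidate) > len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1.0 - len(reference) / len(candidate))
    return brevity_penalty * precision


# Toy English example; real use would need Arabic-aware tokenization.
reference = "the cat sat on the mat".split()
candidate = "the cat the cat on the mat".split()
r1 = rouge1_recall(candidate, reference)
b1 = bleu1(candidate, reference)
```

Because ROUGE-1 here rewards coverage of the reference while BLEU-1 rewards precision of the candidate, the two scores can diverge noticeably for the same summary, as they do in the figures reported above.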

Updated: 2020-08-19