Arabic text summarization using deep learning approach
Journal of Big Data (IF 8.1) Pub Date: 2020-12-11, DOI: 10.1186/s40537-020-00386-7
Molham Al-Maleh, Said Desouki

Natural language processing has made remarkable progress with the advent of deep learning techniques. Text summarization, along with other tasks such as machine translation and sentiment analysis, uses deep neural network models to improve results. Recent text summarization methods are based on the sequence-to-sequence framework of encoder–decoder models, which consist of neural networks trained jointly on both input and output. Deep neural networks take advantage of large datasets to improve their results. These networks are supported by the attention mechanism, which handles long texts more efficiently by identifying focus points in the text. They are also supported by the copy mechanism, which allows the model to copy words directly from the source into the summary. In this research, we re-implement the basic summarization model that applies the sequence-to-sequence framework to the Arabic language, for which this model has not previously been used in text summarization. First, we build an Arabic dataset of summarized article headlines. The dataset consists of approximately 300 thousand entries, each made up of an article introduction and the headline corresponding to that introduction. We then apply baseline summarization models to this dataset and compare the results using the ROUGE metric.
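The abstract states that the models are compared using ROUGE but does not say which variant or which tokenization. As a rough illustration only, the sketch below computes ROUGE-N (clipped n-gram overlap) between a generated headline and a reference headline. The function names `ngrams` and `rouge_n`, the whitespace tokenization, and the example strings are all assumptions made here, not details taken from the paper.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Compute ROUGE-N precision, recall, and F1 for one candidate/reference pair.

    Tokenization is plain whitespace splitting, an assumption made for
    illustration; Arabic text would usually need normalization first.
    """
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipped matches).
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical generated headline vs. reference headline.
    scores = rouge_n("the cat sat", "the cat sat on the mat", n=1)
    print(scores)
```

Recall is the component most often reported for summarization, since it measures how much of the reference headline the generated one covers; a published evaluation would normally rely on an established ROUGE implementation rather than a hand-written one like this.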




Updated: 2020-12-11