Robust Arabic Text Categorization by Combining Convolutional and Recurrent Neural Networks,ACM Transactions on Asian and Low-Resource Language Information Processing

当前位置： X-MOL 学术 › ACM Trans. Asian Low Resour. Lang. Inf. Process. › 论文详情

Our official English website, www.x-mol.net, welcomes your feedback! (Note: you will need to create a separate account there.)

Robust Arabic Text Categorization by Combining Convolutional and Recurrent Neural Networks
ACM Transactions on Asian and Low-Resource Language Information Processing ( IF 1.8 ) Pub Date : 2020-07-01 , DOI: 10.1145/3390092
Mohamed Seghir Hadj Ameur ₁ , Riadh Belkebir ₁ , Ahmed Guessoum ₁

Affiliation

Text Categorization is an important task in the area of Natural Language Processing (NLP). Its goal is to learn a model that can accurately classify any textual document for a given language into one of a set of predefined categories. In the context of the Arabic language, several approaches have been proposed to tackle this problem, many of which are based on the bag-of-words assumption. Even though these methods usually produce good results for the classification task, they often fail to capture contextual dependencies from textual data. On the other hand, deep learning architectures that are usually based on Recurrent Neural Networks (RNNs) or Convolutional Neural Networks (CNNs) do not suffer from such a limitation and have recently shown very promising results in various NLP applications. In this work, we use deep learning models that combine RNN and CNN for the task of Arabic text categorization using static, dynamic, and fine-tuned word embeddings. The experimental results reported on the Open Source Arabic Corpora (OSAC) dataset have shown the effectiveness and high performance of our proposed models.

中文翻译：

结合卷积和循环神经网络的鲁棒阿拉伯语文本分类

文本分类是自然语言处理（NLP）领域的一项重要任务。它的目标是学习一个模型，该模型可以准确地将给定语言的任何文本文档分类为一组预定义类别中的一个。在阿拉伯语的背景下，已经提出了几种方法来解决这个问题，其中许多是基于词袋假设的。尽管这些方法通常为分类任务产生良好的结果，但它们通常无法从文本数据中捕获上下文相关性。另一方面，通常基于递归神经网络 (RNN) 或卷积神经网络 (CNN) 的深度学习架构不受这种限制，并且最近在各种 NLP 应用中显示出非常有希望的结果。在这项工作中，我们使用结合了 RNN 和 CNN 的深度学习模型来完成使用静态、动态和微调词嵌入的阿拉伯语文本分类任务。在开源阿拉伯语语料库（OSAC）数据集上报告的实验结果表明了我们提出的模型的有效性和高性能。

更新日期：2020-07-01

点击分享查看原文

点击收藏

阅读更多本刊最新论文本刊介绍/投稿指南11