BLSTM-API: Bi-LSTM Recurrent Neural Network-Based Approach for Arabic Paraphrase Identification
Arabian Journal for Science and Engineering (IF 2.9), Pub Date: 2021-02-24, DOI: 10.1007/s13369-020-05320-w
Adnen Mahmoud, Mounir Zrigui

Advances in communication technologies have enabled people to deliver more content. As a result, a growing amount of data is easily disseminated and published on the internet, which has encouraged the practice of paraphrasing. Paraphrasing conceals an original sentence behind alternative expressions with the same meaning, and detecting it consists of identifying the degree of semantic similarity between the original and the rewritten sentence. This is one of the complex tasks in natural language processing and artificial intelligence. Although Arabic is spoken by a large population around the world, its rich grammar and semantics make sentence modeling and similarity computation difficult. In this paper, an extrinsic paraphrase identification method for Arabic is proposed. It is based on a Siamese recurrent neural network architecture, chosen for its performance in processing textual sequences of variable size. Relevant features are first extracted using global word vectors, which rely on a global co-occurrence matrix built from a local context window. A bidirectional long short-term memory (Bi-LSTM) network is then introduced to incorporate long-term dependencies efficiently and to capture meaningful contextual semantics between words. For paraphrase identification, the cosine measure is used as the merge function to compute the semantic similarity between the resulting source and suspect vectors. To address the lack of free, publicly available Arabic paraphrase datasets, the word2vec algorithm and part-of-speech tagging are combined to generate suspect sentences. To validate the generated data, its quality is compared to the SemEval benchmark. Experiments demonstrated the effectiveness of the proposed methods.
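The merge step described in the abstract can be illustrated with a minimal Python sketch. It shows only the Siamese pattern (one shared encoder applied to both sentences) followed by the cosine measure. Everything else is an assumption made for illustration: the encoder is a simple mean-pooling stand-in rather than the Bi-LSTM of the paper, the three-dimensional embedding table `TOY_EMBEDDINGS` is invented toy data rather than trained global word vectors, and the function names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical toy embeddings (NOT trained vectors); real systems would load
# pretrained word vectors with hundreds of dimensions.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.9, 0.1, 0.0],
    "car": [0.0, 0.0, 1.0],
}
EMBEDDING_DIM = 3


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_pool_encoder(tokens: Sequence[str]) -> List[float]:
    """Stand-in sentence encoder: average the embeddings of known tokens.

    Assumption: this replaces the Bi-LSTM encoder purely to keep the sketch
    dependency-free. Unknown tokens are skipped; an empty result is all zeros.
    """
    known = [TOY_EMBEDDINGS[t] for t in tokens if t in TOY_EMBEDDINGS]
    if not known:
        return [0.0] * EMBEDDING_DIM
    return [sum(vec[i] for vec in known) / len(known) for i in range(EMBEDDING_DIM)]


def siamese_score(
    source: Sequence[str],
    suspect: Sequence[str],
    encoder: Callable[[Sequence[str]], List[float]] = mean_pool_encoder,
) -> float:
    """Encode both sentences with the SAME encoder, then merge with cosine."""
    return cosine_similarity(encoder(source), encoder(suspect))


if __name__ == "__main__":
    print(siamese_score(["cat"], ["dog"]))  # close vectors, score near 1
    print(siamese_score(["cat"], ["car"]))  # orthogonal vectors, score 0
```

Because both sentences pass through the same encoder, swapping `mean_pool_encoder` for a trained recurrent encoder would leave `siamese_score` unchanged; a threshold on the returned score (not specified in the abstract) would then turn the similarity into a paraphrase decision.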




Updated: 2021-03-10