Multichannel CNN with Attention for Text Classification
arXiv - CS - Computation and Language. Pub Date: 2020-06-29, DOI: arxiv-2006.16174
Zhenyu Liu, Haiwei Huang, Chaohong Lu, Shengfei Lyu

In recent years, approaches based on neural networks have shown remarkable potential for sentence modeling. There are two main neural network structures: the recurrent neural network (RNN) and the convolutional neural network (CNN). An RNN can capture long-term dependencies and store the semantics of previous information in a fixed-size vector. However, the RNN is a biased model, and its ability to extract global semantics is restricted by that fixed-size vector. Alternatively, a CNN is able to capture the n-gram features of a text by utilizing convolutional filters, but the width of those filters restricts its performance. To combine the strengths of the two kinds of networks and alleviate their shortcomings, this paper proposes the Attention-based Multichannel Convolutional Neural Network (AMCNN) for text classification. AMCNN utilizes a bi-directional long short-term memory network to encode the history and future information of words into high-dimensional representations, so that the information from both the front and the back of a sentence can be fully expressed. Scalar attention and vectorial attention are then applied to obtain multichannel representations: scalar attention calculates word-level importance, while vectorial attention calculates feature-level importance. In the classification task, instead of calculating weighted sums, AMCNN uses a CNN structure to capture word relations on the representations generated by the scalar and vectorial attention mechanisms, which effectively extracts the n-gram features of the text. Experimental results on benchmark datasets demonstrate that AMCNN achieves better performance than state-of-the-art methods. In addition, visualization results verify the semantic richness of the multichannel representations.
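As a concrete illustration of the pipeline the abstract describes, below is a minimal PyTorch sketch of an AMCNN-style model: a BiLSTM encoder, per-channel scalar (word-level) and vectorial (feature-level) attention that reweight the representations rather than summing them, and a CNN that extracts n-gram features from the stacked channels. All module names, dimensions, and the exact attention formulations here are illustrative assumptions; the paper's own equations may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AMCNNSketch(nn.Module):
    # Hypothetical sketch of the AMCNN pipeline; not the authors' code.
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=150,
                 num_channels=4, num_filters=100, kernel_sizes=(3, 4, 5),
                 num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # BiLSTM encodes the history (forward) and future (backward) of each word.
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        d = 2 * hidden_dim  # dimensionality of each word representation
        # Scalar attention: one importance score per word position, per channel.
        self.scalar_attn = nn.ModuleList(
            [nn.Linear(d, 1) for _ in range(num_channels)])
        # Vectorial attention: one importance score per feature dimension.
        self.vector_attn = nn.ModuleList(
            [nn.Linear(d, d) for _ in range(num_channels)])
        # CNN over the stacked channels extracts n-gram features.
        self.convs = nn.ModuleList(
            [nn.Conv2d(num_channels, num_filters, (k, d)) for k in kernel_sizes])
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, tokens):                        # tokens: (batch, seq_len)
        h, _ = self.bilstm(self.embedding(tokens))    # (batch, seq_len, d)
        channels = []
        for s_attn, v_attn in zip(self.scalar_attn, self.vector_attn):
            word_w = torch.softmax(s_attn(h), dim=1)  # word-level weights
            feat_w = torch.sigmoid(v_attn(h))         # feature-level weights
            # Reweight instead of computing a weighted sum, so word order is
            # preserved for the CNN to extract n-gram relations.
            channels.append(h * word_w * feat_w)
        x = torch.stack(channels, dim=1)    # (batch, num_channels, seq_len, d)
        pooled = [F.relu(conv(x)).squeeze(3).max(dim=2).values
                  for conv in self.convs]   # max-over-time pooling per filter
        return self.fc(torch.cat(pooled, dim=1))      # class logits

model = AMCNNSketch(vocab_size=10000)
logits = model(torch.randint(0, 10000, (8, 40)))  # 8 sentences of 40 tokens
```

The key design choice the abstract emphasizes is visible in the loop: each channel reweights per-position representations rather than attention-pooling them into a single vector, which preserves word order and lets the downstream convolutions recover n-gram relations.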

Updated: 2020-06-30