Sarcasm detection in mash-up language using soft-attention based bi-directional LSTM and feature-rich CNN

Applied Soft Computing (IF 8.7), Pub Date: 2020-03-06, DOI: 10.1016/j.asoc.2020.106198
Deepak Jain, Akshi Kumar, Geetanjali Garg

Analyzing explicit and clear sentiment is challenging owing to the growing use of emblematic and multilingual language constructs. This research proposes deep-learning-based sarcasm detection in code-switched tweets, specifically the mash-up of English with Hindi, a native Indian language. The proposed model is a hybrid of a bidirectional long short-term memory (BiLSTM) network with a softmax attention layer and a convolutional neural network (CNN) for real-time sarcasm detection. To evaluate the performance of the proposed model, real-time mash-up tweets are extracted from trending political (#government) and entertainment (#cricket, #bollywood) posts on Twitter. The randomly sampled dataset contains 3000 sarcastic and 3000 non-sarcastic bilingual Hinglish (Hindi + English) tweets. Feature engineering combines three inputs: pre-trained GloVe word embeddings to extract an English semantic context vector, hand-crafted features from the subjective lexicon Hindi-SentiWordNet to generate the SentiHindi feature vector, and an auxiliary pragmatic feature vector holding the count of pragmatic markers in the tweet. Performance analysis is done to compare and validate the proposed $^{softAtt}$BiLSTM-$^{feature-rich}$CNN model. The model outperforms the baseline deep learning models, with a classification accuracy of 92.71% and an F-measure of 89.05%.
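To make the auxiliary pragmatic feature vector concrete, here is a minimal Python sketch of counting pragmatic markers in a tweet. The abstract does not say which markers the authors counted, so the marker set below (exclamation marks, question marks, hashtags, mentions, all-caps words, simple emoticons), the function name `pragmatic_features`, and the emoticon pattern are all assumptions chosen for illustration, not the paper's actual feature definition.

```python
import re
from typing import List

# Hypothetical marker set: the abstract does not list the markers used.
FEATURE_NAMES = (
    "exclamations", "questions", "hashtags",
    "mentions", "all_caps_words", "emoticons",
)

# Deliberately small emoticon pattern, e.g. ":)", ";-)", ":D", "=P".
EMOTICON_RE = re.compile(r"[:;=]-?[)(DPp]")


def pragmatic_features(tweet: str) -> List[int]:
    """Count illustrative pragmatic markers, in FEATURE_NAMES order."""
    tokens = tweet.split()
    return [
        tweet.count("!"),
        tweet.count("?"),
        # A lone "#" or "@" is not treated as a hashtag or mention.
        sum(1 for t in tokens if t.startswith("#") and len(t) > 1),
        sum(1 for t in tokens if t.startswith("@") and len(t) > 1),
        # Purely alphabetic, fully upper-case tokens of length > 1.
        sum(1 for t in tokens if len(t) > 1 and t.isalpha() and t.isupper()),
        len(EMOTICON_RE.findall(tweet)),
    ]


if __name__ == "__main__":
    example = "WAH kya baat hai!! @user great batting :) #cricket"
    print(dict(zip(FEATURE_NAMES, pragmatic_features(example))))
```

In a pipeline like the one the abstract describes, a count vector of this kind would be supplied alongside the embedding-based and lexicon-based features; how the authors actually combine them is not stated in the abstract.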


Updated: 2020-03-06
