TANA: The amalgam neural architecture for sarcasm detection in Indian indigenous language combining LSTM and SVM with word-emoji embeddings
Pattern Recognition Letters (IF 3.9), Pub Date: 2022-05-27, DOI: 10.1016/j.patrec.2022.05.026
Deepak Kumar Jain, Akshi Kumar, Saurabh Raj Sangwan

Sentiment analysis is a difficult task owing to the playful language, altered vocabulary and text-speak used on online forums. People tend to use words and phrases in ways that are incomprehensible to those not involved in the discourse. Sarcastic remarks in conversations are often used to mock others by saying something unpleasant, and sardonic or humorous statements and tones are used to insult others or make them appear puerile. Automated sarcasm detection is considered one of the key tasks for improving sentiment analysis, and extending it to Hindi, a morphologically rich, free-word-order indigenous Indian language, is a further challenge. This research puts forward ‘The Amalgam Neural Architecture’, TANA, to detect sarcasm in Hindi tweets. The architecture is trained using two embeddings, namely word and emoji embeddings, and combines an LSTM with the loss function of an SVM for sarcasm detection. We use the Sarc-H dataset, which is built by scraping Hindi-language tweets and manually annotating them based on two hashtags used by the tweeters: one pronounced kataaksh, which means sarcasm in Hindi, and one pronounced vyangya, another Hindi word for sarcasm. The results are evaluated using various classification performance metrics. TANA achieves an F-score of 0.9675, outperforming both an LSTM with a softmax final layer and existing work.
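To make "the loss function of SVM" and the F-score concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes the standard hinge loss, a -1/+1 label encoding (+1 for sarcastic), and invented raw scores standing in for the output of the LSTM's final layer; the abstract specifies none of these. For contrast it also computes the logistic loss, which is the two-class counterpart of the cross-entropy used with a softmax layer.

```python
import math


def hinge_loss(score: float, label: int) -> float:
    """Standard SVM hinge loss for one example; label must be -1 or +1."""
    return max(0.0, 1.0 - label * score)


def logistic_loss(score: float, label: int) -> float:
    """Logistic loss on the same raw score, shown only for comparison."""
    return math.log1p(math.exp(-label * score))


def f_score(y_true, y_pred, positive=1) -> float:
    """F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical values: these scores and labels are invented for illustration
# and do not come from the Sarc-H dataset.
scores = [2.3, 0.4, -1.7, -0.2]
labels = [1, 1, -1, 1]  # +1 = sarcastic, -1 = not sarcastic (assumed encoding)

mean_hinge = sum(hinge_loss(s, y) for s, y in zip(scores, labels)) / len(scores)
mean_logistic = sum(logistic_loss(s, y) for s, y in zip(scores, labels)) / len(scores)

# Classify by the sign of the raw score, as a linear SVM would.
predictions = [1 if s >= 0 else -1 for s in scores]

print("mean hinge loss:", mean_hinge)
print("mean logistic loss:", mean_logistic)
print("F-score:", f_score(labels, predictions))
```

The hinge loss is exactly zero for any example classified correctly with a margin of at least 1, so training concentrates on examples near or beyond the decision boundary. The logistic loss stays positive for every example.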



Updated: 2022-05-27