Sentiment Analysis Using XLM-R Transformer and Zero-shot Transfer Learning on Resource-poor Indian Language
ACM Transactions on Asian and Low-Resource Language Information Processing (IF 2) Pub Date: 2021-06-30, DOI: 10.1145/3461764
Akshi Kumar, Victor Hugo C. Albuquerque

Sentiment analysis on social media relies on comprehending natural language and on robust machine learning techniques that learn multiple layers of representations or features of the data and produce state-of-the-art prediction results. Cultural diversity, geographically limited trending hashtags, limited access to native-language keyboards, and users' conversational comfort in their native language compound the linguistic challenges of sentiment analysis. This research evaluates the performance of cross-lingual contextual word embeddings and zero-shot transfer learning in projecting predictions from resource-rich English to the resource-poor Hindi language. A cross-lingual XLM-RoBERTa classification model is trained and fine-tuned on the benchmark English SemEval 2017 Task 4A dataset, and zero-shot transfer learning is then used to evaluate it on two Hindi sentence-level sentiment analysis datasets, namely the IITP-Movie and IITP-Product review datasets. The proposed model compares favorably to state-of-the-art approaches, achieving an average accuracy of 60.93 across the two Hindi datasets and providing an effective solution to sentence-level (tweet-level) sentiment analysis in a resource-poor scenario.
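The zero-shot transfer pipeline described above can be sketched roughly as follows: fine-tune an XLM-R classifier on English sentiment data, then evaluate it on Hindi sentences with no Hindi training examples, relying on the encoder shared across languages. This is a minimal illustration, not the authors' implementation; the model checkpoint name, the three-way label scheme, and the example sentence are assumptions, and fine-tuning details are omitted since the abstract does not specify them.

```python
# Sketch of zero-shot cross-lingual sentiment classification with XLM-R.
# Assumes the Hugging Face Transformers library; checkpoint name and
# label scheme are illustrative, not taken from the paper.

LABELS = ["negative", "neutral", "positive"]  # SemEval 2017 Task 4A polarity scheme

def accuracy(predictions, gold):
    """Fraction of predicted labels matching the gold labels."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

def predict(model, tokenizer, sentences):
    """Classify sentences with a fine-tuned model. The same call works for
    Hindi input unchanged, because XLM-R's subword vocabulary and encoder
    are shared across its ~100 pre-training languages."""
    import torch  # local import keeps the pure helpers above dependency-free
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return [LABELS[i] for i in logits.argmax(dim=-1).tolist()]

if __name__ == "__main__":
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
    model = AutoModelForSequenceClassification.from_pretrained(
        "xlm-roberta-base", num_labels=len(LABELS)
    )
    # ... fine-tune `model` on English SemEval 2017 Task 4A data here ...

    # Zero-shot step: evaluate directly on Hindi, with no Hindi supervision.
    hindi_reviews = ["यह फिल्म बहुत अच्छी थी।"]  # "This movie was very good."
    print(predict(model, tokenizer, hindi_reviews))
```

The key design point is that nothing in the evaluation path is language-specific: the English fine-tuning and the Hindi inference share one tokenizer and one encoder, which is what makes the zero-shot projection possible.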
