A Hybrid Approach of Machine Learning and Lexicons to Sentiment Analysis: Enhanced Insights from Twitter Data of Natural Disasters, Information Systems Frontiers



Information Systems Frontiers (IF 6.9), Pub Date: 2021-02-14, DOI: 10.1007/s10796-021-10107-x
Shalak Mendon, Pankaj Dutta, Abhishek Behl, Stefan Lessmann

The success of sentiment analysis lies in identifying the most frequent and relevant opinions among users on a particular topic. In this paper, we develop a framework to analyze users’ sentiments on Twitter about natural disasters using data pre-processing techniques and a hybrid of machine learning, statistical modeling, and lexicon-based approaches. We choose TF-IDF and K-means for sentiment classification over affinitive and hierarchical clustering. Latent Dirichlet Allocation and a pipeline of Doc2Vec and K-means are used to capture themes; we then perform multi-level polarity index classification and a time series analysis of the result. In our study, we draw insights from 243,746 tweets about the 2018 natural disasters in Kerala, India. The key findings are the classification of sentiments based on similarity and polarity indices and the identification of themes among the topics discussed on Twitter. We also observe different sets of emotions and influencers, among others. Through the case example of the Kerala floods, we show how the government and other organizations could track positive and negative sentiments over time and by location, gain a better understanding of the topics trending among the public, and collaborate with key Twitter users and influencers to spread information and identify gaps in the design and execution of schemes. The uniqueness of this research is the streamlined and efficient combination of algorithms and techniques embedded in the framework, which can be integrated into a platform with a GUI for further automation.
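The lexicon-based part of such a framework, multi-level polarity classification followed by a time series of the scores, might look roughly like the sketch below. It is an illustration only, not the authors' implementation: the abstract does not name the lexicon, the number of polarity levels, or the thresholds, so `LEXICON`, the five level names, the 0.6 cut-offs, and the sample tweets and dates are all assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical toy lexicon; the abstract does not say which lexicon the paper uses.
LEXICON = {"safe": 1.0, "rescued": 1.0, "thanks": 0.5, "help": 0.5,
           "flood": -0.5, "stranded": -1.0, "dead": -1.0, "lost": -1.0}

def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation and # / @."""
    tokens = (w.strip(".,!?#@").lower() for w in text.split())
    return [t for t in tokens if t]

def polarity(text):
    """Mean lexicon score over the tokens found in LEXICON; 0.0 if none match."""
    scores = [LEXICON[t] for t in tokenize(text) if t in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

def polarity_level(score):
    """Map a score to one of five levels (level names and thresholds are assumed)."""
    if score <= -0.6:
        return "strongly negative"
    if score < 0:
        return "negative"
    if score == 0:
        return "neutral"
    if score < 0.6:
        return "positive"
    return "strongly positive"

def daily_polarity(tweets):
    """tweets: iterable of (iso_date, text). Returns {date: mean polarity}, date-sorted."""
    by_day = defaultdict(list)
    for day, text in tweets:
        by_day[day].append(polarity(text))
    return {day: sum(v) / len(v) for day, v in sorted(by_day.items())}

# Made-up sample tweets and dates, for illustration only.
sample = [
    ("2018-08-16", "Stranded by the flood"),
    ("2018-08-16", "People rescued"),
    ("2018-08-17", "Thanks for the help"),
]
series = daily_polarity(sample)
levels = {day: polarity_level(score) for day, score in series.items()}
```

Averaging per day is the simplest aggregation; a real pipeline would probably also group by location and smooth the series, which the abstract mentions only at a high level.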


Updated: 2021-02-15
