The impact of preprocessing steps on the accuracy of machine learning algorithms in sentiment analysis

Computational and Mathematical Organization Theory (IF 1.8), Pub Date: 2018-03-16, DOI: 10.1007/s10588-018-9266-8
Saqib Alam, Nianmin Yao

Big data and its related technologies have recently become active areas of research. A huge amount of data is generated every second, much of it unstructured, and this unstructured data is now a major topic of interest for researchers. Considerable research is currently under way in text analytics and text preprocessing. In this paper, we study the impact of different preprocessing steps on the accuracy of three machine learning algorithms for sentiment analysis. We applied different text preprocessing techniques and measured their impact on sentiment classification accuracy using three well-known machine learning classifiers: Naïve Bayes (NB), maximum entropy (MaxE), and support vector machines (SVM). We calculated the accuracy of each of the three algorithms before and after applying the preprocessing steps. The results showed that the accuracy of the NB algorithm improved significantly after preprocessing, and the accuracy of the SVM algorithm improved slightly. Interestingly, the MaxE algorithm showed no improvement in accuracy. In this comparative study, the NB algorithm achieved significantly higher accuracy than the other machine learning algorithms after preprocessing, followed by the MaxE and SVM algorithms. This work shows that text preprocessing affects the accuracy of machine learning algorithms, and in particular that it significantly improves the accuracy of the NB algorithm.
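The abstract does not specify the datasets, the exact preprocessing steps, or the classifier implementations used. As a rough illustration of the before/after comparison it describes, the following is a minimal Python sketch for the NB case only, using a from-scratch multinomial Naïve Bayes with add-one (Laplace) smoothing and only the standard library. The cleaning rules in `preprocess`, the `STOPWORDS` list, the helper names (`raw_tokens`, `evaluate`) and the toy reviews are hypothetical assumptions for illustration, not the authors' pipeline or data; MaxE and SVM are not shown.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list (assumption): the paper's actual list is not given.
STOPWORDS = {"a", "an", "the", "is", "was", "it", "this", "and", "of", "to", "i"}


def preprocess(text):
    """Illustrative pipeline (assumed): lowercase, strip URLs and @mentions,
    keep letters only, then drop stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)  # remove URLs and @mentions
    text = re.sub(r"[^a-z\s]", " ", text)           # replace non-letters with spaces
    return [tok for tok in text.split() if tok not in STOPWORDS]


def raw_tokens(text):
    """Baseline 'no preprocessing': plain whitespace tokenization."""
    return text.split()


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        n = len(labels)
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.counts[c].update(doc)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        v = len(self.vocab)

        def log_posterior(c):
            score = self.log_prior[c]
            for tok in doc:
                if tok in self.vocab:  # tokens unseen in training are ignored
                    score += math.log((self.counts[c][tok] + 1) / (self.totals[c] + v))
            return score

        return max(self.classes, key=log_posterior)


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    correct = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


def evaluate(tokenize, train, test):
    """Train NB on (text, label) pairs with the given tokenizer and return test accuracy."""
    x_train = [tokenize(text) for text, _ in train]
    y_train = [label for _, label in train]
    model = MultinomialNB().fit(x_train, y_train)
    return accuracy(model, [tokenize(text) for text, _ in test], [label for _, label in test])


if __name__ == "__main__":
    # Toy, made-up reviews; not the paper's data.
    train = [("Great phone, love it", "pos"), ("Awful battery, hate it", "neg")]
    test = [("love it", "pos"), ("hate it", "neg")]
    print("before preprocessing:", evaluate(raw_tokens, train, test))
    print("after preprocessing: ", evaluate(preprocess, train, test))
```

Keeping the tokenizer as a parameter of `evaluate` means the classifier and data stay fixed while only the preprocessing changes, which mirrors the before/after design the abstract describes. On a realistic corpus, the same harness could be repeated with other classifiers in place of `MultinomialNB`.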

Updated: 2018-03-16