Sentiment analysis with genetic programming
Introduction
Sentiment analysis (SA) is the research field that aims to classify the emotions expressed in a text, usually as polarities, i.e., positive, negative, or neutral [1]. Research in this field has gained prominence since the emergence of online social network platforms, such as Twitter and Facebook. These social tools allow their users to share text, usually short opinion text, with their social contacts. Soon after they were developed, online social networks became available via smartphones, which contributed even more to their popularity among all socioeconomic classes. As a consequence, the text posted on social networks has become a very important source of information about what people think and feel about anything, including trademarks, products, celebrities, and political subjects. Thus, there is great interest from industry, e-commerce, celebrities, and political agents, among other entities, in applying SA to these texts. This demand has motivated research in SA [2], [3], [4].
SA is performed by sentiment classifiers, which are text classifiers developed to identify the polarity of pieces of text according to the prevalent sentiment they transmit. Sentiment classifiers are often generated manually by specialists in the specific domain of the classification, thus capturing the experience of the designer in the analyzed context [5], [6], [7], [8]. However, the manual approach increases the model generation cost for each scenario, penalizes generalization and does not scale to large collections of messages, such as those derived from online social networks [9].
Given the massive amount of text to be classified and the cost involved in manually generating a classifier, some form of automatic classifier generation is mandatory. Automatic sentiment classification approaches are commonly divided into three classes [7], [10], [11]: (i) lexicon-based techniques, (ii) machine learning (ML)-based techniques, and (iii) hybrid methods.
A lexicon-based strategy is a nonsupervised approach that consists of going through the text word by word and consulting the polarity (and perhaps the polarity intensity) of each term in a lexicon. A combining function (such as a sum or average) is applied to make the final prediction regarding the overall sentiment of the text. This approach is easy to implement, and its accuracy greatly depends on the quality of the lexicon used [12].
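The lexicon-based strategy described above can be illustrated with a minimal sketch. The toy lexicon, the whitespace tokenizer, and the function name `lexicon_polarity` are assumptions made for illustration only; they are not taken from the paper or from any particular SA tool.

```python
from statistics import mean

# Hypothetical toy lexicon: word -> polarity in [-1, 1].
TOY_LEXICON = {"good": 1.0, "great": 1.0, "bad": -1.0, "awful": -1.0}


def lexicon_polarity(text, lexicon, combine=mean):
    """Classify a text by looking up each word and combining the polarities.

    `combine` is the combining function (e.g. `mean` or `sum`).
    Tokenization is a plain whitespace split, which ignores punctuation
    handling; a real system would need a proper tokenizer.
    """
    scores = [lexicon[tok] for tok in text.lower().split() if tok in lexicon]
    if not scores:
        # None of the words occur in the lexicon: no evidence either way.
        return "neutral"
    total = combine(scores)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"
```

For instance, `lexicon_polarity("a great movie", TOY_LEXICON)` finds only the word "great" and returns "positive", whereas a text with no lexicon words falls back to "neutral", which is exactly the weakness of a purely lexicon-based approach.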
Machine learning strategies apply learning algorithms (SVM, logistic regression, etc.) that can derive a classifier automatically from a set of texts with polarities that are usually manually labeled (the training set). Hybrid methods combine the first two approaches.
Unfortunately, neither of the first two approaches is able to perform polarity classification accurately in most cases. For instance, lexicon-based approaches alone cannot handle texts in which none of the words occur in the lexicon, among other situations [13]. On the other hand, machine learning methods alone are not able to capture the polarity of individual words as well as lexicon approaches. Thus, hybrid approaches have attracted research attention as an alternative solution for SA.
Despite the many advances in the area, SA continues to be a difficult text classification problem. Sentiment analysis suffers from the same difficulties found in other text classification tasks (such as topic classification), which include the curse of dimensionality, vocabulary sparsity, and category imbalance, among others. In addition, SA has its own idiosyncrasies, which further complicate matters. Texts appearing in polarity detection applications are mostly short opinion texts, such as tweets and product reviews, and contain many typos, slang, emoticons, emojis, and rhetorical figures, such as sarcasm and irony. As a consequence, the preprocessing usually applied to the words in other text classification tasks (stopword filtering, feature selection, normalization, etc.) is often not sufficient for SA. Thus, handcrafted features (part-of-speech tagging, features informing the presence of negation, uppercase words, the polarity of emoticons or emojis, etc.) are often applied to enhance text representation before a model is trained.
In addition, there are many possibilities for combining different types of features (text-derived features such as tf-idf and handcrafted features). There are also many lexicons available in SA software tools, and some have been proposed in the literature; some of them, as we show, are more useful than others. Deciding which of these resources should be used and how to combine them for a given context is decisive for the success of SA. In Section 5.4 of this paper, we present the examples of Task 9 of SemEval 2014 and Task 4 Subtask A of the SemEval 2016 competitions to show how combinations of the abovementioned resources make the difference for classifier generation. In our experiments in which we apply the same features for different learners, we see that the classifier trained using an SVM, a state-of-the-art learning algorithm, outperforms the one generated via logistic regression (LR) in four out of six test datasets. However, the SemEval winning solution adopted LR as the learning algorithm. In fact, the competitors were allowed to use any combination of features, handcraft their own features, and apply any lexicon. The combination of resources was decisive for the winning solution, as it compensated for the weakness of LR and allowed the final model to outperform SVM-based models in the competition. Nevertheless, combining all these resources is time-consuming manual work, since there are many features and lexicons and the number of possible combinations of them is exponential.
We show in this paper that genetic programming (GP) [14] addresses this resource combination problem automatically, acting as an optimization approach that evolves the best resource combinations. Once one defines functions to deal with each kind of feature/resource, GP finds approximations of the optimal combinations of these functions. Even preprocessing decisions, such as whether to apply punctuation removal or stopword filtering, can be left for GP to make. This saves considerable manual effort in generating a sentiment classifier.
We propose functions to implement a large number of preprocessing tasks usually applied in SA. These functions are devised in a way that their results can be used in the processing of other functions, allowing the creation of a hierarchical structure (tree) of functions (also known as program or individual in GP jargon). These trees are manipulated by genetic programming via genetic operators (crossover, replication, and mutation) to evolve new trees. In particular, we propose a function that is able to obtain a weighted average of the polarities of the words in a text. The weight (importance) of each lexicon is taken into consideration in this function. These weights are obtained automatically by GP, which allows us not only to compute the polarities of words in the text better but also to identify which lexicons are more important to a given training dataset. In our case, this weighting scheme showed that only a few of the lexicons used are truly relevant and that they seem to complement each other.
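To make the lexicon-weighting idea concrete, the following is a minimal sketch of one plausible form of a weighted average of word polarities over several lexicons. The exact function used in the paper is not given in this excerpt, so the name `weighted_polarity`, the normalization by the sum of weights, and the hand-picked weights are all assumptions; in the proposed approach the weights would be obtained by GP rather than set manually.

```python
def weighted_polarity(tokens, lexicons, weights):
    """Weighted average of word polarities across several lexicons.

    tokens   -- list of words in the text
    lexicons -- list of dicts mapping word -> polarity
    weights  -- one non-negative weight (importance) per lexicon;
                fixed by hand here, evolved by GP in the paper's setting
    """
    if len(lexicons) != len(weights):
        raise ValueError("one weight is required per lexicon")
    numerator = 0.0
    denominator = 0.0
    for tok in tokens:
        for lexicon, weight in zip(lexicons, weights):
            if tok in lexicon:
                numerator += weight * lexicon[tok]
                denominator += weight
    # No lexicon hit (or only zero-weight hits): return a neutral score.
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical toy lexicons for illustration.
LEXICON_A = {"good": 1.0, "bad": -1.0}
LEXICON_B = {"good": 0.5, "slow": -0.5}
```

With weights `[3.0, 1.0]`, the tokens `["good", "slow"]` are scored mostly according to `LEXICON_A`; setting a weight to zero removes that lexicon's influence entirely, which is how such a scheme can reveal that only a few lexicons matter for a given training dataset.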
In addition to combining many resources, GP shows itself to be a promising sentiment classification tool. We compared GP with the three machine learning algorithms used by the best-ranked competitors in Task 9 of the SemEval 2014 Competition. These algorithms were logistic regression (LR) [15], used by the winner; support vector machines (SVM) [16], used by the first and second runners-up; and stochastic gradient descent (SGD) [17], used by the fourth-placed competitor. We also compared the results with random forest (RF) [18] and naive Bayes (NB) [19]. The GP results are comparable to those of the SVMs, and GP surpassed the SVMs considerably in the most challenging dataset in the competition. Moreover, GP performed better than SGD in two of the datasets and better than LR, RF, and NB in many datasets. We also present a comparison between our GP results and the three best-ranked solutions submitted to Subtask A “Message Polarity Classification” of the SemEval 2016 competition [20] for each test dataset. According to our experiments, the GP results are close to the third-ranked solutions, except in the Sarcasm dataset. The competing solutions used sophisticated resources, such as classifier ensembles, deep neural networks, and extra datasets in addition to the competition benchmark used for training, together with more handmade features.
Finally, another advantage of GP as a hybrid approach is that the final classifier derived (i.e., the last individual) is highly interpretable. This individual corresponds to a tree in which the internal nodes are functions and the leaves are input features. Thus, it is possible to identify the set of functions most used in the last generation, which is useful not only to know the best features but also to know how they are combined in most of the best individuals (trees). This is unlike other traditional methods, such as artificial neural networks, which are known to generate noninterpretable models.
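The tree view of an individual can be sketched in a few lines. The representation below (nested tuples for function nodes, strings for feature leaves) and the function and feature names are hypothetical choices made for illustration; they are not the paper's function set.

```python
from collections import Counter

# Hypothetical function set: name -> callable applied to the child values.
FUNCTIONS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "neg": lambda a: -a,
}


def evaluate(tree, features):
    """Evaluate an individual on one text's feature values.

    A leaf is a string naming an input feature; an internal node is a
    tuple (function_name, child, child, ...).
    """
    if isinstance(tree, str):
        return features[tree]
    name, *children = tree
    return FUNCTIONS[name](*(evaluate(child, features) for child in children))


def function_counts(population):
    """Count how often each function appears across a population of trees."""
    counts = Counter()

    def visit(tree):
        if isinstance(tree, str):
            return  # leaves are features, not functions
        counts[tree[0]] += 1
        for child in tree[1:]:
            visit(child)

    for individual in population:
        visit(individual)
    return counts


# Example individual: (lexicon_score * emoticon_score) + (-negation_count)
EXAMPLE_TREE = ("add", ("mul", "lexicon_score", "emoticon_score"), ("neg", "negation_count"))
```

Calling `function_counts` on the individuals of the last generation gives the kind of usage summary mentioned above, and reading `EXAMPLE_TREE` directly shows how the features are combined, which is the sense in which such a model is interpretable.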
In summary, the main contributions of this work are as follows:
a) We show, using our experiments and the results for Task 9 of the SemEval 2014 and Task 4 Subtask A of the SemEval 2016 competitions, the importance of a good choice of resources (features and lexicons) and of a good combination of them to derive a competitive hybrid SA solution.
b) We propose a new hybrid approach using GP that can automatically combine preprocessing tasks and resources to derive an effective sentiment classifier. This is a very important contribution in our opinion, since the choice and combination of text preprocessing approaches, features, and lexicons are usually made manually, which is time-consuming and error-prone.
c) We show examples of functions that can be used to compose individuals for GP. This set of functions can be extended to support other preprocessing and handcrafted features, which makes GP an extensible tool.
d) We provide a discussion of the interpretability of the final solution created by GP.
e) We provide a comparative study of GP and other sentiment classifiers that use different machine learning methods. The results show the promising applicability of GP to SA.
The rest of this work is organized as follows. Initially, the related work is presented in Section 2. Section 3 presents an overview of GP. Section 4 shows our proposal to adapt the GP method to the SA problem. The experiments are described in Section 5, and the conclusions are presented in Section 6.
Related works
We briefly describe the state of the art of SA with a hybrid strategy in Section 2.1 and the use of evolutionary computing in SA from the literature in Section 2.2.
Genetic programming
GP is an evolutionary technique that aims to obtain a good population of computer programs (or solutions) to solve a given problem. GP starts with an initial population of programs, usually generated randomly. In each population, individuals satisfying a fitness criterion (i.e., best individuals) are chosen to form the next population. Genetic operations such as crossover, mutation, and replication are applied to the best individual of a population to generate the population of the next
Genetic programming applied to sentiment analysis
In this section, we present three of the preparatory steps used to adapt GP for the sentiment analysis problem. Specifically, we present the terminal set, the function set, and the fitness function we propose to use with GP. The GP control parameters are discussed in the experimental setup (Section 5.2) since some of them are defined experimentally. Finally, as the last component of the preparatory step, the number of generations is the criterion adopted to interrupt the evolutive processing of
Experimental methodology
In this section, we present experiments comparing GP adapted to SA with classical machine learning techniques: support vector machines (SVM) [16], [68] (using a linear kernel), naive Bayes (NB) [19], logistic regression (LR) [15], random forest (RF) [18], and stochastic gradient descent (SGD) [17]. Models using these techniques were generated with the support of the Scikit-learn library.
We also compare the GP results to the best solutions in SemEval 2014 and SemEval
Discussion
The results of this study indicate that the sentiment analysis classifier created with GP is competitive with works that use the same benchmark. On the LiveJournal dataset, for which the best values were obtained, the results revealed an F1-P/N (average F1-score considering the positive and negative classes) of 71.24, only 3.6 points below the best method reported in [30] in SemEval 2014 and only 2.87 points below the best method reported in [83] in SemEval 2016. In
Conclusion
The use of genetic programming applied to sentiment analysis has been underexplored in the literature. In this paper, the goal of generating a hybrid classifier for sentiment analysis using genetic programming and lexical combination has been achieved.
Among the main contributions of this research, we showed the importance of a good choice of resources (features and lexicons) and of a good combination of them to derive a competitive hybrid SA solution. We proposed a new hybrid approach using GP
CRediT authorship contribution statement
Airton Bordin Junior: Conceptualization, Methodology, Software, Writing - original draft. Nádia Félix F. da Silva: Data curation, Writing - original draft, Software. Thierson Couto Rosa: Visualization, Investigation, Writing - review & editing. Celso G.C. Junior: Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The authors would like to acknowledge the Brazilian Research Agency CNPq for their financial support.
References (90)
- et al., E2sam: evolutionary ensemble of sentiment analysis methods for domain adaptation, Inf. Sci. (2019)
- et al., An effective approach to track levels of influenza-A (H1N1) pandemic in India using Twitter, Proc. Comput. Sci. (2015)
- et al., Alga: adaptive lexicon learning using genetic algorithm for sentiment analysis of microblogs, Knowl.-Based Syst. (2017)
- et al., Opinion mining of movie review using hybrid method of support vector machine and particle swarm optimization, Proc. Eng. (2013)
- et al., Sentiment analysis: a combined approach, J. Inf. (2009)
- Sentiment analysis: a multifaceted problem, IEEE Intell. Syst. (2010)
- et al., Using frame-based resources for sentiment analysis within the financial domain, Prog. AI (2018)
- et al., Review-aggregated aspect-based sentiment analysis with ontology features, Prog. AI (2018)
- L. Becker, G. Erhart, D. Skiba, V. Matula, Avaya: sentiment analysis on twitter with self-training and polarity lexicon...
- H. Kanayama, T. Nasukawa, Fully automatic lexicon expansion for domain-oriented sentiment analysis, in: Proceedings of...
- A comparative study of sentiment analysis techniques, J. JIKRCE
- Support-vector networks, Mach. Learn.
- Naive Bayes classifiers, University of British Columbia