Incremental Word Vectors for Time-Evolving Sentiment Lexicon Induction

Bravo-Marquez, Felipe; Khanchandani, Arun; Pfahringer, Bernhard

doi:10.1007/s12559-021-09831-y


Published: 21 January 2021

Cognitive Computation, volume 14, pages 425–441 (2022)

Felipe Bravo-Marquez¹˒² (ORCID: orcid.org/0000-0002-2153-4306), Arun Khanchandani³ & Bernhard Pfahringer³


Abstract

A sentiment lexicon is a list of expressions annotated according to affect categories such as positive, negative, anger and fear. Lexicons are widely used in sentiment classification of tweets, especially when labeled messages are scarce. Sentiment lexicons are prone to obsolescence due to (1) the arrival of new sentiment-conveying expressions such as #trumpwall and #PrayForParis and (2) temporal changes in the sentiment patterns of words (e.g., a scandal associated with an entity). In this paper, we propose a methodology for automatically inducing continuously updated sentiment lexicons from Twitter streams by training incremental word sentiment classifiers from time-evolving distributional word vectors. We experiment with various sketching techniques for efficiently building incremental word-context matrices and study how the lexicon adapts to drastic changes in the sentiment pattern. Change is simulated by randomly picking some words from a testing partition of words and swapping their context with the context of words exhibiting the opposite sentiment. Our experimental results show that our approach successfully tracks the sentiment of words over time, even when drastic change is induced.
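
The change simulation mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the swap is applied at the token level: once two words of opposite polarity are paired, each occurrence of one is replaced by the other in the stream, so each word inherits the other's contexts. The abstract does not specify the mechanism, so this is only one plausible reading. The names `build_swap_map` and `apply_swap`, the `fraction` parameter and the toy lexicon are hypothetical and do not come from the paper.

```python
import random
from typing import Dict, List


def build_swap_map(polarity: Dict[str, str], fraction: float, seed: int = 0) -> Dict[str, str]:
    """Pair randomly chosen positive and negative test words (hypothetical helper).

    `polarity` maps each test word to "positive" or "negative".
    `fraction` is the share of the smaller polarity class that gets swapped.
    """
    rng = random.Random(seed)
    positives = sorted(w for w, p in polarity.items() if p == "positive")
    negatives = sorted(w for w, p in polarity.items() if p == "negative")
    rng.shuffle(positives)
    rng.shuffle(negatives)
    n_pairs = int(fraction * min(len(positives), len(negatives)))
    swap: Dict[str, str] = {}
    for pos, neg in zip(positives[:n_pairs], negatives[:n_pairs]):
        # The mapping is symmetric: each word takes over the other's contexts.
        swap[pos] = neg
        swap[neg] = pos
    return swap


def apply_swap(tokens: List[str], swap: Dict[str, str]) -> List[str]:
    """Replace every swapped word in a tokenized tweet; other tokens are unchanged."""
    return [swap.get(tok, tok) for tok in tokens]


# Toy test partition (assumed labels, for illustration only).
test_words = {"good": "positive", "great": "positive",
              "bad": "negative", "awful": "negative"}
swap_map = build_swap_map(test_words, fraction=1.0, seed=42)
print(apply_swap("what a great movie".split(), swap_map))
```

In a streaming setting, such a mapping would be switched on at a chosen point in the stream, so that the word vectors built before and after that point reflect opposite sentiment patterns.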


Notes

We have to keep in mind that we are assuming the training words do not change their sentiment over time.
http://corpus.byu.edu/coha/
http://conceptnet5.media.mit.edu/
http://sentiwordnet.isti.cnr.it/
http://sentic.net/
http://fastutil.di.unimi.it/
It is important to remark that the target word w is excluded from the context window \(c_1,\dots ,c_{2W}\). For example, given the sentence “I like my nice dog”, the target word w = “my” and a window size \(W = 2\), the context words \(c_1,c_2,c_3,c_4\) (\(2W=4\)) are “I”, “like”, “nice” and “dog”.
The method can return fewer than 2W words when some window positions are out of range.
CMU TweetNLP - http://www.cs.cmu.edu/~ark/TweetNLP/
An additional reason to focus on adjectives is that they are the most important class of opinion words [64].
https://www.nltk.org/
https://dev.twitter.com/streaming/overview
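
The context-window convention described in the notes above (target word excluded, fewer than 2W words near sentence boundaries) can be sketched in a few lines of Python. This is an illustrative sketch only; the function name `context_window` and its parameters are assumptions and are not taken from the authors' implementation.

```python
from typing import List


def context_window(tokens: List[str], position: int, window: int) -> List[str]:
    """Return up to 2 * window context words around tokens[position].

    The target word itself is excluded. Positions that fall outside the
    sentence are skipped, so fewer than 2 * window words may be returned.
    """
    left = tokens[max(0, position - window):position]
    right = tokens[position + 1:position + 1 + window]
    return left + right


sentence = "I like my nice dog".split()
print(context_window(sentence, 2, 2))  # ['I', 'like', 'nice', 'dog']
print(context_window(sentence, 0, 2))  # ['like', 'my']
```

The first call reproduces the example from the note (target “my”, W = 2); the second shows the boundary case, where only the two words to the right of “I” are available.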

References

Cambria E, Hussain A. Sentic Computing: A Common-Sense-Based Framework for Concept-Level Sentiment Analysis. Cham, Switzerland: Springer International Publishing; 2015.
Cambria E, Poria S, Hazarika D, Kwok K. Senticnet 5: Discovering conceptual primitives for sentiment analysis by means of context embeddings. In S. A. McIlraith and K. Q. Weinberger, editors, Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18). AAAI Press, New Orleans, Louisiana, USA. 2018:1795–1802. https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16839.
Cambria E, Li Y, Xing FZ, Poria S, Kwok K. Senticnet 6: Ensemble application of symbolic and subsymbolic ai for sentiment analysis. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, CIKM ’20. Association for Computing Machinery. New York, NY, USA. 2020:105–114. https://doi.org/10.1145/3340531.3412003.
Bifet A, Frank E. Sentiment knowledge discovery in twitter streaming data. In Proceedings of the 13th international conference on Discovery science. Springer-Verlag. 2010:1–15.
Susanto Y, Livingstone AG, Ng BC, Cambria E. The hourglass model revisited. IEEE Intell Syst. 2020;35(5):96–102. https://doi.org/10.1109/MIS.2020.2992799.
Bravo-Marquez F, Frank E, Pfahringer B. From unlabelled tweets to twitter-specific opinion words. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2015:743–746.
Tang D, Wei F, Qin B, Zhou M, Liu T. Building large-scale twitter-specific sentiment lexicon: A representation learning approach. In Proceedings of the 25th International Conference on Computational Linguistics, Association for Computational Linguistics. 2014:172–182.
Turney PD, Pantel P. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research. 2010;37(1):141–88.
Hamilton WL, Clark K, Leskovec J, Jurafsky D. Inducing domain-specific sentiment lexicons from unlabeled corpora. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. The Association for Computational Linguistics. 2016:595–605.
Harris ZS. Distributional structure. Word. 1954;10(2–3):146–62.
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, Curran Associates, Inc., 2013:3111–3119.
Levy O, Goldberg Y. Neural word embedding as implicit matrix factorization. In Advances in neural information processing systems. 2014:2177–2185.
Levy O, Goldberg Y, Dagan I. Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics. 2015;3:211–25.
Church KW, Hanks P. Word association norms, mutual information, and lexicography. Comput Linguist. 1990;16(1):22–9.
Bifet A, Gavaldà R, Holmes G, Pfahringer B. Machine Learning for Data Streams: with Practical Examples in MOA. MIT Press; 2018.
Jenkins R. Hash functions. Dr Dobbs Journal. 1997;22(9):107–+.
Brown PF, Desouza PV, Mercer RL, Pietra VJD, Lai JC. Class-based n-gram models of natural language. Comput Linguist. 1992;18(4):467–79.
Bottou L. Large-scale machine learning with stochastic gradient descent. In Y. Lechevallier and G. Saporta, editors. Proceedings of COMPSTAT’2010. Heidelberg, Physica-Verlag HD. 2010:177–186.
Bravo-Marquez F, Frank E, Pfahringer B. From opinion lexicons to sentiment classification of tweets and vice versa: A transfer learning approach. In 2016 IEEE/WIC/ACM International Conference on Web Intelligence, WI. 2016:145–152.
Go A, Bhayani R, Huang L. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford. 2009;1(12).
Bifet A, Holmes G, Kirkby R, Pfahringer B. Moa: Massive online analysis. J Mach Learn Res. 2010;11:1601–4.
Bifet A, Holmes G, Pfahringer B, Gavalda R. Detecting sentiment change in twitter streaming data. In T. Diethe, J. Balcazar, J. Shawe-Taylor, and C. Tirnauca, editors, Proceedings of the Second Workshop on Applications of Pattern Analysis, volume 17 of Proceedings of Machine Learning Research, CIEM, Castro Urdiales, Spain, PMLR. 2011:5–11. http://proceedings.mlr.press/v17/bifet11a.html.
Hogenboom A, Bal D, Frasincar F, Bal M, de Jong F, Kaymak U. Exploiting emoticons in sentiment analysis. In Proceedings of the 28th Annual ACM Symposium on Applied Computing, SAC ’13, New York, NY, USA, Association for Computing Machinery. 2013:703–710. https://doi.org/10.1145/2480362.2480498.
Ibrahim NF, Wang X. Decoding the sentiment dynamics of online retailing customers: Time series analysis of social media. Computers in Human Behavior. 2019;96:32–45.
Durant KT, Smith MD. The impact of time on the accuracy of sentiment classifiers created from a web log corpus. In Proceedings of the Twenty-Second AAAI Conference on Artificial Intelligence. AAAI Press. 2007:1340–1346.
Rubtsova Y. Reducing the deterioration of sentiment analysis results due to the time impact. Information. 2018;9(8). https://doi.org/10.3390/info9080184 https://www.mdpi.com/2078-2489/9/8/184.
Guimarães N, Torgo L, Figueira A. Twitter as a source for time-and domain-dependent sentiment lexicons. In Social Network Based Big Data Analysis and Applications. Springer. 2018:1–19.
Bravo-Marquez F, Frank E, Pfahringer B. Transferring sentiment knowledge between words and tweets. Web Intelligence. 2018;16(4):203–20.
Kim Y, Chiu YI, Hanaki K, Hegde D, Petrov S. Temporal analysis of language through neural language models. In Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science, Baltimore, MD, USA, Association for Computational Linguistics. 2014:61–65. https://doi.org/10.3115/v1/W14-2517 https://www.aclweb.org/anthology/W14-2517.
Hamilton WL, Leskovec J, Jurafsky D. Diachronic word embeddings reveal statistical laws of semantic change. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. 2016:1489–1501.
Kutuzov A, Øvrelid L, Szymanski T, Velldal E. Diachronic word embeddings and semantic shifts: a survey. In Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, New Mexico, USA, Association for Computational Linguistics. 2018:1384–1397. https://www.aclweb.org/anthology/C18-1117.
Kulkarni V, Al-Rfou R, Perozzi B, Skiena S. Statistically significant detection of linguistic change. In Proceedings of the 24th International Conference on World Wide Web, WWW ’15, Republic and Canton of Geneva, Switzerland, International World Wide Web Conferences Steering Committee. 2015:625–635. https://doi.org/10.1145/2736277.2741627.
Kaji N, Kobayashi H. Incremental skip-gram model with negative sampling. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2017:363–371.
May C, Duh K, Van Durme B, Lall A. Streaming word embeddings with the space-saving algorithm. arXiv preprint arXiv:1704.07463. 2017.
Rosenfeld A, Erk K. Deep neural models of semantic shift. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, Louisiana, Association for Computational Linguistics. 2018:474–484. https://doi.org/10.18653/v1/N18-1044 https://www.aclweb.org/anthology/N18-1044.
Heerschop B, van Iterson P, Hogenboom A, Frasincar F, Kaymak U. Analyzing sentiment in a large set of web data while accounting for negation. In E. Mugellini, P. S. Szczepaniak, M. C. Pettenati, and M. Sokhn, editors, Advances in Intelligent Web Mastering, Berlin, Heidelberg, Springer Berlin Heidelberg. 2011;3:195–205.
Wiegand M, Balahur A, Roth B, Klakow D, Montoyo A. A survey on the role of negation in sentiment analysis. In R. Morante and C. Sporleder, editors, Proceedings of the Workshop on Negation and Speculation in Natural Language Processing (NeSp-NLP 2010), Uppsala, Sweden, Stroudsburg, PA, Association for Computational Linguistics. 2010:60–68.
Ma Y, Peng H, Khan T, Cambria E, Hussain A. Sentic LSTM: a hybrid network for targeted aspect-based sentiment analysis. Cognitive Computation. 2018;10(4):639–50. https://doi.org/10.1007/s12559-018-9549-x.
Marrese-Taylor E, Velásquez JD, Bravo-Marquez F. A novel deterministic approach for aspect-based opinion mining in tourism products reviews. Expert Systems with Applications. 2014;41(17):7764–75.
Saeidi M, Bouchard G, Liakata M, Riedel S. Sentihood: Targeted aspect based sentiment analysis dataset for urban neighbourhoods. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. 2016:1546–1556.
Denecke K. Using sentiwordnet for multilingual sentiment analysis. In 2008 IEEE 24th International Conference on Data Engineering Workshop. 2008:507–512. https://doi.org/10.1109/ICDEW.2008.4498370.
Hogenboom A, Heerschop B, Frasincar F, Kaymak U, de Jong F. Multi-lingual support for lexicon-based sentiment analysis guided by semantics. Decision support systems. 2014;62:43–53.
Miller GA, Beckwith R, Fellbaum C, Gross D, Miller K. Wordnet: An on-line lexical database. International Journal of Lexicography. 1990;3:235–44.
Esuli A, Sebastiani F. Determining the semantic orientation of terms through gloss classification. In Proceedings of the 14th ACM International Conference on Information and Knowledge Management. ACM. 2005:617–624.
Baccianella S, Esuli A, Sebastiani F. Sentiwordnet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In Proceedings of the Seventh International Conference on Language Resources and Evaluation. European Language Resources Association, 2010:2200–2204.
Esuli A, Sebastiani F. Sentiwordnet: A publicly available lexical resource for opinion mining. In Proceedings of the 5th Conference on Language Resources and Evaluation, European Language Resources Association. 2006:417–422.
Heerschop B, Hogenboom A, Frasincar F. Sentiment lexicon creation from lexical resources. In International Conference on Business Information Systems. Springer. 2011:185–196.
Stewart I, Arendt D, Bell E, Volkova S. Measuring, predicting and visualizing short-term change in word representation and usage in vkontakte social network. In Proceedings of the Eleventh International Conference on Web and Social Media, ICWSM 2017, Montréal, Québec, Canada. 2017:672–675.
Jurafsky D, Martin JH. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2nd ed. Upper Saddle River, NJ, USA: Prentice Hall; 2008.
Durme BV, Lall A. Streaming pointwise mutual information. In Advances in Neural Information Processing Systems. 2009:1892–1900.
Metwally A, Agrawal D, El Abbadi A. Efficient computation of frequent and top-k elements in data streams. In International Conference on Database Theory, Springer Berlin Heidelberg. 2005:398–412.
QasemiZadeh B, Kallmeyer L, Passban P. Sketching word vectors through hashing. CoRR, abs/1705.04253, 2017. http://arxiv.org/abs/1705.04253
Owoputi O, Connor B, Dyer C, Gimpel K, Schneider N, Smith NA. Improved part-of-speech tagging for online conversational text with word clusters. In Proceedings of the 2013 conference of the North American chapter of the association for computational linguistics: human language technologies. 2013:380–390.
Bravo-Marquez F, Frank E, Pfahringer B. Positive, negative, or neutral: Learning an expanded opinion lexicon from emoticon-annotated tweets. In Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI’15. AAAI Press. 2015:1229–1235.
Bravo-Marquez F, Frank E, Pfahringer B. Building a twitter opinion lexicon from automatically-annotated tweets. Knowledge-Based Systems. 2016;108:65–78.
Schlechtweg D, McGillivray B, Hengchen S, Dubossarsky H, Tahmasebi N. SemEval-2020 task 1: Unsupervised lexical semantic change detection. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, Barcelona (online), International Committee for Computational Linguistics. 2020:1–23. https://www.aclweb.org/anthology/2020.semeval-1.1.
Petrović S, Osborne M, Lavrenko V. The edinburgh twitter corpus. In Proceedings of the NAACL HLT 2010 Workshop on Computational Linguistics in a World of Social Media, Stroudsburg, PA, USA, Association for Computational Linguistics. 2010:25–26.
Årup Nielsen F. A new ANEW: Evaluation of a word list for sentiment analysis in microblogs. In Proceedings of the 1st Workshop on Making Sense of Microposts (#MSM2011) 2011:93–98.
Cunha E, Magno G, Comarela G, Almeida V, Gonçalves MA, Benevenuto F. Analyzing the dynamic evolution of hashtags on twitter: a language-based approach. In Proceedings of the workshop on language in social media. LSM. 2011:58–65.
Badilla P, Bravo-Marquez F, Perez J. WEFE: The word embeddings fairness evaluation framework. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, International Joint Conferences on Artificial Intelligence Organization. 2020:430–436. https://doi.org/10.24963/ijcai.2020/60.
Liu B. Sentiment analysis and opinion mining. Synthesis lectures on human language technologies. 2012;5(1):1–167.
Kiritchenko S, Zhu X, Mohammad SM. Sentiment analysis of short informal texts. J Artif Intell Res. 2014;50:723–62.
Bifet A, Gavalda R. Learning from time-changing data with adaptive windowing. In Proceedings of the 2007 SIAM international conference on data mining. SIAM. 2007:443–448.
Liu B. Sentiment Analysis and Opinion Mining. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers, 2012.


Acknowledgements

The authors would like to thank former Honors student Tristan Anderson for a preliminary study on incremental sentiment lexicons.

Funding

This work was funded by ANID FONDECYT grant 11200290, U-Inicia VID Project UI-004/20 and ANID - Millennium Science Initiative Program - Code ICN17_002.

Author information

Authors and Affiliations

Department of Computer Science, University of Chile, Santiago, Chile
Felipe Bravo-Marquez
Millennium Institute for Foundational Research on Data, IMFD-Chile, University of Chile, Santiago, Chile
Felipe Bravo-Marquez
Department of Computer Science, University of Waikato, Hamilton, New Zealand
Arun Khanchandani & Bernhard Pfahringer


Corresponding author

Correspondence to Felipe Bravo-Marquez.

Ethics declarations

Conflicts of Interest

The authors declare that they have no conflict of interest.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed Consent

Informed consent was not required as no humans or animals were involved.


About this article

Cite this article

Bravo-Marquez, F., Khanchandani, A. & Pfahringer, B. Incremental Word Vectors for Time-Evolving Sentiment Lexicon Induction. Cogn Comput 14, 425–441 (2022). https://doi.org/10.1007/s12559-021-09831-y


Received: 15 May 2020
Accepted: 12 January 2021
Published: 21 January 2021
Issue Date: January 2022
DOI: https://doi.org/10.1007/s12559-021-09831-y
