Classification of multi-lingual tweets, into multi-class model using Naïve Bayes and semi-supervised learning

Khan, Ayaz H.; Zubair, Muhammad

doi:10.1007/s11042-020-09512-2

Published: 29 August 2020

Volume 79, pages 32749–32767, (2020)

Multimedia Tools and Applications

Abstract

Twitter is a social media platform that has proven to be a valuable source of insight into emotions about products, policies, and similar topics. Its 280-character messages, called tweets, carry direct and unfiltered emotions from a large user population. Twitter has attracted the attention of many researchers because every tweet is public by default, which is not the case with Facebook. This paper proposes a model for multi-lingual (English and Roman Urdu) classification of tweets over a diverse range of classes (non-hierarchical architecture). Previous work on tweet classification focuses narrowly either on a single language or, at most, on a uniform set of classes (Positive, Extremely Positive, Negative and Extremely Negative). The proposed model is based on semi-supervised learning, and the proposed feature selection approach makes it less dependent and highly adaptive in picking up trending terms. This makes it a strong candidate for streaming data. Using the Naïve Bayes learning algorithm in each phase, the methodology obtained an accuracy of up to 87.16%, ahead of both KNN and SVM models, which are popular in the NLP and text classification domains.
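
The abstract does not spell out the algorithmic details, so the following is only a minimal sketch of how Naïve Bayes can be combined with semi-supervised learning, not the authors' implementation. It assumes a generic self-training loop: a multinomial Naïve Bayes classifier is trained on a small labelled set, confidently predicted unlabelled tweets are added with their predicted labels, and the model is refit. The class names, the toy English and Roman Urdu tweets, the whitespace tokenizer, the `threshold` and `max_rounds` parameters, and the helper names `NaiveBayes` and `self_train` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer; real tweets would need richer preprocessing.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict_proba(self, doc):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        log_scores = {}
        for cls, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[cls].values())
            score = math.log(n_docs / total_docs)
            for tok in tokenize(doc):
                if tok in self.vocab:  # tokens never seen in training are skipped
                    num = self.word_counts[cls][tok] + self.alpha
                    den = total_words + self.alpha * vocab_size
                    score += math.log(num / den)
            log_scores[cls] = score
        # Normalise in log space to avoid underflow.
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

    def predict(self, doc):
        probs = self.predict_proba(doc)
        return max(probs, key=probs.get)


def self_train(labeled, unlabeled, threshold=0.75, max_rounds=5):
    """Self-training: absorb confidently classified unlabelled docs, then refit."""
    docs = [d for d, _ in labeled]
    labels = [l for _, l in labeled]
    pool = list(unlabeled)
    model = NaiveBayes().fit(docs, labels)
    for _ in range(max_rounds):
        remaining, added = [], 0
        for doc in pool:
            probs = model.predict_proba(doc)
            best = max(probs, key=probs.get)
            if probs[best] >= threshold:
                docs.append(doc)
                labels.append(best)
                added += 1
            else:
                remaining.append(doc)
        if added == 0:
            break
        pool = remaining
        model = NaiveBayes().fit(docs, labels)
    return model


# Toy data (hypothetical): two classes, mixed English and Roman Urdu.
labeled = [
    ("great match cricket team won", "sports"),
    ("zabardast match cricket team jeet gayi", "sports"),
    ("election vote government policy", "politics"),
    ("hukumat election vote ka faisla", "politics"),
]
unlabeled = [
    "cricket team ki shandar batting",
    "government policy par bahas",
]

model = self_train(labeled, unlabeled)
print(model.predict("shandar batting"))
print(model.predict("bahas par election"))
```

In this toy run the words "shandar" and "batting" occur only in the unlabelled tweets, so the classifier can use them only after self-training has absorbed those tweets. This loosely illustrates how a semi-supervised model might pick up new terms from a stream without fresh manual labels.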

Author information

Authors and Affiliations

Computer Science Department, Habib University, Karachi, Pakistan
Ayaz H. Khan
College of Computing and Information Sciences, Karachi Institute of Economics and Technology, Karachi, Pakistan
Muhammad Zubair

Corresponding author

Correspondence to Ayaz H. Khan.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Khan, A.H., Zubair, M. Classification of multi-lingual tweets, into multi-class model using Naïve Bayes and semi-supervised learning. Multimed Tools Appl 79, 32749–32767 (2020). https://doi.org/10.1007/s11042-020-09512-2

Received: 08 October 2019
Revised: 04 June 2020
Accepted: 31 July 2020
Published: 29 August 2020
Issue Date: November 2020
DOI: https://doi.org/10.1007/s11042-020-09512-2
