Classification of multi-lingual tweets, into multi-class model using Naïve Bayes and semi-supervised learning

Multimedia Tools and Applications

Abstract

Twitter is a social media platform that has proven to be a valuable source of insight into emotions about products, policies and similar topics. Its unit of content is the tweet, a message of up to 280 characters in which a large user population expresses direct and unfiltered emotions. Twitter has attracted the attention of many researchers because every tweet is public by default, which is not the case on Facebook. This paper proposes a model for multi-lingual (English and Roman Urdu) classification of tweets over a diverse range of classes (non-hierarchical architecture). Previous work in tweet classification focuses narrowly either on a single language or, at most, on a uniform set of classes (Positive, Extremely Positive, Negative and Extremely Negative). The proposed model is based on semi-supervised learning, and the proposed feature selection approach makes it less dependent and highly adaptive in picking up trending terms. This makes it a strong candidate for streaming data. Using the Naïve Bayes learning algorithm in each phase of the methodology, the model achieved an accuracy of up to 87.16%, ahead of both KNN and SVM models, which are popular in the NLP and text classification domains.
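
To make the combination of Naïve Bayes and semi-supervised learning concrete, the following is a minimal sketch of one common way to pair them: a multinomial Naïve Bayes classifier that is retrained on its own confident predictions (self-training). It is not the authors' implementation. The class names (`sports`, `tech`), the toy English and Roman Urdu tweets, the whitespace tokenizer, and the 0.7 confidence threshold are all assumptions chosen for illustration, and the paper's feature selection approach is not reproduced here.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for this sketch)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict_proba(self, text):
        """Return {label: posterior probability} for one text."""
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            log_scores[label] = score
        # Normalise in log space to avoid numerical underflow.
        top = max(log_scores.values())
        exp = {k: math.exp(v - top) for k, v in log_scores.items()}
        z = sum(exp.values())
        return {k: v / z for k, v in exp.items()}

    def predict(self, text):
        probs = self.predict_proba(text)
        return max(probs, key=probs.get)


def self_train(texts, labels, unlabeled, threshold=0.7, max_rounds=5):
    """Add confidently pseudo-labelled tweets to the training set and refit."""
    texts, labels, pool = list(texts), list(labels), list(unlabeled)
    model = NaiveBayes().fit(texts, labels)
    for _ in range(max_rounds):
        remaining = []
        for tweet in pool:
            probs = model.predict_proba(tweet)
            best = max(probs, key=probs.get)
            if probs[best] >= threshold:
                texts.append(tweet)
                labels.append(best)
            else:
                remaining.append(tweet)
        if len(remaining) == len(pool):  # no tweet was added this round
            break
        pool = remaining
        model = NaiveBayes().fit(texts, labels)
    return model


# Hypothetical toy data mixing English and Roman Urdu.
labeled_texts = [
    "great match today cricket team won",
    "aaj ka match bohat acha tha",
    "new phone camera battery is great",
    "naya phone bohat mehnga hai",
]
labeled_classes = ["sports", "sports", "tech", "tech"]
unlabeled_tweets = ["cricket match kal hai", "phone battery kharab hai"]

model = self_train(labeled_texts, labeled_classes, unlabeled_tweets)
print(model.predict("cricket team"), model.predict("phone camera"))
```

In this toy run the word "kharab" occurs only in the unlabeled pool, so the classifier can only associate it with a class after the self-training step has absorbed that tweet. This is the sense in which a semi-supervised loop can pick up terms that were absent from the hand-labelled data.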


Corresponding author

Correspondence to Ayaz H. Khan.

Cite this article

Khan, A.H., Zubair, M. Classification of multi-lingual tweets, into multi-class model using Naïve Bayes and semi-supervised learning. Multimed Tools Appl 79, 32749–32767 (2020). https://doi.org/10.1007/s11042-020-09512-2


  • DOI: https://doi.org/10.1007/s11042-020-09512-2
