Resource creation for opinion mining: a case study with Marathi movie reviews

  • Original Research
International Journal of Information Technology

Abstract

With the rapid growth of user-generated content on the Web, various NLP research areas are emerging that use this information to help users work with the data efficiently. Opinion mining is one such area, attracting growing interest among researchers who aim to develop automated NLP systems capable of analyzing sentiments expressed in natural language. Because opinion mining is a language- and domain-dependent task, such systems require language-specific resources for better results. Several studies on this theme have been presented using a number of techniques, most of which focus on English. Essential resources such as corpora, lexicons, and parsers are scarce for resource-poor languages. In this paper, we present our experiments on the construction of an opinion corpus and a sentiment lexicon for mining opinions from Marathi text. The corpus is constructed from review documents in one of the popular opinion mining domains, movie reviews. Different experiments have been carried out to validate the resources. The lexicon-based document-level polarity classification system attained F-measures of 0.75 and 0.56 for the positive and negative classes, respectively. The results encourage us to continue this line of research with further improvements to the resources and the system.
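
The abstract reports per-class F-measures for a lexicon-based, document-level polarity classifier, but this preview does not describe the procedure. As a minimal sketch of the general technique, and not the authors' system, the Python below sums word scores from a lexicon to label a document and then computes the F-measure for each class. The lexicon entries, the tie-breaking rule, the function names, and the toy reviews are all assumptions made for illustration; English placeholder words stand in for Marathi entries.

```python
from typing import Dict, List

# Hypothetical toy lexicon: word -> polarity score (+1 positive, -1 negative).
# English placeholders are used here; a real lexicon would hold Marathi entries.
LEXICON: Dict[str, int] = {
    "good": 1, "excellent": 1, "enjoyable": 1,
    "bad": -1, "boring": -1, "weak": -1,
}

def classify(document: str, lexicon: Dict[str, int] = LEXICON) -> str:
    """Label a document by the sign of its summed word scores.

    Assumption: a score of zero (tie or no lexicon hit) counts as positive.
    """
    score = sum(lexicon.get(token, 0) for token in document.lower().split())
    return "positive" if score >= 0 else "negative"

def f_measure(gold: List[str], pred: List[str], label: str) -> float:
    """Per-class F-measure: harmonic mean of precision and recall for `label`."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up reviews with gold labels, for illustration only.
reviews = [
    ("good story and excellent acting", "positive"),
    ("boring plot and weak script", "negative"),
    ("bad music but enjoyable and good overall", "positive"),
    ("the ending was a letdown", "negative"),  # no lexicon hit, so labelled positive
]
gold = [label for _, label in reviews]
pred = [classify(text) for text, _ in reviews]

for label in ("positive", "negative"):
    print(label, round(f_measure(gold, pred, label), 2))
```

In this toy setup the last review contains no lexicon words, so the tie rule labels it positive. That single miss lowers recall for the negative class, one way limited lexicon coverage can produce uneven per-class scores.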

Notes

  1. http://www.cfilt.iitb.ac.in/Downloads.html.

Author information

Corresponding author

Correspondence to N. T. Mhaske.

About this article

Cite this article

Mhaske, N.T., Patil, A.S. Resource creation for opinion mining: a case study with Marathi movie reviews. Int. j. inf. tecnol. 13, 1521–1529 (2021). https://doi.org/10.1007/s41870-021-00698-8

  • DOI: https://doi.org/10.1007/s41870-021-00698-8
