Abstract
With the rapid growth of user-generated content on the Web, various NLP research areas are emerging that aim to help users work with this information efficiently. Opinion mining is one such area, and it is attracting growing interest from researchers who want to build automated NLP systems that analyze sentiments expressed in natural language. Because opinion mining is a language- and domain-dependent task, such systems require language-specific resources to perform well. Several studies on this theme have been presented using a number of techniques, but most of them focus on English. Essential resources such as corpora, lexicons, and parsers are scarce for resource-poor languages. In this paper, we present our experiments on constructing an opinion corpus and a sentiment lexicon for mining opinions from Marathi text. The corpus is built from review documents in one of the most popular opinion mining domains, movie reviews. Different experiments have been carried out to validate the resources. The lexicon-based, document-level polarity classification system attained an F-measure of 0.75 for the positive class and 0.56 for the negative class. These results encourage us to continue this line of research with further improvements to the resources and the system.
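As a rough illustration of what lexicon-based, document-level polarity classification and a per-class F-measure involve, the following minimal sketch may help. It is not the system described in this paper: the toy lexicon (transliterated placeholder words), the simple score-summing rule, the tie-breaking choice, and the function names `classify` and `f_measure` are all assumptions made for illustration, and the F-measure is assumed to be the usual harmonic mean of precision and recall.

```python
from typing import Dict, List

# Hypothetical toy lexicon: word -> polarity score (+1 positive, -1 negative).
# The entries are transliterated placeholders, not taken from the paper's lexicon.
LEXICON: Dict[str, int] = {
    "changla": 1,
    "sundar": 1,
    "vait": -1,
    "kantalvana": -1,
}


def classify(document: str, lexicon: Dict[str, int] = LEXICON) -> str:
    """Sum the scores of known words; a tie (score 0) defaults to positive (an assumption)."""
    score = sum(lexicon.get(token, 0) for token in document.lower().split())
    return "positive" if score >= 0 else "negative"


def f_measure(gold: List[str], predicted: List[str], label: str) -> float:
    """Per-class F-measure: harmonic mean of precision and recall for `label`."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    docs = [
        "chitrapat changla ani sundar",
        "katha vait ani kantalvana",
        "abhinay changla pan katha vait kantalvana",
    ]
    gold = ["positive", "negative", "negative"]
    predicted = [classify(d) for d in docs]
    for label in ("positive", "negative"):
        print(label, f_measure(gold, predicted, label))
```

Reporting the F-measure separately for each class, as the abstract does, makes it visible when a classifier handles one polarity noticeably better than the other.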
Cite this article
Mhaske, N.T., Patil, A.S. Resource creation for opinion mining: a case study with Marathi movie reviews. Int. j. inf. tecnol. 13, 1521–1529 (2021). https://doi.org/10.1007/s41870-021-00698-8