Resource creation for opinion mining: a case study with Marathi movie reviews, International Journal of Information Technology

International Journal of Information Technology, Pub Date: 2021-05-26, DOI: 10.1007/s41870-021-00698-8
N. T. Mhaske, A. S. Patil

With the rapid growth of user-generated content on the Web, various NLP research areas are emerging that use this information to help users process data efficiently. Opinion mining is one such area, attracting researchers who aim to build automated NLP systems that can analyze sentiments expressed in natural language. Because opinion mining is a language- and domain-dependent task, these systems require language-specific resources to achieve good results. Several studies on this theme have been presented using a number of techniques, most of them focused on English. Essential resources such as corpora, lexicons, and parsers are scarce for resource-poor languages. In this paper, we present our experiments on constructing an opinion corpus and a sentiment lexicon for mining opinions from Marathi text. The corpus is built from review documents in a popular opinion mining domain: movie reviews. Several experiments were carried out to validate the resources. A lexicon-based document-level polarity classification system attained F-measures of 0.75 and 0.56 for the positive and negative classes, respectively. These results encourage us to continue this line of research, with further work on improving both the resources and the system.
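The abstract does not describe how the lexicon is applied to a document. A common approach, assumed here, is to sum per-word polarity scores over a whitespace-tokenized review and label it positive if the total is above zero. The toy lexicon entries, the sample reviews and gold labels, the function names `classify_document` and `f_measure`, and the tie-breaking rule are all hypothetical illustrations, not the authors' resources or method. The sketch also shows how a per-class F-measure (the harmonic mean of precision and recall) is computed.

```python
# Hypothetical toy lexicon: word -> polarity score (+1 positive, -1 negative).
# These entries are illustrative only, NOT taken from the paper's lexicon.
TOY_LEXICON = {
    "चांगला": 1,      # good
    "उत्तम": 1,       # excellent
    "वाईट": -1,       # bad
    "कंटाळवाणा": -1,  # boring
}


def classify_document(text: str, lexicon: dict[str, int]) -> str:
    """Label a review 'positive' or 'negative' by summing lexicon scores.

    Assumptions for this sketch: whitespace tokenization, unknown words
    score 0, and a total of 0 is treated as 'negative'.
    """
    score = sum(lexicon.get(token, 0) for token in text.split())
    return "positive" if score > 0 else "negative"


def f_measure(gold: list[str], predicted: list[str], label: str) -> float:
    """Per-class F-measure: harmonic mean of precision and recall for `label`."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, predicted) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, predicted) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical sample reviews with hand-assigned gold labels.
reviews = [
    "चित्रपट उत्तम आहे",        # "The film is excellent"
    "अभिनय चांगला आहे",         # "The acting is good"
    "चित्रपट कंटाळवाणा आहे",    # "The film is boring"
    "चित्रपट वाईट नाही",        # "The film is not bad"
]
gold = ["positive", "positive", "negative", "positive"]
predicted = [classify_document(r, TOY_LEXICON) for r in reviews]

for label in ("positive", "negative"):
    print(label, f"{f_measure(gold, predicted, label):.2f}")
# prints:
# positive 0.80
# negative 0.67
```

The fourth review ("the film is not bad") is misclassified because plain lexicon lookup ignores negation, which illustrates one known weakness of this simple scoring scheme.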

Updated: 2021-05-26