Building semantically annotated corpus for text classification of Indian defence news articles
International Journal of Information Technology, Pub Date: 2021-06-17, DOI: 10.1007/s41870-021-00679-x
Saurabh A. Kanekar, Alind Sharma, Gaurang S. Patkar, Amey K. Shet Tilve

A large amount of textual data is generated online with rapid growth and technological advancement. Deriving interesting patterns such as opinions, summaries and facts from this text is a challenging task. Currently, there is no dataset for subjectivity/objectivity classification in the Indian National Security domain. A news dataset has been created for the purpose of subjective/objective sentence classification. This paper defines the news corpus annotation guidelines and employs an inter-annotator agreement metric to assess the quality of the dataset. The proposed methodology also highlights the challenges and limitations of building a corpus in the National Security domain. The corpus can be used for research on developing a robust subjective/objective sentence classifier. Furthermore, text categorization experiments conducted on the corpus demonstrate that a neural-network-based classifier gives promising results.
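
The abstract does not name the inter-annotator agreement metric it uses. As an illustration only, the sketch below computes Cohen's kappa, one common choice for two annotators, on a handful of made-up subjective/objective labels. The function name `cohen_kappa` and the sample labels are assumptions for this example and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Hypothetical helper for illustration; the paper's actual metric
    is not stated in the abstract.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability both pick the same label independently,
    # estimated from each annotator's own label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up sentence labels from two hypothetical annotators.
annotator_1 = ["subjective", "objective", "objective",
               "subjective", "objective", "objective"]
annotator_2 = ["subjective", "objective", "subjective",
               "subjective", "objective", "objective"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, so a value near 1 indicates consistent annotation and a value near 0 indicates agreement no better than chance. With more than two annotators, a measure such as Fleiss' kappa would be the usual substitute.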




Updated: 2021-06-18