

Incremental learning for text categorization using rough set boundary based optimized Support Vector Neural Network
Data Technologies and Applications (IF 1.6), Pub Date: 2020-07-03, DOI: 10.1108/dta-03-2020-0071
N. Venkata Sailaja, L. Padmasree, N. Mangathayaru

Purpose

Text mining supports a wide range of knowledge-discovery applications and has therefore attracted considerable research. Recent work in text mining increasingly adopts incremental learning, which is economical when dealing with large volumes of information.

Design/methodology/approach

The primary aim of this research is to design and develop a technique for incremental text categorization using an optimized Support Vector Neural Network (SVNN). The proposed technique involves four major steps: pre-processing, feature extraction, feature selection and classification. First, the data is pre-processed by stop word removal and stemming. Next, feature extraction obtains semantic word-based features and Term Frequency-Inverse Document Frequency (TF-IDF) features. From the extracted features, the important ones are selected using the Bhattacharya distance measure and passed as input to the proposed classifier. The classifier performs incremental learning using SVNN, with the weights bounded using rough set theory. The Moth Search (MS) algorithm is used to select the optimal SVNN weights. The resulting classifier, named Rough set MS-SVNN, categorizes the incremental text data given as input.
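The pre-processing and TF-IDF steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the stop-word list, the crude suffix-stripping stemmer (a stand-in for a real stemmer such as Porter's) and the `tf * log(N / df)` weighting are all assumptions, since the abstract does not specify these details. The semantic word-based features are not covered.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption, not from the paper).
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "on"}

def naive_stem(word):
    """Crude suffix stripping; a placeholder for a proper stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights

docs = [
    "The markets are trading higher",
    "The team is training for the match",
    "Markets and traders watched the match",
]
features = tf_idf(docs)
```

Terms that occur in every document receive a weight of zero under this weighting, which is why TF-IDF favours words that discriminate between documents.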

Findings

The experiments use the 20 Newsgroups dataset and the Reuters dataset. Simulation results indicate that the proposed Rough set MS-SVNN achieves a precision of 0.7743, a recall of 0.7774 and an F-measure of 0.7745.
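For reference, the three reported metrics can be computed from true and predicted labels as in the sketch below. The abstract does not say how the scores are averaged across classes or runs, so the macro-averaging used here is an assumption, and the labels are made-up toy data rather than results from the paper.

```python
def precision_recall_f1(y_true, y_pred, label):
    """Per-class precision, recall and F-measure for one target label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def macro_average(y_true, y_pred):
    """Unweighted mean of the per-class scores (assumed averaging scheme)."""
    labels = sorted(set(y_true))
    scores = [precision_recall_f1(y_true, y_pred, lab) for lab in labels]
    return tuple(sum(s[i] for s in scores) / len(labels) for i in range(3))

# Toy labels for illustration only.
y_true = ["sport", "sport", "politics", "politics", "tech", "tech"]
y_pred = ["sport", "politics", "politics", "politics", "tech", "sport"]
macro_p, macro_r, macro_f = macro_average(y_true, y_pred)
```

Note that a macro-averaged F-measure is the mean of the per-class F-measures, so it need not equal the harmonic mean of the averaged precision and recall.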

Originality/value

This paper develops an online incremental learner for text categorization. Categorization is performed by the Rough set MS-SVNN classifier, which classifies incoming texts based on the boundary condition evaluated by rough set theory and the optimal weights found by MS. The proposed online text categorization scheme has four basic steps: pre-processing, feature extraction, feature selection and classification. Pre-processing identifies the unique words in the dataset, and semantic word-based features and TF-IDF features are obtained from the resulting keyword set. Feature selection is done by setting a minimum Bhattacharya distance, and the selected features are provided to the proposed Rough set MS-SVNN for classification.
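One way to read the thresholded feature-selection step is sketched below: for each feature, estimate its distribution in two classes and keep the feature if the distance between the two distributions reaches a minimum value. The measure (usually spelled Bhattacharyya) is computed here as -ln(sum_i sqrt(p_i * q_i)) over discrete histograms. The two-class setting, the histogram binning, the threshold and the function names are all assumptions for illustration; the abstract does not describe how the authors apply the measure.

```python
import math

def bhattacharyya_distance(p, q):
    """D_B = -ln(sum_i sqrt(p_i * q_i)) for two discrete distributions."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return math.inf if bc == 0 else -math.log(bc)

def histogram(values, bins, lo, hi):
    """Normalized histogram of values over [lo, hi] with equal-width bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp the top edge
        counts[idx] += 1
    return [c / len(values) for c in counts]

def select_features(class_a, class_b, min_distance, bins=4):
    """Return indices of features whose class distributions differ enough.

    class_a and class_b are lists of equal-length feature vectors.
    """
    kept = []
    for j in range(len(class_a[0])):
        col_a = [row[j] for row in class_a]
        col_b = [row[j] for row in class_b]
        lo, hi = min(col_a + col_b), max(col_a + col_b)
        if hi == lo:
            continue  # a constant feature cannot separate the classes
        d = bhattacharyya_distance(
            histogram(col_a, bins, lo, hi), histogram(col_b, bins, lo, hi)
        )
        if d >= min_distance:
            kept.append(j)
    return kept

# Toy data: feature 0 separates the classes, feature 1 overlaps heavily.
class_a = [[0.90, 0.20], [0.80, 0.30], [0.95, 0.26]]
class_b = [[0.10, 0.22], [0.20, 0.28], [0.05, 0.30]]
selected = select_features(class_a, class_b, min_distance=0.5)
```

A larger distance means the feature's values are distributed more differently across the two classes, so raising `min_distance` keeps fewer, more discriminative features.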

Updated: 2020-07-03
