Incremental learning for text categorization using rough set boundary based optimized Support Vector Neural Network

N. Venkata Sailaja (VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India)
L. Padmasree (VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India)
N. Mangathayaru (VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India)

Data Technologies and Applications

ISSN: 2514-9288

Article publication date: 2 July 2020

Issue publication date: 2 November 2020

Abstract

Purpose

Text mining has been used in many knowledge discovery applications and has therefore attracted considerable research. A recent trend in text mining is incremental learning from continuously arriving data, which is economical when dealing with large volumes of information.

Design/methodology/approach

The primary aim of this research is to design and develop a technique for incremental text categorization using an optimized Support Vector Neural Network (SVNN). The proposed technique involves four major steps: pre-processing, feature extraction, feature selection and classification. First, the data are pre-processed by stop word removal and stemming. Feature extraction then produces semantic word-based features and Term Frequency-Inverse Document Frequency (TF-IDF) features. From the extracted features, the important ones are selected using the Bhattacharyya distance measure, and these are fed as input to the proposed classifier. The classifier performs incremental learning using SVNN, with the weights bounded within a limit using rough set theory. Moreover, the Moth Search (MS) algorithm is used to select the optimal weights of the SVNN. The resulting classifier, named Rough set MS-SVNN, categorizes the incremental text data given as input.
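As a rough illustration of the pre-processing and TF-IDF steps described above, the Python sketch below removes stop words, stems the remaining tokens and computes TF-IDF weights. It is not the authors' implementation: the short stop-word list, the suffix-stripping `crude_stem` (a hypothetical stand-in for a real stemmer such as Porter's) and the helper names are assumptions, and TF is taken as the raw count divided by document length with IDF = log(N / df).

```python
import math
from collections import Counter

# Hypothetical, abbreviated stop-word list; a real system would use a full list.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "on", "for"}

# Hypothetical suffix list for a crude stemmer (a stand-in for e.g. Porter's).
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Lower-case, split on whitespace, trim punctuation, drop stop words, stem."""
    tokens = [t.strip(".,;:!?\"'()") for t in text.lower().split()]
    return [crude_stem(t) for t in tokens if t and t not in STOP_WORDS]


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Per-document TF-IDF: TF = count / doc length, IDF = log(N / doc frequency)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (c / len(doc)) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return vectors
```

A term that occurs in every document gets IDF = log(1) = 0 and therefore no weight, which is why TF-IDF favors words that discriminate between documents.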

Findings

For the experiments, the 20 Newsgroups dataset and the Reuters dataset are used. Simulation results indicate that the proposed Rough set MS-SVNN achieves a precision of 0.7743, a recall of 0.7774 and an F-measure of 0.7745.
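The abstract does not say how precision, recall and F-measure were averaged over categories. As a hedged illustration only, the sketch below computes per-class scores and their unweighted (macro) mean; the function names are hypothetical and macro averaging is an assumption, not a detail taken from the paper.

```python
from collections import Counter


def per_class_prf(y_true, y_pred):
    """Return {label: (precision, recall, f_measure)} from parallel label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # predicted p, but it was not p
            fn[t] += 1          # missed an instance of t
    scores = {}
    for label in set(y_true) | set(y_pred):
        prec = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        rec = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[label] = (prec, rec, f)
    return scores


def macro_average(scores):
    """Unweighted mean of precision, recall and F-measure over all classes."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))
```

For example, with true labels `["a", "a", "a", "b"]` and predictions `["a", "b", "b", "b"]`, macro precision and recall are both 2/3 while the macro F-measure is 0.5. Under macro averaging, the averaged F-measure is generally not the harmonic mean of the averaged precision and recall, so three such figures need not satisfy F = 2PR/(P+R) exactly.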

Originality/value

In this paper, an online incremental learner is developed for text categorization. Categorization is performed by the Rough set MS-SVNN classifier, which classifies incoming texts based on the boundary condition evaluated by rough set theory and the optimal weights found by MS. The proposed online text categorization scheme comprises four basic steps: pre-processing, feature extraction, feature selection and classification. Pre-processing identifies the unique words in the dataset, and semantic word-based features and TF-IDF features are obtained from the resulting keyword set. Feature selection keeps the features that meet a minimum Bhattacharyya distance, and the selected features are provided to the proposed Rough set MS-SVNN for classification.
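The abstract only states that features are kept if they meet a minimum Bhattacharyya distance. One plausible reading, sketched here under stated assumptions, scores each feature by the Bhattacharyya distance between its values in one category and in all other categories, modeling each group as a univariate Gaussian, for which D_B = (μ1 − μ2)² / (4(σ1² + σ2²)) + ½ ln((σ1² + σ2²) / (2σ1σ2)). The one-versus-rest setup, the Gaussian model, the `eps` smoothing term, the helper names and the threshold value are all assumptions, not details from the paper.

```python
import math
from statistics import fmean, pvariance


def bhattacharyya_gaussian(xs, ys, eps=1e-9):
    """Bhattacharyya distance between two 1-D samples, each modeled as a Gaussian.

    eps (an assumed smoothing term) keeps zero-variance samples finite.
    """
    m1, m2 = fmean(xs), fmean(ys)
    v1, v2 = pvariance(xs) + eps, pvariance(ys) + eps
    mean_term = 0.25 * (m1 - m2) ** 2 / (v1 + v2)
    var_term = 0.5 * math.log((v1 + v2) / (2 * math.sqrt(v1 * v2)))
    return mean_term + var_term


def select_features(rows, labels, pos_label, threshold):
    """Keep column indices whose one-vs-rest distance is >= threshold (hypothetical)."""
    keep = []
    for j in range(len(rows[0])):
        pos = [r[j] for r, y in zip(rows, labels) if y == pos_label]
        neg = [r[j] for r, y in zip(rows, labels) if y != pos_label]
        if bhattacharyya_gaussian(pos, neg) >= threshold:
            keep.append(j)
    return keep
```

A larger distance means the feature's values in the two groups overlap less, so a higher threshold keeps fewer, more discriminative features. Each group must contain at least one sample, since `fmean` raises `StatisticsError` on empty input.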

Citation

Venkata Sailaja, N., Padmasree, L. and Mangathayaru, N. (2020), "Incremental learning for text categorization using rough set boundary based optimized Support Vector Neural Network", Data Technologies and Applications, Vol. 54 No. 5, pp. 585-601. https://doi.org/10.1108/DTA-03-2020-0071

Publisher

Emerald Publishing Limited

Copyright © 2020, Emerald Publishing Limited