Short-text feature expansion and classification based on nonnegative matrix factorization, International Journal of Intelligent Systems


International Journal of Intelligent Systems (IF 7), Pub Date: 2020-09-21, DOI: 10.1002/int.22290
Ling Zhang₁, Wenchao Jiang₁, Zhiming Zhao₂


In this paper, we propose a non-negative matrix factorization feature expansion (NMFFE) approach to overcome the feature-sparsity problem that arises when expanding the features of short texts. First, we take the internal relationships between short texts and words into account when segmenting the texts into words and constructing their relationship matrix. Second, we use the dual regularization non-negative matrix tri-factorization (DNMTF) algorithm to obtain the word clustering indicator matrix, from which the feature space is derived by dimensionality reduction. Third, closely related words are selected from the feature space and added to the short text to alleviate its sparsity. The experimental results show that, compared with the Word2Vec and Char-CNN algorithms, NMFFE increases short-text classification accuracy by 25.77%, 10.89%, and 1.79% on three data sets: Web snippets, Twitter sports, and AGnews, respectively. This indicates that NMFFE outperforms the BOW and Char-CNN algorithms in terms of classification accuracy and robustness.
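The expansion idea in the abstract can be illustrated with a much simpler stand-in. The sketch below is not the paper's DNMTF algorithm: it assumes plain two-factor NMF with standard multiplicative updates, a hypothetical six-document toy corpus, and a hypothetical expansion rule (append the strongest words of each text's dominant factor). All function names and parameters are illustrative.

```python
import random

def matmul(A, B):
    """Multiply two dense matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(X, k, iters=300, seed=0, eps=1e-9):
    """Approximate X (texts x words) by W (texts x k) @ H (k x words), all >= 0.

    Plain NMF with multiplicative updates; the paper's dual regularization
    and third factor matrix are deliberately left out of this sketch.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H

def expand(docs, k=2, n_add=2):
    """Append to each short text the n_add strongest new words of its dominant factor."""
    tokens = [d.split() for d in docs]
    vocab = sorted({w for t in tokens for w in t})
    # Text-word relationship matrix: raw term counts (an assumption of this sketch).
    X = [[float(t.count(w)) for w in vocab] for t in tokens]
    W, H = nmf(X, k)
    expanded = []
    for t, w_row in zip(tokens, W):
        topic = max(range(k), key=lambda c: w_row[c])
        ranked = sorted(range(len(vocab)), key=lambda j: H[topic][j], reverse=True)
        extra = [vocab[j] for j in ranked if vocab[j] not in t][:n_add]
        expanded.append(t + extra)
    return expanded, W, H

# Hypothetical toy corpus: three sports-like and three finance-like short texts.
docs = [
    "team goal match",
    "coach team goal",
    "match coach",
    "stock market bank",
    "bank loan stock",
    "market loan",
]
expanded, W, H = expand(docs)
for original, new in zip(docs, expanded):
    print(original, "->", " ".join(new))
```

Each text keeps its original words and gains `n_add` words it did not contain, chosen from the factor that loads most heavily on it. Which words are added depends on the random initialization and on the corpus, so the printed output should be inspected rather than assumed.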


Updated: 2020-09-21
