A novel multi-view clustering approach via proximity-based factorization targeting structural maintenance and sparsity challenges for text and image categorization
Information Processing & Management (IF 7.4), Pub Date: 2021-03-13, DOI: 10.1016/j.ipm.2021.102546
Monika Bansal, Dolly Sharma

Multi-view data comprises several feature sets, each describing the same objects from a different perspective, and is common in real-world applications. Multi-view clustering of text and image data faces two substantial challenges: structure preservation and sparsity. Existing methods do not preserve the structure of the data space, and recent improvements address only the local structure. Preserving local structure alone is not sufficient to handle the sparsity of such data. In this paper, we propose a novel clustering approach, Proximity-based Multi-View Non-negative Matrix Factorization (PMVNMF), which jointly uses the local and global structure of the data space to handle sparsity in real-world multimedia (text and image) data. For each view, the 1-step and 2-step transition probability matrices are constructed as the first-order and second-order proximity matrices to uncover the latent local and global geometric structure, respectively. These two proximity matrices are then integrated into a view-specific proximity matrix. Finally, Non-negative Matrix Factorization (NMF) is applied with graph regularization and consensus regularization, so that the factorization accounts for the integrated graph structures and reveals the hidden common structure shared by all representations. The algorithm captures the underlying structure of the data space and is robust to sparse data. We conduct experiments on six real-world datasets (two text and four image datasets) and compare the proposed algorithm with eight baseline approaches. Six evaluation metrics are used: accuracy, F-score, precision, recall, NMI, and entropy. The results show that the proposed algorithm outperforms the baselines.
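
The proximity construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the affinity graph is built or how the two proximity matrices are integrated, so the cosine kNN graph, the function names `transition_matrix` and `proximity_matrix`, and the mixing weight `alpha` are all assumptions made for illustration.

```python
import numpy as np


def transition_matrix(X, k=5):
    """1-step transition probability matrix for one view (illustrative).

    X is an (n_samples, n_features) non-negative feature matrix.
    Assumption: the graph is a symmetrized cosine kNN graph; the
    abstract does not specify the affinity used in the paper.
    """
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)
    S = Xn @ Xn.T                       # cosine similarity, >= 0 for X >= 0
    np.fill_diagonal(S, 0.0)            # no self-loops

    n = S.shape[0]
    idx = np.argsort(-S, axis=1)[:, :k]  # k most similar neighbours per row
    rows = np.arange(n)[:, None]
    W = np.zeros_like(S)
    W[rows, idx] = S[rows, idx]
    W = np.maximum(W, W.T)              # symmetrize the kNN graph

    # Row-normalize so each row is a probability distribution.
    # Rows with zero degree (isolated points) stay all-zero.
    deg = W.sum(axis=1, keepdims=True)
    return W / np.maximum(deg, 1e-12)


def proximity_matrix(X, k=5, alpha=0.5):
    """Combine first- and second-order proximity for one view.

    Assumption: a convex combination with hypothetical weight `alpha`;
    the paper's actual integration rule is not given in the abstract.
    """
    P1 = transition_matrix(X, k)        # first-order: 1-step transitions
    P2 = P1 @ P1                        # second-order: 2-step transitions
    return alpha * P1 + (1.0 - alpha) * P2
```

Under these assumptions, `P1 @ P1` assigns non-zero proximity to pairs of points that share a neighbour even when they are not directly connected, which is one way a second-order term can compensate for sparse direct similarities. The resulting matrix would then serve as the graph in a graph-regularized NMF for that view.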




Updated: 2021-03-15