Regularized bi-directional co-clustering
Statistics and Computing (IF 1.6), Pub Date: 2021-04-10, DOI: 10.1007/s11222-021-10006-w
Séverine Affeldt, Lazhar Labiod, Mohamed Nadif

The simultaneous clustering of documents and words, known as co-clustering, has proved to be more effective than one-sided clustering in dealing with sparse high-dimensional datasets. By their nature, text data are also generally unbalanced and directional. Recently, the von Mises–Fisher (vMF) mixture model was proposed to handle unbalanced data while harnessing the directional nature of text. In this paper, we propose a general co-clustering framework based on a matrix formulation of vMF model-based co-clustering. This formulation leads to a flexible framework for text co-clustering that can easily incorporate both word–word semantic relationships and document–document similarities. By contrast with existing methods, which generally use an additive incorporation of similarities, we propose a bi-directional multiplicative regularization that better encapsulates the underlying text data structure. Extensive evaluations on various real-world text datasets demonstrate the superior performance of our proposed approach over baseline and competitive methods, both in terms of clustering results and co-cluster topic coherence.
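The contrast between directional data and multiplicative similarity incorporation can be illustrated with a minimal NumPy sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: the helper names (`unit_rows`, `cosine_similarity`, `multiplicative_regularize`), the two-sided product `W_doc @ X @ W_word`, and the choice to derive both similarity matrices from the toy matrix itself. It is not the authors' objective function or algorithm.

```python
import numpy as np


def unit_rows(M, eps=1e-12):
    """Scale each row to unit L2 norm so that rows lie on the unit hypersphere."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    return M / np.maximum(norms, eps)


def cosine_similarity(M):
    """Pairwise cosine similarity between the rows of M."""
    U = unit_rows(M)
    return U @ U.T


def multiplicative_regularize(X, W_doc, W_word):
    """Hypothetical two-sided smoothing: multiply X by a document-document
    similarity on the left and a word-word similarity on the right, then
    project the rows back onto the unit sphere (the directional view)."""
    return unit_rows(W_doc @ X @ W_word)


# Toy document-term count matrix: 4 documents x 5 words.
X = np.array(
    [
        [2, 1, 0, 0, 0],
        [1, 2, 1, 0, 0],
        [0, 0, 0, 3, 1],
        [0, 0, 1, 1, 2],
    ],
    dtype=float,
)

# Assumption: similarities are computed from X itself purely for this toy case.
W_doc = cosine_similarity(X)      # 4 x 4 document-document similarities
W_word = cosine_similarity(X.T)   # 5 x 5 word-word similarities

Y = multiplicative_regularize(X, W_doc, W_word)
print(Y.shape)
print(np.linalg.norm(Y, axis=1))
```

In this sketch the similarities act on the data matrix from both sides at once, rather than being added as separate penalty terms, which is one plausible reading of "bi-directional multiplicative" as opposed to "additive" incorporation.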




Updated: 2021-04-11