Penalized regression-based bicluster localization
Pattern Recognition (IF 7.5), Pub Date: 2021-04-20, DOI: 10.1016/j.patcog.2021.107984
Hanjia Gao, Zhengjian Bai, Weiguo Gao, Shuqin Zhang

Biclustering (also called co-clustering or two-mode clustering) is a classical unsupervised learning method that has been applied in many fields in recent years. Several families of biclustering methods have been developed, including probabilistic methods, two-way clustering methods, and variance minimization methods. To the best of our knowledge, however, few regression-based methods have been proposed, even though such methods have been applied in traditional clustering, where they can improve both computational efficiency and clustering accuracy. In this paper, we present a penalized regression-based method for localizing biclusters (PRbiclust). By imposing Truncated LASSO Penalty (TLP) and group TLP terms on the column vectors and the row vectors in the regression model, the method recovers the bicluster structure of the data matrix. The model is formulated as an optimization problem with nonconvex penalties, and we propose a computationally efficient algorithm to solve it and prove its convergence. To extract the biclusters from the recovered data matrix, we propose a graph-based localization method. We also propose an evaluation criterion that measures the efficiency of bicluster localization when noise entries exist. We apply the proposed method to simulated datasets with different setups and to a real dataset. Experiments show that the method captures the bicluster structure well and performs better than existing methods.
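
The abstract names the TLP and group TLP terms without giving their form. As a rough illustration only, the sketch below uses the scaled definition common in the TLP literature, min(|x| / tau, 1), and applies the same truncation to the Euclidean norm of a vector for the group version. The function names and the threshold parameter `tau` are assumptions made for this example, not taken from the paper, and the paper's regression model may combine these terms differently.

```python
import math


def tlp(x, tau):
    """Truncated LASSO penalty (assumed scaled form): min(|x| / tau, 1).

    Behaves like a LASSO (L1) penalty for |x| < tau and is constant
    for |x| >= tau, so large entries are not penalized further.
    """
    return min(abs(x) / tau, 1.0)


def group_tlp(v, tau):
    """Group TLP (assumed form): the same truncation applied to the
    Euclidean norm of a whole vector, e.g. one row or column."""
    norm = math.sqrt(sum(c * c for c in v))
    return min(norm / tau, 1.0)


if __name__ == "__main__":
    # Small values are penalized proportionally; large ones saturate at 1.
    print(tlp(0.5, tau=1.0))
    print(tlp(3.0, tau=1.0))
    # The vector (3, 4) has norm 5, which is below tau = 10.
    print(group_tlp([3.0, 4.0], tau=10.0))
```

The truncation is what makes the penalty nonconvex: beyond `tau` the penalty stops growing, which avoids the shrinkage that a plain L1 term applies to large coefficients.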




Updated: 2021-05-04