Biclustering via structured regularized matrix decomposition, Statistics and Computing



Biclustering via structured regularized matrix decomposition
Statistics and Computing (IF 1.6), Pub Date: 2022-04-29, DOI: 10.1007/s11222-022-10095-1
Yan Zhong, Jianhua Z. Huang


Biclustering is a machine learning problem that deals with the simultaneous clustering of the rows and columns of a data matrix. Complex structures in the data matrix, such as overlapping biclusters, have challenged existing methods. In this paper, we first provide a unified formulation of biclustering that uses structured regularized matrix decomposition, which synthesizes various existing methods, and then develop a new biclustering method called BCEL based on this formulation. The biclustering problem is formulated as a penalized least-squares problem that approximates the data matrix \(\mathbf{X}\) by a multiplicative matrix decomposition \(\mathbf{U}\mathbf{V}^T\) with sparse columns in both \(\mathbf{U}\) and \(\mathbf{V}\). The squared \(\ell_{1,2}\)-norm penalty, also called the exclusive Lasso penalty, is applied to both \(\mathbf{U}\) and \(\mathbf{V}\) to assist in identifying the rows and columns included in the biclusters. The penalized least-squares problem is solved by a novel computational algorithm that combines alternating minimization and the proximal gradient method. A subsampling-based procedure called stability selection is developed to select the tuning parameters and determine the bicluster membership. BCEL is shown to be competitive with existing methods in simulation studies and in an application to a real-world single-cell RNA sequencing dataset.
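
To make the ingredients named in the abstract concrete, the following is a minimal Python sketch and not the authors' BCEL implementation. It assumes that the squared \(\ell_{1,2}\)-norm groups entries by column, so the penalty on \(\mathbf{U}\) is \(\sum_k \|\mathbf{u}_k\|_1^2\), and that the least-squares term carries a factor of 1/2. The function names and the tuning parameters `lam_u` and `lam_v` are hypothetical. The last function is the standard closed-form proximal operator of the squared \(\ell_1\) norm for a single vector, which is the kind of building block a proximal gradient step on one column would call.

```python
import numpy as np


def exclusive_lasso_penalty(M):
    """Squared l_{1,2} norm, assuming each column is one group:
    sum over columns k of (sum_i |M[i, k]|)^2."""
    return float(np.sum(np.sum(np.abs(M), axis=0) ** 2))


def penalized_objective(X, U, V, lam_u, lam_v):
    """0.5 * ||X - U V^T||_F^2 + lam_u * pen(U) + lam_v * pen(V).
    The 0.5 scaling and the parameter names are assumptions."""
    residual = X - U @ V.T
    fit = 0.5 * np.sum(residual ** 2)
    return float(fit
                 + lam_u * exclusive_lasso_penalty(U)
                 + lam_v * exclusive_lasso_penalty(V))


def prox_squared_l1(v, lam):
    """Solve argmin_x 0.5 * ||x - v||_2^2 + lam * ||x||_1^2.

    The solution is a soft-thresholding of v at tau = 2 * lam * ||x||_1,
    where the active set consists of the largest entries of |v|.
    """
    v = np.asarray(v, dtype=float)
    a = np.sort(np.abs(v))[::-1]          # magnitudes, largest first
    csum = np.cumsum(a)                   # running sums S_k
    k = np.arange(1, a.size + 1)
    thresholds = 2.0 * lam * csum / (1.0 + 2.0 * lam * k)
    active = a > thresholds               # entries that stay nonzero
    if not active.any():                  # only happens when v is all zeros
        return np.zeros_like(v)
    tau = thresholds[np.nonzero(active)[0][-1]]
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


if __name__ == "__main__":
    U = np.array([[1.0, -2.0], [3.0, 0.0]])
    V = np.array([[1.0, 0.0], [0.0, 1.0]])
    X = U @ V.T                            # exact fit, so only penalties remain
    print(penalized_objective(X, U, V, lam_u=1.0, lam_v=1.0))
    print(prox_squared_l1([3.0, 1.0], lam=0.5))
```

Unlike the ordinary Lasso threshold, the threshold `tau` here depends on the \(\ell_1\) norm of the solution itself, so the largest entry of a nonzero input is never shrunk to zero. This is consistent with the abstract's description of sparse but non-empty columns in \(\mathbf{U}\) and \(\mathbf{V}\). An alternating scheme would apply a gradient step on the least-squares term followed by such a proximal step, first with \(\mathbf{V}\) fixed and then with \(\mathbf{U}\) fixed; the step sizes and stopping rules are not given in the abstract and are left out.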


Updated: 2022-04-29
