A family of mixture models for biclustering
Statistical Analysis and Data Mining (IF 2.1) · Pub Date: 2021-10-15 · DOI: 10.1002/sam.11555
Wangshu Tu, Sanjeena Subedi

Biclustering is used to cluster observations and variables simultaneously when no group structure is known a priori. It is increasingly used in bioinformatics, text analytics, and other fields. Biclustering has previously been introduced in a model-based clustering framework using a structure similar to a mixture of factor analyzers. In such models, the observed variables $\mathbf{X}$ are modeled using a latent variable $\mathbf{U}$ that is assumed to follow $\mathcal{N}(\mathbf{0}, \mathbf{I})$. Clustering of variables is introduced by constraining the entries of the factor loading matrix to be 0 or 1, which results in block-diagonal covariance matrices. However, this approach is overly restrictive: the off-diagonal elements within the blocks of the covariance matrices can only be 1, which can lead to an unsatisfactory model fit on complex data. Here, the latent variable $\mathbf{U}$ is assumed to follow $\mathcal{N}(\mathbf{0}, \mathbf{T})$, where $\mathbf{T}$ is a diagonal matrix. This ensures that the off-diagonal terms within the blocks of the covariance matrices are non-zero and not restricted to be 1, which leads to a superior model fit on complex data. A family of models is developed by imposing constraints on the components of the covariance matrix. Parameters are estimated using an alternating expectation-conditional maximization (AECM) algorithm. Finally, the proposed method is illustrated on simulated and real datasets.
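To make the covariance structure concrete, here is a minimal NumPy sketch of a single mixture component. It assumes the mixture-of-factor-analyzers form $\boldsymbol{\Sigma} = \boldsymbol{\Lambda}\mathbf{T}\boldsymbol{\Lambda}^\top + \boldsymbol{\Psi}$, where $\boldsymbol{\Lambda}$ is a binary loading matrix with exactly one 1 per row (assigning each variable to one variable cluster) and $\mathbf{T}$ and $\boldsymbol{\Psi}$ are diagonal. Setting $\mathbf{T} = \mathbf{I}$ recovers the earlier approach. The function names `binary_loading`, `bicluster_covariance` and `sample_component`, the cluster assignment, and the parameter values are hypothetical illustrations, not the authors' implementation. The sketch neither fits a mixture nor runs the AECM algorithm.

```python
import numpy as np


def binary_loading(var_groups, n_groups):
    """Build a p x q loading matrix with a single 1 per row.

    Entry (j, k) is 1 if variable j belongs to variable cluster k, else 0.
    """
    p = len(var_groups)
    lam = np.zeros((p, n_groups))
    lam[np.arange(p), var_groups] = 1.0
    return lam


def bicluster_covariance(lam, t_diag, psi_diag):
    """Assumed component covariance: Lambda @ T @ Lambda.T + Psi (T, Psi diagonal)."""
    return lam @ np.diag(t_diag) @ lam.T + np.diag(psi_diag)


def sample_component(mu, lam, t_diag, psi_diag, n, rng):
    """Draw n rows of X = mu + Lambda U + eps with U ~ N(0, T), eps ~ N(0, Psi)."""
    p, q = lam.shape
    u = rng.normal(size=(n, q)) * np.sqrt(t_diag)      # latent U ~ N(0, T)
    eps = rng.normal(size=(n, p)) * np.sqrt(psi_diag)  # noise eps ~ N(0, Psi)
    return mu + u @ lam.T + eps


# Hypothetical example: 5 variables split into 2 variable clusters.
var_groups = np.array([0, 0, 1, 1, 1])
lam = binary_loading(var_groups, n_groups=2)
psi = np.full(5, 0.5)

# Earlier approach: U ~ N(0, I), so every within-block off-diagonal entry is 1.
sigma_identity = bicluster_covariance(lam, np.ones(2), psi)
# Diagonal T: within-block off-diagonal entries equal that block's t_k.
t = np.array([2.5, 0.3])
sigma_diag = bicluster_covariance(lam, t, psi)

print(sigma_identity)
print(sigma_diag)

# Sanity check: the empirical covariance of simulated data approaches sigma_diag.
rng = np.random.default_rng(0)
x = sample_component(np.zeros(5), lam, t, psi, n=200_000, rng=rng)
emp_cov = np.cov(x, rowvar=False)
print(np.round(emp_cov, 2))
```

In `sigma_identity`, every within-block off-diagonal entry is 1. In `sigma_diag`, those entries equal the corresponding diagonal entries of T (2.5 for the first variable cluster, 0.3 for the second). Entries between different variable clusters are 0 in both matrices, which produces the block-diagonal structure.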

Updated: 2021-10-15