Sparse Regularized Low-Rank Tensor Regression with Applications in Genomic Data Analysis
Pattern Recognition (IF 7.5), Pub Date: 2020-11-01, DOI: 10.1016/j.patcog.2020.107516
Le Ou-Yang , Xiao-Fei Zhang , Hong Yan

Abstract: Many applications in biomedical informatics deal with data in tensor form. Traditional regression methods that take vectors as covariates may have difficulty handling tensors because of the tensors' ultrahigh dimensionality and complex structure. In this paper, we introduce a novel sparse regularized Tucker tensor regression model that exploits the structure of tensor covariates and performs feature selection on tensor data. By applying a Tucker decomposition to the regression coefficient tensor, we reduce the ultrahigh dimensionality to a manageable level. To make our model identifiable, we impose an orthonormality constraint on the factor matrices. Unlike previous tensor regression models, which impose a sparsity penalty on the factor matrices of the coefficient tensor, our model imposes the sparsity penalty directly on the coefficient tensor to select the relevant features in tensor data. An efficient optimization algorithm based on the alternating direction method of multipliers (ADMM) is designed to solve the proposed model. The performance of our model is evaluated on both synthetic and real genomic data. Experimental results on synthetic data demonstrate that our model identifies the truly relevant signals more accurately than other state-of-the-art regression models. Analysis of melanoma genomic data demonstrates that our model achieves better prediction performance and identifies markers with important implications. Our model and the associated studies can provide useful insights into disease or pathogenesis mechanisms and will benefit further studies in variable selection.
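The abstract names the ingredients of the method (a Tucker-structured coefficient tensor with orthonormal factor matrices, an L1-type penalty applied directly to the coefficient tensor, and an ADMM solver) but does not state the objective or the ADMM splitting. The sketch below is therefore a hypothetical illustration, not the authors' algorithm. It assumes a squared-error loss, an L1 penalty with weight `lam`, and the splitting B = Z, where the B-step is a ridge-type least-squares solve followed by a truncated HOSVD projection onto a fixed Tucker rank (a heuristic simplification, since this projection is only quasi-optimal). All function names (`mode_product`, `tucker_project`, `soft_threshold`, `sparse_tucker_regression`) and parameter values are assumptions chosen for illustration.

```python
import numpy as np


def mode_product(T, M, k):
    """n-mode product T x_k M: multiply tensor T by matrix M along axis k."""
    # tensordot contracts M's columns with T's k-th axis and puts the new axis first
    return np.moveaxis(np.tensordot(M, T, axes=([1], [k])), 0, k)


def tucker_project(B, ranks):
    """Truncated HOSVD: project B onto tensors of Tucker rank `ranks`.

    Each factor U_k holds the top-r_k left singular vectors of the mode-k
    unfolding, so U_k^T U_k = I (orthonormal factor matrices). Quasi-optimal only.
    """
    out = B
    for k, r in enumerate(ranks):
        unfold = np.moveaxis(B, k, 0).reshape(B.shape[k], -1)
        U, _, _ = np.linalg.svd(unfold, full_matrices=False)
        U_k = U[:, :r]
        # G x_k U_k with G = B x_k U_k^T equals B x_k (U_k U_k^T); modes commute
        out = mode_product(out, U_k @ U_k.T, k)
    return out


def soft_threshold(A, t):
    """Elementwise proximal operator of t * ||A||_1."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)


def sparse_tucker_regression(X, y, ranks, lam=0.05, rho=1.0, n_iter=200):
    """Heuristic ADMM (hypothetical) for
        min_B (1/2n) sum_i (y_i - <X_i, B>)^2 + lam * ||B||_1,  B of Tucker rank `ranks`.
    X has shape (n, d1, ..., dK); y has shape (n,). Returns (B, Z), where Z is sparse.
    """
    n, shape = X.shape[0], X.shape[1:]
    Xm = X.reshape(n, -1)                       # vectorized covariates, n x p
    A = Xm.T @ Xm / n + rho * np.eye(Xm.shape[1])
    Xty = Xm.T @ y / n
    Z = np.zeros(shape)
    W = np.zeros(shape)                         # scaled dual variable
    for _ in range(n_iter):
        # B-step: least squares pulled toward Z - W, then low-rank Tucker projection
        b = np.linalg.solve(A, Xty + rho * (Z - W).ravel())
        B = tucker_project(b.reshape(shape), ranks)
        # Z-step: the sparsity penalty acts directly on the coefficient tensor
        Z = soft_threshold(B + W, lam / rho)
        # dual ascent on the consensus constraint B = Z
        W = W + B - Z
    return B, Z


# Synthetic check: a sparse coefficient tensor of Tucker rank (1, 1, 1)
rng = np.random.default_rng(0)
u1 = np.array([1.0, -1.0, 0, 0, 0, 0])
u2 = np.array([0, 1.5, 1.0, 0, 0])
u3 = np.array([1.0, 0, 0, 1.0])
B_true = np.einsum("i,j,k->ijk", u1, u2, u3)
X = rng.standard_normal((400, 6, 5, 4))
y = np.einsum("nijk,ijk->n", X, B_true) + 0.05 * rng.standard_normal(400)
B_hat, Z_hat = sparse_tucker_regression(X, y, ranks=(2, 2, 2))
rel_err = np.linalg.norm(Z_hat - B_true) / np.linalg.norm(B_true)
print("relative error:", rel_err)
```

Returning the soft-thresholded `Z` rather than `B` is a design choice of this sketch: `Z` carries exact zeros, so its support can be read off as the selected features, while the consensus update keeps it close to the low-rank `B`.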

Updated: 2020-11-01