Elastic Net Regularized Softmax Regression Methods for Multi-subtype Classification in Cancer
Current Bioinformatics (IF 2.4), Pub Date: 2020-02-29, DOI: 10.2174/1574893613666181112141724
Lin Zhang¹, Yanling He¹, Haiting Song², Xuesong Wang¹, Nannan Lu¹, Lei Sun³, Hui Liu¹

Background: Various regularization methods have been proposed to improve prediction accuracy in cancer diagnosis. Elastic net regularized logistic regression has been widely adopted for cancer classification and gene selection in genetics and molecular biology, but it is commonly applied only to binary classification and regression. In practice, however, there are often more than two cancer subtypes, and these subtypes frequently cannot be determined precisely.
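For reference, one common parameterization of elastic net regularized logistic regression is shown below. It is the form used by the glmnet package; the abstract does not show the authors' own notation, so the symbols here are assumptions: labels y_i ∈ {0, 1}, a gene expression vector x_i ∈ ℝ^p for sample i, an overall penalty strength λ ≥ 0, and a mixing weight α ∈ [0, 1] between the lasso (ℓ1) and ridge (ℓ2) terms.

```latex
\min_{\beta_0,\,\beta}\; -\frac{1}{n}\sum_{i=1}^{n}\Big[\,y_i\big(\beta_0 + x_i^{\top}\beta\big) - \log\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big)\Big]
  \;+\; \lambda\Big[\,\alpha\,\lVert\beta\rVert_1 + \tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\Big]
```

Setting α = 1 gives the lasso and α = 0 gives ridge regression. The ℓ1 term drives many gene coefficients exactly to zero (gene selection), while the ℓ2 term encourages groups of highly correlated genes to be kept or dropped together (the "grouping effect").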

Objective: Besides the multi-class problem, feature selection is a critical issue in cancer subtype classification.

Methods: We propose Elastic Net Regularized Softmax Regression (ENRSR) to tackle multi-class classification. As an extension of elastic net regularized logistic regression, ENRSR enforces structured sparsity and a ‘grouping effect’ for gene selection from gene expression data, in which genes may be highly correlated. The sparsity structure and the ‘grouping effect’ help select more appropriate discriminative features for multi-class classification.
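The abstract does not give ENRSR's exact objective or solver. As a rough illustration only, the sketch below assumes the natural multinomial form: a p × K weight matrix W (one row per gene, one column per subtype), softmax class probabilities, the average negative log-likelihood, and an element-wise elastic net penalty λ(α‖W‖₁ + (1 − α)/2 ‖W‖²_F), minimized by proximal gradient descent (a gradient step on the smooth part, then soft-thresholding). The element-wise penalty is itself an assumption; the paper's "structured sparsity" may be defined differently, for example by penalizing whole gene rows of W. The names `fit_enet_softmax`, `predict`, `class_scores` and `make_toy_data` and all default hyperparameters are hypothetical; this is not the authors' implementation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_threshold(w, t):
    """Proximal operator of t * |w|: shrinks w toward zero by t, clipping at zero."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def class_scores(W, b, x):
    """Linear score b_k + x . W[:, k] for every class k."""
    return [b[k] + sum(x[j] * W[j][k] for j in range(len(x))) for k in range(len(b))]


def fit_enet_softmax(X, y, n_classes, lam=0.1, alpha=0.5, lr=0.1, n_iter=300):
    """Hypothetical sketch (not the authors' ENRSR code) of elastic-net
    penalized softmax regression fitted by proximal gradient descent.

    Minimizes (1/n) * sum_i -log p(y_i | x_i)
              + lam * (alpha * ||W||_1 + (1 - alpha) / 2 * ||W||_F^2),
    where W is a p x K weight matrix and the intercepts b are unpenalized.
    """
    n, p = len(X), len(X[0])
    W = [[0.0] * n_classes for _ in range(p)]
    b = [0.0] * n_classes
    for _ in range(n_iter):
        # Gradient of the average negative log-likelihood.
        gW = [[0.0] * n_classes for _ in range(p)]
        gb = [0.0] * n_classes
        for xi, yi in zip(X, y):
            probs = softmax(class_scores(W, b, xi))
            for k in range(n_classes):
                err = probs[k] - (1.0 if yi == k else 0.0)
                gb[k] += err / n
                for j in range(p):
                    gW[j][k] += err * xi[j] / n
        for k in range(n_classes):
            b[k] -= lr * gb[k]
            for j in range(p):
                # Gradient step on the smooth part (loss + ridge term)...
                w = W[j][k] - lr * (gW[j][k] + lam * (1 - alpha) * W[j][k])
                # ...then the L1 proximal step, which sets small weights exactly to zero.
                W[j][k] = soft_threshold(w, lr * lam * alpha)
    return W, b


def predict(W, b, x):
    """Return the class with the highest score (equivalently, highest softmax probability)."""
    scores = class_scores(W, b, x)
    return max(range(len(scores)), key=lambda k: scores[k])


def make_toy_data(seed=0, n=60):
    """Hypothetical synthetic data, not from the paper: 3 classes,
    2 informative features followed by 2 pure-noise features."""
    rng = random.Random(seed)
    centers = [(2.0, 0.0), (0.0, 2.0), (-2.0, -2.0)]
    X, y = [], []
    for i in range(n):
        k = i % 3
        cx, cy = centers[k]
        X.append([cx + rng.uniform(-0.5, 0.5), cy + rng.uniform(-0.5, 0.5),
                  rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)])
        y.append(k)
    return X, y
```

On the toy data, a small λ (e.g. `lam=0.01`) separates the three classes, while a large λ (e.g. `lam=10.0`) drives every entry of W exactly to zero. Soft-thresholding is what lets the ℓ1 part discard genes outright rather than merely shrinking them.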

Result: In terms of F-measure, ENRSR predicted cancer subtypes more accurately and robustly than six competing algorithms (K-means, Hierarchical Clustering, Expectation Maximization, Nonnegative Matrix Factorization, Support Vector Machine and Random Forest) on both simulated data and real cancer gene expression data.
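The abstract does not specify which F-measure variant was used. Because four of the baselines are clustering methods whose output labels carry no fixed meaning, one common choice (assumed here, not confirmed by the source) is the clustering F-measure: each true subtype c is matched to the predicted group k with the highest F1 score, F(c, k) = 2PR / (P + R) with P = n_ck / n_k and R = n_ck / n_c, and these best scores are averaged with weights n_c / n. A minimal sketch; the function name `clustering_f_measure` is hypothetical.

```python
from collections import Counter


def clustering_f_measure(classes, clusters):
    """Hypothetical helper: size-weighted best-match F1 between true
    subtypes and predicted groups (labels need not correspond)."""
    n = len(classes)
    class_sizes = Counter(classes)
    cluster_sizes = Counter(clusters)
    joint = Counter(zip(classes, clusters))  # n_ck: samples of class c in group k
    total = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_ck = joint[(c, k)]
            if n_ck == 0:
                continue
            precision = n_ck / n_k
            recall = n_ck / n_c
            best = max(best, 2 * precision * recall / (precision + recall))
        # Weight each subtype's best F1 by its share of the samples.
        total += n_c / n * best
    return total
```

Because groups are matched rather than compared by label, `clustering_f_measure([0, 0, 1, 1], [1, 1, 0, 0])` returns 1.0, so a clustering that merely swaps label names is not penalized.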

Conclusion: Our proposed ENRSR method is a reliable regularized softmax regression for multi-subtype classification.




Updated: 2020-02-29