Deep learning accurately predicts estrogen receptor status in breast cancer metabolomics data
Journal of Proteome Research (IF 4.4), Pub Date: 2017-11-07, DOI: 10.1021/acs.jproteome.7b00595
Fadhl M Alakwaa¹, Kumardeep Chaudhary¹, Lana X Garmire¹,²

Metabolomics holds promise as a new technology for diagnosing highly heterogeneous diseases. Conventionally, metabolomics data are analyzed for diagnosis with various statistical and machine learning classification methods. However, it remains unknown whether deep neural networks, an increasingly popular class of machine learning methods, are suitable for classifying metabolomics data. Here we use a cohort of 271 breast cancer tissues, 204 estrogen receptor-positive (ER+) and 67 estrogen receptor-negative (ER-), to test the accuracy of an autoencoder-based deep learning (DL) framework, as well as six widely used machine learning models: Random Forest (RF), Support Vector Machines (SVM), Recursive Partitioning and Regression Trees (RPART), Linear Discriminant Analysis (LDA), Prediction Analysis for Microarrays (PAM), and Generalized Boosted Models (GBM). The DL framework achieves the highest area under the curve (AUC), 0.93, in classifying ER+ versus ER- patients, compared with the other six machine learning algorithms. Furthermore, biological interpretation of the first hidden layer reveals eight commonly enriched, significant metabolomic pathways (adjusted P-value < 0.05) that the other machine learning methods do not uncover. Among them, the protein digestion and absorption pathway and the ATP-binding cassette (ABC) transporter pathway are also confirmed by an integrated analysis of metabolomics and gene expression data from the same samples. In summary, the deep learning method shows advantages for metabolomics-based classification of breast cancer ER status, offering both the highest predictive performance (AUC = 0.93) and better insight into disease biology. We encourage the metabolomics research community to adopt autoencoder-based deep learning methods for classification.
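The AUC reported in the abstract summarizes how well a classifier's scores rank ER+ samples above ER- samples. As a minimal sketch (not the authors' code; the labels and scores below are hypothetical toy values, not the study's data), the AUC can be computed as the fraction of positive/negative pairs in which the positive sample receives the higher score, with ties counted as one half:

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    labels: 1 for the positive class (here ER+), 0 for the negative class (ER-).
    scores: classifier outputs; higher means "more likely positive".
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = ER+, 0 = ER- (illustrative only, not from the study)
labels = [1, 1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.7, 0.2, 0.1]
print(round(roc_auc(labels, scores), 3))  # → 0.917 (11 of 12 pairs ranked correctly)
```

Because AUC depends only on how positives and negatives are ranked against each other, it is not inflated by the 204:67 class ratio the way raw accuracy can be. The pairwise loop costs O(n_pos × n_neg), which is manageable at this cohort size; libraries such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently.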

Updated: 2017-11-07