Model-Based and Model-Free Techniques for Amyotrophic Lateral Sclerosis Diagnostic Prediction and Patient Clustering.
Neuroinformatics (IF 3), Pub Date: 2018-11-20, DOI: 10.1007/s12021-018-9406-9
Ming Tang 1, 2 , Chao Gao 1, 2 , Stephen A Goutman 3 , Alexandr Kalinin 1, 4 , Bhramar Mukherjee 2 , Yuanfang Guan 4 , Ivo D Dinov 1, 4, 5

Abstract

Amyotrophic lateral sclerosis (ALS) is a complex progressive neurodegenerative disorder with an estimated prevalence of about 5 per 100,000 people in the United States. In this study, ALS disease progression is measured by the change in the Amyotrophic Lateral Sclerosis Functional Rating Scale (ALSFRS) score over time. The study aims to provide clinical decision support for timely forecasting of the ALS trajectory, as well as accurate and reproducible computable phenotypic clustering of participants. Patient data are extracted from the DREAM-Phil Bowen ALS Prediction Prize4Life Challenge data, most of which come from the Pooled Resource Open-Access ALS Clinical Trials Database (PRO-ACT) archive. We employed model-based and model-free machine-learning methods to predict the change in the ALSFRS score over time. Using training and testing data, we quantified and compared the performance of the different techniques. We also used unsupervised machine-learning methods to cluster the patients into separate computable phenotypes and to interpret the derived subcohorts. Direct prediction of univariate clinical outcomes using model-based methods (linear models) or model-free methods (machine-learning techniques: random forest and Bayesian adaptive regression trees, BART) was only moderately successful. The correlation coefficients between the clinically observed changes in ALSFRS scores and their predicted counterparts were 0.427 (random forest) and 0.545 (BART). The reliability of these results was assessed using internal statistical cross-validation as well as external data validation. Unsupervised clustering generated very reliable and consistent partitions of the patient cohort into four computable phenotypic subgroups. These clusters were explicated by identifying specific salient clinical features in the PRO-ACT archive that discriminate between the derived subcohorts.
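To make the two quantities in this paragraph concrete, the following is a minimal sketch of how a per-patient ALSFRS change over time and an observed-versus-predicted correlation could be computed. It is not the authors' pipeline: the abstract does not say how the change was defined, so the least-squares slope used here is an assumption, and the function names (`ls_slope`, `pearson_r`) and all visit and score values are hypothetical.

```python
from math import sqrt


def ls_slope(times, scores):
    """Least-squares slope of scores against times (e.g. ALSFRS points per month).

    Assumption: progression is summarised as a linear slope over the visits.
    """
    n = len(times)
    if n < 2 or n != len(scores):
        raise ValueError("need at least two paired observations")
    mean_t = sum(times) / n
    mean_s = sum(scores) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all visit times are identical")
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, scores))
    return sxy / sxx


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    if n < 2 or n != len(predicted):
        raise ValueError("need at least two paired values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    soo = sum((o - mean_o) ** 2 for o in observed)
    spp = sum((p - mean_p) ** 2 for p in predicted)
    if soo == 0 or spp == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    sop = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    return sop / sqrt(soo * spp)


# Hypothetical visits for one patient: months since baseline and ALSFRS totals.
months = [0, 3, 6, 9, 12]
alsfrs = [40, 38, 35, 33, 30]
slope = ls_slope(months, alsfrs)  # negative slope means functional decline

# Hypothetical observed and model-predicted slopes for four held-out patients.
observed_slopes = [-0.8, -0.2, -1.5, -0.6]
predicted_slopes = [-0.7, -0.4, -1.1, -0.9]
r = pearson_r(observed_slopes, predicted_slopes)

print(f"slope = {slope:.3f} points/month, r = {r:.3f}")
```

A correlation computed this way on held-out patients is the kind of figure the abstract reports (0.427 and 0.545); any regression model could supply the predicted values.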
There are differences between alternative analytical methods in forecasting specific clinical phenotypes. Although predicting univariate clinical outcomes may be challenging, our results suggest that modern data science strategies are useful in clustering patients and generating evidence-based ALS hypotheses about complex interactions of multivariate factors. Predicting univariate clinical outcomes using the PRO-ACT data yields only marginal accuracy (about 70%). However, unsupervised clustering of participants into sub-groups generates stable, reliable and consistent (exceeding 95%) computable phenotypes whose explication requires interpretation of multivariate sets of features.
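The abstract does not state how clustering consistency was quantified. As one illustrative possibility, and purely an assumption here, agreement between two clustering runs on the same patients can be scored with the Rand index: the fraction of patient pairs that both runs treat the same way (together in both, or apart in both). The function name `rand_index` and the label lists are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two partitions agree.

    A pair counts as agreement when both partitions put the two items in
    the same cluster, or both put them in different clusters. Cluster
    label values themselves do not matter, only the grouping.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have equal length")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical cluster assignments of eight patients into four groups.
run_1 = [0, 0, 1, 1, 2, 2, 3, 3]
run_2 = [2, 2, 0, 0, 1, 1, 3, 3]  # same grouping as run_1, labels permuted
run_3 = [0, 0, 1, 1, 2, 3, 3, 3]  # one patient moved to another group

print(rand_index(run_1, run_2))
print(rand_index(run_1, run_3))
```

Because the index compares groupings rather than label values, a run that merely renames the clusters scores as fully consistent, which is the behaviour one wants when comparing repeated unsupervised runs.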

Highlights

• Used a large ALS data archive covering 8,000 patients and comprising 3 million records, including 200 clinical features tracked over 12 months.
• Employed model-based and model-free methods to predict ALSFRS changes over time, cluster patients into cohorts, and derive computable phenotypes.
• Research findings include stable, reliable, and consistent (95%) patient stratification into computable phenotypes. However, clinical explication of the results requires interpretation of multivariate information.


Updated: 2018-11-20