Model-Based and Model-Free Techniques for Amyotrophic Lateral Sclerosis Diagnostic Prediction and Patient Clustering

  • Original Article
  • Neuroinformatics

Abstract

Amyotrophic lateral sclerosis (ALS) is a complex progressive neurodegenerative disorder with an estimated prevalence of about 5 per 100,000 people in the United States. In this study, ALS disease progression is measured by the change in the Amyotrophic Lateral Sclerosis Functional Rating Scale (ALSFRS) score over time. The study aims to provide clinical decision support for timely forecasting of the ALS trajectory as well as accurate and reproducible computable phenotypic clustering of participants. Patient data are extracted from the DREAM-Phil Bowen ALS Prediction Prize4Life Challenge data, most of which come from the Pooled Resource Open-Access ALS Clinical Trials Database (PRO-ACT) archive. We employed model-based and model-free machine-learning methods to predict the change of the ALSFRS score over time. Using training and testing data, we quantified and compared the performance of different techniques. We also used unsupervised machine-learning methods to cluster the patients into separate computable phenotypes and interpret the derived subcohorts. Direct prediction of univariate clinical outcomes using model-based methods (linear models) or model-free methods (machine-learning techniques, namely random forest and Bayesian additive regression trees, BART) was only moderately successful. The correlation coefficients between the clinically observed changes in ALSFRS scores and their model-based/model-free predicted counterparts were 0.427 (random forest) and 0.545 (BART). The reliability of these results was assessed using internal statistical cross-validation as well as external data validation. Unsupervised clustering generated very reliable and consistent partitions of the patient cohort into four computable phenotypic subgroups. These clusters were explicated by identifying specific salient clinical features included in the PRO-ACT archive that discriminate between the derived subcohorts.
There are differences between alternative analytical methods in forecasting specific clinical phenotypes. Although predicting univariate clinical outcomes may be challenging, our results suggest that modern data science strategies are useful in clustering patients and generating evidence-based ALS hypotheses about complex interactions of multivariate factors. Predicting univariate clinical outcomes using the PRO-ACT data yields only marginal accuracy (about 70%). However, unsupervised clustering of participants into sub-groups generates stable, reliable and consistent (exceeding 95%) computable phenotypes whose explication requires interpretation of multivariate sets of features.
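
The evaluation described in the abstract, correlating observed changes in ALSFRS score with predicted ones, can be illustrated with a minimal Python sketch. It assumes the change over time is summarized as a least-squares slope in points per month and that agreement is measured with the Pearson correlation coefficient; the abstract does not state either choice explicitly. The function names `alsfrs_slope` and `pearson_r` and the visit values are hypothetical and are not taken from the study.

```python
from math import sqrt


def alsfrs_slope(months, scores):
    """Least-squares slope of ALSFRS score versus time (points per month).

    Assumption: progression is summarized as a linear slope; the study may
    define the change differently.
    """
    n = len(months)
    if n < 2 or n != len(scores):
        raise ValueError("need at least two paired (month, score) visits")
    mean_t = sum(months) / n
    mean_s = sum(scores) / n
    sxx = sum((t - mean_t) ** 2 for t in months)
    if sxx == 0:
        raise ValueError("visits must span more than one time point")
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(months, scores))
    return sxy / sxx


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    n = len(observed)
    if n < 2 or n != len(predicted):
        raise ValueError("need at least two paired values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_o * var_p)


# Synthetic visits for one hypothetical patient (not PRO-ACT data).
example_slope = alsfrs_slope([0, 3, 6, 9, 12], [36, 34, 32, 30, 28])
```

In this framing, each patient contributes one observed slope and one predicted slope, and `pearson_r` over all patients in a held-out set gives a single agreement score comparable in kind to the coefficients reported above.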

Highlights

• Used a large ALS data archive covering 8,000 patients and comprising 3 million records, including 200 clinical features tracked over 12 months.

• Employed model-based and model-free methods to predict ALSFRS changes over time, cluster patients into cohorts, and derive computable phenotypes.

• Research findings include stable, reliable, and consistent (95%) patient stratification into computable phenotypes. However, clinical explication of the results requires interpretation of multivariate information.
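
One way the consistency of a patient stratification could be quantified is by comparing two partitions of the same patients, for example from repeated clustering runs. The sketch below uses the Rand index, the fraction of patient pairs on which two partitions agree. This is an illustrative stand-in: the text does not say which consistency measure produced the 95% figure, and the function name `rand_index` and the labels are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs that two partitions treat the same way.

    A pair counts as an agreement when both partitions place the two items
    in the same cluster, or both place them in different clusters. Cluster
    label values themselves do not need to match between partitions.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same items")
    if len(labels_a) < 2:
        raise ValueError("need at least two items to form a pair")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            agree += 1
        total += 1
    return agree / total


# Synthetic labels for four hypothetical patients (not study results).
identical_up_to_renaming = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
```

Because the index depends only on co-membership, two runs that find the same groups under different label numbers still score 1.0. Chance-corrected variants such as the adjusted Rand index are often preferred when cluster sizes are unbalanced.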

Acknowledgements

Colleagues from the Statistics Online Computational Resource (SOCR), Center for Complexity and Self-management of Chronic Disease (CSCD), Big Data Discovery Science (BDDS), and the Michigan Institute for Data Science (MIDAS) provided constructive feedback about this study.

Data used in the preparation of this article were obtained from the Pooled Resource Open-Access ALS Clinical Trials (PRO-ACT) Database. As such, the following organizations and individuals within the PRO-ACT Consortium contributed to the design and implementation of the PRO-ACT Database and/or provided data, but did not participate in the analysis of the data or the writing of this report: Neurological Clinical Research Institute, MGH; Northeast ALS Consortium; Novartis; Prize4Life Israel; Regeneron Pharmaceuticals, Inc.; Sanofi; Teva Pharmaceutical Industries, Ltd.

Finally, the authors are deeply indebted to the journal editors and the anonymous reviewers who provided valuable recommendations and constructive critiques that improved the manuscript.

Funding

This research was partially supported by NSF grants 1734853, 1636840, 1416953, 0716055 and 1023115, NIH grants P20 NR015331, P50 NS091856, UL1TR002240, P30 DK089503, U54 EB020406, P30 AG053760, and K23 ES027221, and the Elsie Andresen Fiske Research Fund. These funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Contributions

MT: developed techniques, conducted analyses, and wrote manuscript.

CG: developed techniques, conducted analyses, and wrote manuscript.

SAG: conceptualized the study and wrote manuscript.

AK: contributed informatics and data analytics, and wrote manuscript.

BM: contributed biostatistical methodology and wrote manuscript.

YG: conducted analyses and wrote manuscript.

IDD: conceptualized the study, developed methods, conducted analyses, and wrote manuscript.

Corresponding author

Correspondence to Ivo D. Dinov.

Ethics declarations

Ethics Approval and Consent to Participate

University of Michigan Institutional Review Board (IRB) approval (HUM00115107) was obtained prior to managing, processing and analyzing the PRO-ACT data.

Competing Interests

S.A.G.: Dr. Goutman has received research support from the NIH/NIEHS (K23ES027221), the Agency for Toxic Substances and Disease Registry/Centers for Disease Control, the ALS Association, Target ALS, Cytokinetics, and Neuralstem, Inc., and has consulted for Cytokinetics.

Electronic supplementary material

ESM 1 (DOCX 433 kb)

ESM 2 (PDF 77 kb)

About this article

Cite this article

Tang, M., Gao, C., Goutman, S.A. et al. Model-Based and Model-Free Techniques for Amyotrophic Lateral Sclerosis Diagnostic Prediction and Patient Clustering. Neuroinform 17, 407–421 (2019). https://doi.org/10.1007/s12021-018-9406-9

  • DOI: https://doi.org/10.1007/s12021-018-9406-9
