Demystifying machine learning for mortality prediction
Critical Care ( IF 15.1 ) Pub Date : 2021-12-23 , DOI: 10.1186/s13054-021-03868-z
J M Smit 1, 2 , M E van Genderen 1 , M J T Reinders 2 , D A M P J Gommers 1 , J H Krijthe 2 , J Van Bommel 1

With interest, we read the article by Banoei et al. [1] on machine learning (ML) models to predict mortality among COVID-19 patients. They refer to other studies that failed to predict mortality using ‘conventional statistical analysis’ and then present a linear ML model as a better-suited method for such complex medical problems. We feel such a claim casts ML as an alternative technique that offers solutions where statistical modeling fails. However, ML and statistical modeling are tightly interwoven. Indeed, there is no consensus on whether or how to differentiate between the two [2, 3]; for example, the approach Banoei and colleagues present (Partial Least Squares) could easily be considered a ‘statistical analysis’ as well. A recent systematic review [4] differentiated ML from statistical models based on how ‘automatically’ the models learn and found no difference in discriminative performance. This raises the question of why efforts are made to differentiate between statistical modeling and ML, as doing so provides no insight into which prediction models work for which kinds of problems. We argue that it is more important to demystify ML by emphasizing its connections to the statistical models most clinicians are already familiar with, as this may help in setting reasonable expectations for the potential clinical benefit ML could bring.

Good reporting of methodology and findings is essential to demystifying ML. Unclear or incomplete reporting may lead readers to draw wrong conclusions and miss opportunities to learn. For instance, Banoei and colleagues are unclear on how frequently measured predictors (like SpO2) were aggregated to one value. We encourage authors to follow a unified approach for reporting, e.g. the TRIPOD guidelines [5]. Moreover, it is unfortunately still common for the intended use of a prediction model to be unclear, even though it has implications for choices in model development. Likewise, Banoei and colleagues present several models (for mortality prediction and patient clustering) and suggest that these can ‘aid in clinical decision making and resource allocation’, without further specification. We strongly encourage developing models with an explicit intended use in mind, enabling fair judgement of their clinical relevance.

Altogether, we join Banoei and colleagues in their belief that ML models hold a lot of promise as valuable tools in the modern ICU. However, we advise against differentiating them from statistical models and advocate proper reporting of the methodology and intended use.
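The point about aggregating frequently measured predictors can be made concrete with a minimal sketch. The SpO2 readings, the `aggregate` helper, and the strategy names below are hypothetical and are not taken from Banoei et al.; the sketch only illustrates that the same series of measurements yields different predictor values depending on an aggregation choice that a report should state.

```python
from statistics import mean

# Hypothetical SpO2 readings (%) for one patient as (hours since admission, value).
spo2_readings = [(0, 91.0), (6, 88.0), (12, 94.0), (24, 96.0)]


def aggregate(readings, strategy):
    """Collapse repeated (hour, value) measurements into a single predictor value."""
    values = [value for _, value in sorted(readings)]  # order by time
    if strategy == "first":  # value at admission
        return values[0]
    if strategy == "min":  # lowest value observed
        return min(values)
    if strategy == "mean":  # average over all readings
        return mean(values)
    if strategy == "last":  # most recent value
        return values[-1]
    raise ValueError(f"unknown strategy: {strategy}")


# One patient, four candidate values for the same "SpO2" predictor.
summary = {s: aggregate(spo2_readings, s) for s in ("first", "min", "mean", "last")}
print(summary)
```

Each strategy also implies a different moment at which the prediction can be made: an admission value is available immediately, whereas a minimum or mean over the stay is not, which ties the aggregation choice to the intended use of the model.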

Machine learning-based methods vs. conventional statistical methods for studying mortality outcome

  • Mohammad M. Banoei,
  • Roshan Dinparastisaleh,
  • Ali Vaeli Zadeh &
  • Mehdi Mirsaeidi 
  1. Department of Critical Care Medicine, University of Calgary, Alberta, Canada

    Mohammad M. Banoei

  2. Department of Biological Science, University of Calgary, Alberta, Canada

    Mohammad M. Banoei

  3. Division of Pulmonary and Critical Care Medicine, Johns Hopkins University, Baltimore, MD, 21218, USA

    Roshan Dinparastisaleh

  4. Division of Pulmonary and Critical Care, Miami VA Medical Center, Miami, FL, USA

    Ali Vaeli Zadeh

  5. Division of Pulmonary and Critical Care and Sleep Medicine, Department of Medicine, College of Medicine, University of Florida, Jacksonville, FL, USA

    Mehdi Mirsaeidi

In response to the Letter to the Editor by J.M. Smit et al., we certainly agree that machine learning (ML)-based methods can be used interchangeably with multivariate data analysis (MVA) and multivariate prediction models (MPMs) [6]. Although ML approaches such as partial least squares (PLS), the statistically inspired modification of PLS (SIMPLS), random forest (RF), support vector machine (SVM), and artificial neural network (ANN) are considered statistical methods, they are notably different from conventional statistical methods (CSM) [7]. Given the advantages of ML methods, ML approaches have contributed significantly to the early detection, tracing, diagnosis, prognosis and clinical trials of COVID-19, supporting researchers in confronting the coronavirus pandemic [8]. Several previous studies have shown that ML can be more appropriate than CSM for clinical datasets. ML algorithms have proven better able to stratify COVID-19 patients by mortality risk [9] and to identify high-risk patients with COVID-19 [10].

It is generally believed that no single ML method is superior to all others. SIMPLS was the only prediction model used in our study. Compared with RF, SVM and ANN methods, SIMPLS has notable advantages, including a lower risk of overfitting, high interpretability, strong variable selection, and easy implementation [6]. SIMPLS can readily account for batch processing and for a high degree of correlation (multicollinearity) among the variables of large datasets. SIMPLS fits outcome responses with nominal, continuous, and polynomial data types as well as interaction and categorical effects, and provides strong visualization compared with other ML methods and CSM. In our study, SIMPLS was successfully applied to identify the most differentiating variables involved in the prediction of COVID-19 mortality. Other statistical methods that are not considered prediction models, such as principal component analysis (PCA) and latent class analysis (LCA), were then applied, based on the findings of SIMPLS, to identify patients at the highest risk of dying.
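How a PLS-type method copes with multicollinearity can be illustrated with a minimal sketch of the first PLS component for a single response. This is not the SIMPLS implementation used in the study: it extracts only one component, for which the weight vector is the normalized covariance X'y in both SIMPLS and classical PLS. The function names and the toy data (two nearly collinear predictors) are hypothetical.

```python
from math import sqrt


def center(column):
    """Subtract the mean from a list of values."""
    m = sum(column) / len(column)
    return [v - m for v in column]


def pls1_first_component(x_columns, y):
    """First PLS component for one response, computed on centered data.

    Returns the unit-length weight vector (proportional to X'y) and the
    regression coefficients on the centered predictors.
    """
    xc = [center(col) for col in x_columns]
    yc = center(y)
    # Weights: covariance of each predictor with the response, normalized.
    w = [sum(a * b for a, b in zip(col, yc)) for col in xc]
    norm = sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores: each observation projected onto the weight vector.
    scores = [sum(w[j] * xc[j][i] for j in range(len(xc))) for i in range(len(y))]
    # Regress the centered response on the single score.
    q = sum(s * t for s, t in zip(scores, yc)) / sum(s * s for s in scores)
    return w, [q * wj for wj in w]


# Hypothetical data: x2 is almost a copy of x1 (strong multicollinearity).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
y = [1.0, 2.1, 2.9, 4.2, 5.0, 6.1]

weights, coefficients = pls1_first_component([x1, x2], y)
print("weights:", weights)
print("coefficients:", coefficients)
```

Because both predictors covary with the response to a similar degree, the component assigns them similar weights of the same sign instead of trying to separate their individual effects, which is the property that makes PLS-type models comparatively stable when predictors are highly correlated.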

In our study, the importance of the ML-based model lay in its ability to predict patient mortality using variables measured at the time of admission, although these variables were also measured frequently during the patients’ hospitalization for other purposes.

Our study focused mainly on the importance of ML in clinical application and on identifying the variables that contribute most to mortality from COVID-19, rather than on describing details of ML that may be extraneous for clinicians.

Abbreviations

  • ML: Machine learning
  • ICU: Intensive care unit
  • COVID-19: Coronavirus disease 2019
  • TRIPOD: Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis

References

  1. Banoei MM, Dinparastisaleh R, Zadeh AV, Mirsaeidi M. Machine-learning-based COVID-19 mortality prediction model and identification of patients at low and high risk of dying. Crit Care (Lond Engl). 2021;25(1):328. https://doi.org/10.1186/s13054-021-03749-5.

  2. Breiman L. Statistical modeling: the two cultures. Qual Eng. 2001;48:81–2.

  3. Beam AL, Kohane IS. Big data and machine learning in health care. JAMA. 2018;319(13):1317–8. https://doi.org/10.1001/jama.2017.18391.

  4. Christodoulou E, Ma J, Collins GS, Steyerberg EW, Verbakel JY, Calster BV. Review: a systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol. 2019;110:12–22. https://doi.org/10.1016/j.jclinepi.2019.02.004.

  5. Collins GS, Reitsma JB, Altman DG, Moons KGM. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD Statement. BMC Med. 2015;13(1):1. https://doi.org/10.1186/s12916-014-0241-z.

  6. Liebal UW, et al. Machine learning applications for mass spectrometry-based metabolomics. Metabolites. 2020;10(6):243.

  7. Rajula HSR, et al. Comparison of conventional statistical methods with machine learning in medicine: diagnosis, drug development, and treatment. Medicina (Kaunas). 2020;56(9):455.

  8. Rahman MM, et al. A comprehensive study of artificial intelligence and machine learning approaches in confronting the coronavirus (COVID-19) pandemic. Int J Health Serv. 2021;51(4):446–61.

  9. Halasz G, et al. A machine learning approach for mortality prediction in COVID-19 pneumonia: development and evaluation of the Piacenza score. J Med Internet Res. 2021;23(5):e29058.

  10. Quiroz-Juárez MA, et al. Identification of high-risk COVID-19 patients using machine learning. PLoS ONE. 2021;16(9):e0257234.

Affiliations

  1. Department of Intensive Care, Erasmus University Medical Center, Rotterdam, Netherlands

    J. M. Smit, M. E. van Genderen, D. A. M. P. J. Gommers & J. Van Bommel

  2. EEMCS, Pattern Recognition and Bio-informatics Group, Delft University of Technology, Delft, Netherlands

    J. M. Smit, M. J. T. Reinders & J. H. Krijthe

Contributions

JS drafted the manuscript. MvG, JK, MR and JvB critically reviewed the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to J. M. Smit or Mehdi Mirsaeidi.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Cite this article

Smit, J.M., van Genderen, M.E., Reinders, M.J.T. et al. Demystifying machine learning for mortality prediction. Crit Care 25, 447 (2021). https://doi.org/10.1186/s13054-021-03868-z

Updated: 2021-12-24