Development and validation of parsimonious algorithms to classify acute respiratory distress syndrome phenotypes: a secondary analysis of randomised controlled trials. The Lancet Respiratory Medicine


The Lancet Respiratory Medicine (IF 38.7), Pub Date: 2020-01-13, DOI: 10.1016/s2213-2600(19)30369-8
Pratik Sinha, Kevin L Delucchi, Daniel F McAuley, Cecilia M O'Kane, Michael A Matthay, Carolyn S Calfee


Background

Using latent class analysis (LCA) in five randomised controlled trial (RCT) cohorts, two distinct phenotypes of acute respiratory distress syndrome (ARDS) have been identified: hypoinflammatory and hyperinflammatory. The phenotypes are associated with differential outcomes and treatment response. The objective of this study was to develop parsimonious models for phenotype identification that could be accurate and feasible to use in the clinical setting.

Methods

In this retrospective study, three RCT cohorts from the National Heart, Lung, and Blood Institute ARDS Network (ARMA, ALVEOLI, and FACTT) were used as the derivation dataset (n=2022), from which the machine-learning and logistic regression classifier models were derived, and a fourth cohort from the same network (SAILS; n=715) was used as the validation test set. LCA-derived phenotypes in all of these cohorts served as the reference standard. Machine-learning algorithms (random forest, bootstrap aggregating, and least absolute shrinkage and selection operator) were used to select a maximum of six important classifier variables, which were then used to develop nested logistic regression models. Only cases with complete biomarker data in the derivation dataset were used for variable selection. The best logistic regression models, judged on parsimony and predictive accuracy, were then evaluated in the validation test set. Finally, the models' prognostic validity was tested in two external ARDS clinical trial datasets (START and HARP-2) by assessing mortality at days 28, 60, and 90 and ventilator-free days to day 28.
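The nested-model step described above can be illustrated with a minimal sketch. It assumes the classifier variables have already been ranked (the random forest, bagging, and LASSO selection is not reproduced), and it runs on synthetic data with arbitrary effect sizes rather than on the trial cohorts. The function names (`fit_logistic`, `auc`, `simulate`, `nested_model_aucs`) are hypothetical and are not taken from the study.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_one(w, xi):
    # w = [intercept, w1, ..., wp]; xi = [x1, ..., xp].
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Fit logistic regression by batch gradient descent; returns [intercept, w1, ..., wp]."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = predict_one(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def auc(scores, labels):
    """AUC as P(random positive scores above random negative); ties count half."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def simulate(n, rng):
    """Synthetic cohort: a binary reference label and three noisy standardised variables."""
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.3 else 0
        # Effect sizes are arbitrary illustration values, not estimates from the study.
        X.append([rng.gauss(1.5 * label, 1.0),
                  rng.gauss(-1.0 * label, 1.0),
                  rng.gauss(-0.8 * label, 1.0)])
        y.append(label)
    return X, y


def nested_model_aucs(seed=0):
    """Fit 1-, 2-, and 3-variable nested models on a derivation set; score each on a validation set."""
    rng = random.Random(seed)
    X_dev, y_dev = simulate(400, rng)
    X_val, y_val = simulate(200, rng)
    results = {}
    for k in (1, 2, 3):
        w = fit_logistic([row[:k] for row in X_dev], y_dev)
        scores = [predict_one(w, row[:k]) for row in X_val]
        results[k] = auc(scores, y_val)
    return results


for k, value in nested_model_aucs().items():
    print(f"{k}-variable model: validation AUC = {value:.3f}")
```

Each model contains all the variables of the smaller one, so comparing validation AUC across `k` shows how much accuracy each added variable buys, which is the parsimony trade-off the Methods describe.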

Findings

The six most important classifier variables were interleukin (IL)-8, IL-6, protein C, soluble tumour necrosis factor receptor 1, bicarbonate, and vasopressor use. From the nested models, the three-variable model (IL-8, bicarbonate, and protein C) and the four-variable model (the same three variables plus vasopressor use) were adjudicated to be the best performing. In the validation test set, both models showed good accuracy against LCA classifications (area under the receiver operating characteristic curve [AUC] 0·94 [95% CI 0·92–0·95] for the three-variable model and 0·95 [95% CI 0·93–0·96] for the four-variable model). As with LCA-derived phenotypes, the hyperinflammatory phenotype as identified by the classifier model was associated with higher mortality at day 90 (87 [39%] of 223 patients vs 112 [23%] of 492 patients; p<0·0001) and fewer ventilator-free days (median 14 days [IQR 0–22] vs 22 days [0–25]; p<0·0001). In the external validation datasets, three-variable models developed in the derivation dataset identified two phenotypes with distinct clinical features and outcomes consistent with previous findings, including differential survival with simvastatin versus placebo in HARP-2 (p=0·023 for survival at 28 days).
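The reported day-90 mortality comparison can be checked by hand from the counts given above. The abstract does not say which test produced the p value, so the Pearson chi-square test without continuity correction used here is an assumption, and `chi_square_2x2` is a hypothetical helper name.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper-tail probability equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Counts reported in the Findings: deaths by day 90 in each classifier-assigned phenotype.
deaths_hyper, n_hyper = 87, 223
deaths_hypo, n_hypo = 112, 492

stat, p_value = chi_square_2x2(
    deaths_hyper, n_hyper - deaths_hyper,
    deaths_hypo, n_hypo - deaths_hypo,
)

print(f"hyperinflammatory mortality: {100 * deaths_hyper / n_hyper:.0f}%")
print(f"hypoinflammatory mortality:  {100 * deaths_hypo / n_hypo:.0f}%")
print(f"chi-square = {stat:.2f}, p = {p_value:.2g}")
```

Under this assumed test the statistic is roughly 20 on 1 degree of freedom, which agrees with the reported p<0·0001, and the percentages round to the reported 39% and 23%.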

Interpretation

ARDS phenotypes can be accurately identified with parsimonious classifier models using three or four variables. Pending the development of real-time testing for key biomarkers and prospective validation, these models could facilitate identification of ARDS phenotypes to enable their application in clinical trials and practice.

Funding

National Institutes of Health.
