Performance of a Convolutional Neural Network and Explainability Technique for 12-Lead Electrocardiogram Interpretation
JAMA Cardiology (IF 14.8), Pub Date: 2021-11-01, DOI: 10.1001/jamacardio.2021.2746
J Weston Hughes 1, Jeffrey E Olgin 2,3, Robert Avram 2,3, Sean A Abreau 2,3, Taylor Sittler 4, Kaahan Radia 1, Henry Hsia 2, Tomos Walters 2, Byron Lee 2, Joseph E Gonzalez 1, Geoffrey H Tison 1,2,3,5

Importance Millions of clinicians rely daily on automated preliminary electrocardiogram (ECG) interpretation. Critical comparisons of machine learning–based automated analysis against clinically accepted standards of care are lacking.

Objective To use readily available 12-lead ECG data to train a convolutional neural network (CNN) that achieves high performance against clinical standards of care, and to apply an explainability technique to that CNN.

Design, Setting, and Participants This cross-sectional study was conducted using data from January 1, 2003, to December 31, 2018. Data were obtained in a commonly available 12-lead ECG format from a single-center tertiary care institution. All patients aged 18 years or older who received ECGs at the University of California, San Francisco, were included, yielding a total of 365 009 patients. Data were analyzed from January 1, 2019, to March 2, 2021.

Exposures A CNN was trained to predict the presence of 38 diagnostic classes in 5 categories from 12-lead ECG data. A CNN explainability technique called LIME (Local Interpretable Model-Agnostic Explanations) was used to visualize ECG segments contributing to CNN diagnoses.
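
The abstract does not describe how LIME was configured, so the following is only a minimal, LIME-style sketch under stated assumptions: a single-channel toy signal (not a 12-lead ECG), fixed-length segments that are "switched off" by zeroing, an exponential proximity kernel, and a weighted ridge regression whose coefficients serve as per-segment importances. The function names, the toy model, and every parameter value are hypothetical and are not taken from the study.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def lime_segment_weights(signal, model, n_segments, n_samples=300,
                         kernel_width=0.75, ridge=1e-3, seed=0):
    """Return one importance coefficient per segment (assumed setup, see lead-in)."""
    rng = random.Random(seed)
    seg_len = len(signal) // n_segments
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        # Binary mask: 1 keeps a segment, 0 replaces it with zeros.
        mask = [rng.randint(0, 1) for _ in range(n_segments)]
        perturbed = [
            v if mask[min(i // seg_len, n_segments - 1)] else 0.0
            for i, v in enumerate(signal)
        ]
        # Proximity to the unperturbed signal: fraction of segments removed.
        distance = 1.0 - sum(mask) / n_segments
        weights.append(math.exp(-(distance ** 2) / kernel_width ** 2))
        rows.append(mask + [1])  # trailing 1 is the intercept column
        targets.append(model(perturbed))
    d = n_segments + 1
    # Normal equations of weighted ridge regression: (X'WX + ridge*I) beta = X'Wy.
    a = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows))
          + (ridge if i == j else 0.0) for j in range(d)] for i in range(d)]
    b = [sum(w * r[i] * t for w, r, t in zip(weights, rows, targets))
         for i in range(d)]
    return solve_linear_system(a, b)[:n_segments]


# Toy stand-ins: a flat signal with one spike, and a "model" that only
# responds to the window containing that spike (samples 40-59 = segment 2).
toy_signal = [0.1] * 120
toy_signal[50] = 1.0


def toy_model(x):
    return max(abs(v) for v in x[40:60])


segment_weights = lime_segment_weights(toy_signal, toy_model, n_segments=6)
top_segment = max(range(6), key=lambda k: abs(segment_weights[k]))
print(top_segment, [round(w, 3) for w in segment_weights])
```

In this toy case the surrogate assigns nearly all importance to the segment containing the spike, which is the kind of "highlighted segment" output the abstract refers to. A real application would replace `toy_model` with the trained CNN's class probability and would need a perturbation scheme suited to multi-lead ECG data.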

Main Outcomes and Measures Area under the receiver operating characteristic curve (AUC), sensitivity, and specificity were calculated for the CNN in the holdout test data set against cardiologist clinical diagnoses. For a second validation, 3 electrophysiologists provided consensus committee diagnoses against which the CNN, cardiologist clinical diagnosis, and MUSE (GE Healthcare) automated analysis performance was compared using the F1 score; AUC, sensitivity, and specificity were also calculated for the CNN against the consensus committee.
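
For readers less familiar with these measures, the sketch below computes sensitivity, specificity, the F1 score, and AUC for a single binary diagnostic class using their standard definitions. The labels, scores, and the 0.5 decision threshold are made-up toy values, not study data, and the helper names are hypothetical; the AUC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Fraction of true positive cases that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """Fraction of true negative cases that were correctly ruled out."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and sensitivity: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(y_true, scores):
    """Area under the ROC curve via pairwise positive-vs-negative comparison."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): reference labels and model scores for one class.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(sensitivity(tp, fn))   # 0.75
print(specificity(tn, fp))   # 0.75
print(f1_score(tp, fp, fn))  # 0.75
print(auc(y_true, scores))   # 0.9375
```

Sensitivity, specificity, and F1 depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why the two kinds of measure are reported side by side.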

Results A total of 992 748 ECGs from 365 009 adult patients (mean [SD] age, 56.2 [17.6] years; 183 600 women [50.3%]; and 175 277 White patients [48.0%]) were included in the analysis. In 91 440 test data set ECGs, the CNN demonstrated an AUC of at least 0.960 for 32 of 38 classes (84.2%). Against the consensus committee diagnoses, the CNN had higher frequency-weighted mean F1 scores than both cardiologists and MUSE in all 5 categories (CNN frequency-weighted F1 score for rhythm, 0.812; conduction, 0.729; chamber diagnosis, 0.598; infarct, 0.674; and other diagnosis, 0.875). For 32 of 38 classes (84.2%), the CNN had AUCs of at least 0.910 and demonstrated comparable F1 scores and higher sensitivity than cardiologists, except for atrial fibrillation (CNN F1 score, 0.847 vs cardiologist F1 score, 0.881), junctional rhythm (0.526 vs 0.727), premature ventricular complex (0.786 vs 0.800), and Wolff-Parkinson-White (0.800 vs 0.842). Compared with MUSE, the CNN had higher F1 scores for all classes except supraventricular tachycardia (CNN F1 score, 0.696 vs MUSE F1 score, 0.714). The LIME technique highlighted physiologically relevant ECG segments.
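
The abstract does not define "frequency-weighted mean F1 score". One common reading, assumed here, is a mean of per-class F1 scores in which each class is weighted by how often it occurs in the reference labels. The class names, F1 values, and counts below are hypothetical toy values chosen only to show the arithmetic; they are not the study's data.

```python
def frequency_weighted_mean_f1(f1_by_class, support_by_class):
    """Average per-class F1, weighting each class by its reference frequency."""
    total = sum(support_by_class[c] for c in f1_by_class)
    if total == 0:
        raise ValueError("at least one class must have nonzero support")
    return sum(f1_by_class[c] * support_by_class[c] for c in f1_by_class) / total


# Hypothetical per-class F1 scores and reference-label counts for one category.
toy_f1 = {"class_a": 0.9, "class_b": 0.5, "class_c": 0.7}
toy_support = {"class_a": 60, "class_b": 10, "class_c": 30}

weighted_mean = frequency_weighted_mean_f1(toy_f1, toy_support)
unweighted_mean = sum(toy_f1.values()) / len(toy_f1)
print(round(weighted_mean, 3), round(unweighted_mean, 3))
```

Under this assumed definition, common diagnoses dominate the category-level score, so a weak F1 on a rare class lowers the weighted mean much less than it lowers the unweighted mean.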

Conclusions and Relevance The results of this cross-sectional study suggest that readily available ECG data can be used to train a CNN algorithm to achieve comparable performance to clinical cardiologists and exceed the performance of MUSE automated analysis for most diagnoses, with some exceptions. The LIME explainability technique applied to CNNs highlights physiologically relevant ECG segments that contribute to the CNN’s diagnoses.




Updated: 2021-11-08