Artificial Intelligence and Machine Learning in Cardiology
Circulation ( IF 37.8 ) Pub Date : 2024-04-15 , DOI: 10.1161/circulationaha.123.065469
Rahul C. Deo 1
Seven years ago, I wrote a review in this journal on the state of machine learning in medicine.1 My tenet then was that although numerous medical applications could benefit from machine learning, and the requisites for such models—data and algorithms—were widely present, few examples had made their way into practice. I struggled to find illustrative examples from cardiovascular research, let alone commercial products. Since that time, much has changed. Artificial intelligence (AI), a broader term that includes machine learning as a subdiscipline, dominates research publications and the news. And with innovation comes uncertainty, such as concerns over which professions (physicians included) will be made redundant by these innovations. It is thus timely to revisit this topic, offering some perspective on the dizzying pace of innovation.


Multilayered neural networks, the algorithm of choice for most current AI applications, have been around for decades. However, it took the widespread availability of fast processors (graphics processing units), publicly available digital data, and an open community of programmers and computer scientists to realize their current potential, which, to date, includes self-driving cars, machine-written screenplays, and music compositions in any genre or on any topic. Although humbling to accept, currently implemented medical applications are often far simpler uses of the same algorithms. Training such models has now become largely a matter of having adequate data and computational resources. Clinical inputs—whether text, images/videos, or time-dependent signals—all have some representation in nonmedical fields, so most pioneering work has typically occurred outside medicine. But because of this foundation, training medical models has become a commodity, and one can even contribute labeled training data to a commercial service, which will train and host a model for you. So the fact that neural networks can achieve X or Y for some medical applications should no longer come as a surprise. The challenge is producing a model that (1) has value and (2) will be used.


In its loftiest form, AI evokes a system entirely free from human intervention and oversight that can continuously sense, interpret, and act (Figure [A]). Perhaps even more critical, although often forgotten, such a loop can be coupled with a continuous learning process to improve the path from sensory inputs to optimal action with each iteration. The premise here is that the best action in most cases may not be known from the outset but can be learned through systematized exploration. Such a learning process can be "online," where actions improve over time in the context of a single repeated interaction (such as a physician developing more effective treatment strategies with a particular patient over time), or "offline," where the review of systematically collected stimulus-response data helps shape an increasingly effective and precise collection of solutions. Nonetheless, this structure of rapid iterative learning across hundreds of thousands or millions of cycles (termed reinforcement learning) has been the foundation of many of the most startling advances in AI in the past decade.
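The explore-and-learn loop described above can be made concrete with the simplest reinforcement learning setting, a multi-armed bandit, in which the value of each candidate action is unknown at the outset and is learned through repeated, systematized exploration. The reward probabilities below are purely illustrative and do not correspond to any clinical scenario; this is a minimal sketch, not a production algorithm:

```python
import random

def epsilon_greedy_bandit(reward_probs, n_rounds=10000, epsilon=0.1, seed=0):
    """Minimal online learning loop: act, observe reward, update estimates.

    Each 'arm' stands in for a candidate action whose value is unknown at
    the outset and must be learned through systematized exploration.
    """
    rng = random.Random(seed)
    n = len(reward_probs)
    counts = [0] * n    # times each action was tried
    values = [0.0] * n  # running estimate of each action's value
    for _ in range(n_rounds):
        # Explore with probability epsilon; otherwise exploit the current best.
        if rng.random() < epsilon:
            action = rng.randrange(n)
        else:
            action = max(range(n), key=lambda i: values[i])
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Incremental mean: nudge the estimate toward the observed reward.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts

values, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

Over many rounds, the estimates converge toward the true action values and the best action comes to dominate; this is the same learn-by-iteration structure that, at vastly larger scale, underlies the advances referenced above.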


Figure. Models of learning systems. A, An ideal learning system combining a sensing, interpreting, and acting loop with ongoing learning. This is the classic reinforcement learning paradigm. B, A closed-loop system combining sensing, interpreting, and acting. C, Decision support systems where machine sensing and interpretation are presented as a suggestion to a provider, who integrates other information before acting. The typical models used here are trained by supervised learning and frozen for use. CGM indicates continuous glucose monitoring.


Such machine-driven closed-loop systems are rarely found in medicine. In diabetes management, combining glucose sensors with insulin pumps recreates the inner loop (Figure [B]). But no learning takes place in real time, and more individualized perturbation-response patterns are not factored into developing solutions. Machine learning in medicine invariably occurs “offline”—removed from actual care—and solutions tend to be locked so that they can be submitted for regulatory approval. There are obvious safety reasons for this approach. Still, we should not be surprised if the rate of acquiring new insights continues to lag heavily behind other fields.


Nonetheless, there is no shortage of problems in medicine that can be solved with more immediately available solutions (and under the assumption that we already know the optimal action). In the spirit of catalyzing change, I have attempted to categorize these in terms of the type of problem solved and how value is added, rather than using traditional categories of machine learning, such as reinforcement, supervised, and unsupervised learning. (The examples below are typically trained via supervised learning, in which training data with known correct answers are provided.) Turning to the most common uses of AI in medicine, one well-defined category is training models that help humans interpret diagnostic data.2 Revisiting our loop, the machine senses and interprets diagnostic data, but "acting" is limited to surfacing suggestions to the clinician, who is free to verify and act on them or ignore them altogether (Figure [C]). There are hundreds of Food and Drug Administration–approved models of this type, including many in the cardiovascular space. Humans remain the ultimate arbiter in the typical intended use, and in most applications they are expected to be able to perform the same task with the same inputs, but the machine may be quicker to an initial answer (as seen in triage applications such as acute stroke interpretation), more consistent, or simply a second (less weary) pair of eyes. Introducing a human gate reduces some safety risks but also limits the upside because the skilled person involved in every decision remains a bottleneck and a driver of costs. And because the basis for the observed action is often impenetrable, learning retrospectively from such a system can be challenging no matter how many data accumulate.
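The supervised "suggestion" pattern of Figure [C] can be sketched in miniature: a model is fitted to labeled examples, frozen, and then surfaces a probability that a clinician remains free to act on or ignore. Everything here is a toy, with synthetic one-dimensional data and a hand-rolled logistic fit, not any approved product:

```python
import math
import random

def train_logistic(data, lr=0.1, epochs=100):
    """Fit a one-feature logistic model by stochastic gradient ascent.

    Supervised learning: every training example (x, y) comes with the
    correct answer y. The returned weights are the 'frozen' model.
    """
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            w += lr * (y - p) * x
            b += lr * (y - p)
    return w, b

def suggest(model, x, threshold=0.5):
    """'Acting' stops at a suggestion: the clinician sees the probability
    and remains free to act on it or ignore it."""
    w, b = model
    p = 1.0 / (1.0 + math.exp(-(w * x + b)))
    return {"probability": p, "flag": p >= threshold}

# Synthetic labeled data: positives cluster at higher feature values.
rng = random.Random(0)
data = [(rng.gauss(1.0, 0.5), 1) for _ in range(200)] + \
       [(rng.gauss(-1.0, 0.5), 0) for _ in range(200)]
model = train_logistic(data)  # frozen after training, as for regulatory submission
```

The point of the sketch is the division of labor: the machine's output is a flagged probability, and the human gate between that output and any action is what the trial designs discussed later must account for.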


A second category of AI applications, although far less common, involves enabling data acquisition by individuals not typically trained to do so. Although tomographic imaging tends to be standardized, ultrasound data acquisition is notoriously user-dependent. A compelling example in cardiology is AI-guided assistance in echocardiogram video acquisition,3 for which several commercial applications are available.


A third class of models estimates disease risk from low-cost, readily acquired signals, including from consumer wearable devices. If well calibrated, such models could improve population health by identifying at-risk individuals earlier than would otherwise be possible. Examples include detecting atrial fibrillation or elevated glycated hemoglobin from smartwatches. Of course, the usual concerns about low pretest probability remain and may be further exacerbated by nonuniform data collection, a significant problem in an unsupervised home setting where data are collected opportunistically. Thus, published models have been abundant, but few have found traction in regular use.
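The pretest-probability concern follows directly from Bayes' rule: when a condition is rare, even an accurate screening model generates mostly false alarms. The sensitivity, specificity, and prevalence below are illustrative round numbers, not figures from any published wearable study:

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | positive test): true positives over all positives."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# A model with 95% sensitivity and 95% specificity, screening a population
# in which only 1% truly have the condition:
ppv = positive_predictive_value(0.95, 0.95, 0.01)  # ≈ 0.16
```

Roughly 5 of every 6 positive alerts would be false, which is why calibration and a sensible screening population matter as much as raw model accuracy.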


A fourth area of focus has been taking the repetitive and perhaps tedious aspects of medical care and moving them to machines. In cardiology, ECG interpretation is one example: machines have reported automated measurements for decades, and rule-based systems provide an initial interpretation of many common arrhythmias. With the ease of training neural network models, that approach has expanded to other common measurements, such as those from echocardiography, magnetic resonance imaging, and computed tomography. Likewise, the advent of large language models has prompted a search for text-based conversations in medicine that could be moved to a machine. Examples include patient dialogue by email or short message service, which a machine can triage and answer with an initial draft response that a licensed provider then verifies.


There has been no shortage of models trained to perform prediction or classification tasks, and as stated earlier, the fact that such models are feasible should no longer be surprising. Whether such models will be accepted and helpful in a clinical workflow remains uncertain. It has thus been encouraging to see a small but increasing number of trials in which machine learning models are deployed in clinical practice, often against a comparison arm where no such model is available. Early examples include prediction of myocardial infarction, atrial fibrillation,4 and reduced ejection fraction, as well as ejection fraction estimation from echocardiogram videos. Although it is too soon to draw any general conclusions, one interesting aspect of these trials is that, because they reflect the "suggestion" model (Figure [C]), the generalizability of their results to other settings depends on the patients and potentially also on the providers. As with any innovation, an economic case must also be made for each model to justify the commercial path for a product.


The above examples all assume that humans can work alongside machines with fundamentally different logical processes, which may not be readily explainable. Other than in simple cases, such as locating a spiculated nodule on a mammogram or chest x-ray film, it is unrealistic to expect that neural network models with millions or billions of parameters, capturing multilevel interactions across time and space, will distill their output into a simple representation for review. People sometimes point to the saliency maps featured in most publications as a counterexample, but as has been noted,5 such images are more "academic storytelling" than something we can expect to help in individual patient care. Accepting that explainability is implausible, we will need to decide whether we can incorporate the output of a black box into our decision-making, especially in high-risk situations. Moreover, the inscrutability of the basis of model outputs carries the risk of poorer performance in groups differing from the training set, which, given where such models have historically been trained, may represent underserved communities. Regulatory agencies are concerned about all these issues, and I expect an ever-greater burden to be placed on model manufacturers to mitigate problems of risk and bias.


The most exciting development of the past few years is the transition from academic research for its own sake to clinical trials, investment, and products (not always in that order). Adoption will follow pain in the current system, which is invariably driven by economics. Always impatient, I hope we go one step further and find ways to move from the "suggestion" model to a learning system that not only mimics us but can surpass us, addressing critical knowledge gaps and identifying and helping implement optimal context-dependent solutions.


None.


Disclosures R.C.D. is a cofounder of Atman Health, a company focused on software-driven acceleration of guideline-directed medical therapy in cardiovascular, kidney, and metabolic disease.


The American Heart Association celebrates its 100th anniversary in 2024. This article is part of a series across the entire AHA Journal portfolio written by international thought leaders on the past, present, and future of cardiovascular and cerebrovascular research and care. To explore the full Centennial Collection, visit https://www.ahajournals.org/centennial


The opinions expressed in this article are not necessarily those of the editors or of the American Heart Association.


For Sources of Funding and Disclosures, see page 1237.


Circulation is available at www.ahajournals.org/journal/circ



