American Heart Association Precision Medicine Platform Addresses Challenges in Data Sharing
Circulation: Cardiovascular Quality and Outcomes ( IF 6.2 ) Pub Date : 2021-09-14 , DOI: 10.1161/circoutcomes.121.007949
Laura M. Stevens 1, 2 , James A. de Lemos 3 , Sandeep R. Das 3, 4 , Christine Rutan 2 , Heather M. Alger 2 , Mitchell S.V. Elkind 5 , Juan Zhao 6 , Kritika Iyer 7 , C. Alberto Figueroa 7, 8 , Jennifer L. Hall 2, 9

The cloud-based American Heart Association (AHA) Precision Medicine Platform (PMP; https://precision.heart.org/)1 was designed to address major challenges faced by researchers. The first challenge was sharing data. We have tested several data sharing options with researchers over the past 2 years. When the coronavirus disease 2019 (COVID-19) pandemic hit, we were prepared to launch a process that researchers supported. In short, we opened up our own COVID-19 registry data, powered by Get With The Guidelines (GWTG). Increasing access to data for all researchers, rather than keeping it walled off for a select few, had the potential to improve the quality, reproducibility, and validity of scientific findings during a time when the scientific process suffered a major setback. Through our initial tests, we also learned that researchers are not willing to invest time and effort in a complicated access and reuse process. Thus, we eliminated several steps and provided researchers with a ready-to-run, cloud-based virtual workspace that included (1) the necessary data documentation and data files, (2) statistical and visualization software, as well as machine learning and deep learning analysis tools, and (3) the computational power necessary to perform the analyses.1–7


The PMP is unique among cloud-based academic platforms in that researchers may access data, along with comprehensive data documentation, from multiple sources including real-world patient data, longitudinal epidemiological studies, electronic health record data, and more. The long-standing reputation of the AHA and the trust it has earned in the community allow our organization to serve as a neutral broker for housing many rich data sources. Through years of testing with researchers, we have learned that a critical factor in sharing data is allowing data owners to provision access. We do this through a process called Data Use Operating System. The open-source code (to which we made slight modifications) can be found on GitHub. In this process, researchers requesting access answer a few short questions, and their responses are emailed directly to the Data Access Committee assigned by the owner of the data. This committee votes to approve the request, ask for revision, or deny access to the data. The data requester is informed by email of the final decision and, if approved, the data and documentation are deposited into their workspace. For some data sets, like the COVID-19 registry data, the Data Access Committee requests and reviews a manuscript proposal with a statistical analysis plan. A key lesson from this process, both for the AHA and for data owners who feared that their data might be published with errors or that their hard work would simply be taken by others, was that researchers requesting access in fact wanted to collaborate with and learn from the data owners. In many cases, the quality, validation, and replication of the research have improved.
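The request flow described above can be pictured as a small state machine. The sketch below is illustrative only and is not the Data Use Operating System code: the class and function names are hypothetical, and the tallying rule (strict majority wins, anything else is sent back for revision) is an assumption, since the article does not say how committee votes are counted.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class Vote(Enum):
    APPROVE = "approve"
    REVISE = "revise"
    REJECT = "reject"


@dataclass
class AccessRequest:
    requester_email: str
    dataset: str
    answers: dict                      # responses to the short questionnaire
    votes: list = field(default_factory=list)

    def record_vote(self, vote: Vote) -> None:
        self.votes.append(vote)

    def decision(self) -> Vote:
        """Assumed rule: a strict majority wins; otherwise ask for revision."""
        if not self.votes:
            raise ValueError("no votes recorded")
        top, count = Counter(self.votes).most_common(1)[0]
        return top if count * 2 > len(self.votes) else Vote.REVISE


def finalize(request: AccessRequest, workspace: set) -> str:
    """Apply the committee decision and return the notification text."""
    outcome = request.decision()
    if outcome is Vote.APPROVE:
        # On approval, the data and its documentation land in the workspace.
        workspace.update({f"{request.dataset}/data",
                          f"{request.dataset}/documentation"})
    return (f"To {request.requester_email}: "
            f"request for {request.dataset} -> {outcome.value}")
```

Keeping the decision rule in one method means a data owner's committee could swap in a different policy (for example, unanimity) without touching the notification or provisioning step.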


Finally, the cloud-based AHA PMP gives researchers at any university, with or without large data sets or resources, equal opportunities. Along these same lines, the AHA collaborates with cloud providers to allow academic researchers to use the secure workspaces at a reduced cost.


In our experience, the 3 major challenges that researchers face with data sharing are (1) the data governance process, including data use agreements, (2) access to critical standardized information accompanying the data sets, including data dictionaries and case report forms, and (3) lack of flexibility in many cloud-based environments to scale resources to meet performance needs for analyses of shared data, including images and electronic health record data.


In 2020, the global COVID-19 pandemic provided a valuable opportunity to overcome the challenges of data governance and access to standardized information including accompanying data dictionaries and case report forms. Clinicians had a need for generalizable real-world data to inform their understanding of COVID-19. Since no preexisting data sources or workflows were available, the AHA launched the COVID-19 CVD registry powered by GWTG and opted to make these data available on the AHA PMP.1 This voluntary registry (described previously)1 was designed to fill the unique gap in understanding cardiovascular risk and outcomes in patients with COVID-19 and is open to all hospitals and health systems in the United States treating adult patients with acute COVID-19 infections.


In the past, access to AHA GWTG data sets involved coordination across a variety of stakeholders. Although this process resulted in >600 publications over the last 17 years, the COVID-19 pandemic necessitated more rapid collaboration, evaluation, and publication of findings to expedite the pace of science.


The Figure highlights the key implementation phases of this initiative. The initial phase was open to all 104 hospitals that were actively enrolling records for the COVID-19 CVD registry, as well as the steering committee members. During this initial phase, we opened PMP workspaces for researchers with approved manuscript proposals to complete their analyses. A data use agreement was required before end users received a secure workspace equipped with aggregate data containing records from all registry sites, data documentation, and analysis tools. The data use agreement was simplified to enhance efficiency. In short, we (1) implemented a no-redline policy for end users, (2) added project data fields allowing more specific data files to be applied to individual workspaces (based on modules and year), (3) required a signature from only the investigator instead of the institution, and (4) removed unnecessary language involving data use and disclosures that are now under the control of the AHA and permissible since the data are accessed on a secure cloud-based workspace. All changes are compliant with the Health Insurance Portability and Accountability Act of 1996 and improved the turnaround time and overall completion of data use agreements. We are moving toward a data use agreement that is hosted on the PMP, accessible online, and able to be executed upon receipt.


Figure. Design of open data initiative on the American Heart Association (AHA) Precision Medicine Platform (PMP). Researchers submitted manuscript proposals, which were reviewed by a Research and Publications Committee. Once approved, researchers analyzed the data in a secure workspace on the PMP and published the manuscript. COVID-19 indicates coronavirus disease 2019; GWTG, Get With The Guidelines; and NDA, non-disclosure agreement.


Several approaches were used to support new investigators on the PMP. A member of the COVID-19 research and publications committee was assigned as a liaison to each project, to help with analysis planning and data questions. Weekly office hours were held virtually to listen to the needs and suggestions from researchers. Together, these approaches improved communication, addressed many questions, and significantly accelerated the process. As of April 20, 2021, 40 proposals have been accepted for investigator-led analyses, 15 analyses have been drafted for submission, and 9 manuscripts have been published.1,3,6,8-13


We were committed to the FAIR (Findable, Accessible, Interoperable, and Reusable) guiding principles, which include uniform definitions, data dictionaries, and data documentation. The interactive data documentation for our COVID-19 GWTG data (https://precision.heart.org/documentation/AHA-COVID19-CVD-GWTG/index.html) improved the understanding of data definitions and derived variables, thereby reducing inconsistencies across manuscripts and frustrations in dealing with open data. In particular, the explore and discover section of the interactive data documentation in the manuscript illustrates how researchers are able to access all data documentation files, missingness of data, data distribution, and more. Unlike a collection of flat PDF files, the documentation does not require users to toggle between files without a clear understanding of what the variables mean. We worked in coordination with the AHA COVID-19 steering committee, made up of clinical, statistical, and epidemiological experts, to arrive at data standards that were based on previous GWTG data standards, as well as data from European registries. To further address this challenge, we piloted usage examples and tutorials, written by our AHA data science team in multiple languages, that allowed users to reuse shared code in their workspace to produce final products such as a demographic profile of the data. Thus, all researchers using the same data set and the same verified and approved code in their workspace would end up with the same demographic profile for their manuscripts. This improved consistency across manuscripts. Finally, members of the COVID Research and Publications Committee and AHA leadership reviewed manuscripts before submission to provide quality oversight and ensure conformity with the original proposal.
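To make the idea of shared, verified code concrete, here is a minimal sketch of a deterministic demographic profile with missingness summaries. It is not the AHA data science team's code: the records are toy values invented for illustration, and the field names (`age`, `sex`) are hypothetical rather than taken from the registry's data dictionary.

```python
from collections import Counter
from statistics import median

# Toy records invented for illustration; field names are hypothetical and
# do not come from the registry's data dictionary. None marks a missing value.
RECORDS = [
    {"age": 64, "sex": "F"},
    {"age": 71, "sex": "M"},
    {"age": None, "sex": "M"},
    {"age": 58, "sex": None},
]


def missingness(records, column):
    """Fraction of records in which `column` is missing."""
    return sum(r[column] is None for r in records) / len(records)


def demographic_profile(records):
    """Deterministic summary: the same records always give the same profile."""
    ages = [r["age"] for r in records if r["age"] is not None]
    sexes = Counter(r["sex"] for r in records if r["sex"] is not None)
    return {
        "n": len(records),
        "age_median": median(ages),
        "age_missing": missingness(records, "age"),
        "sex_counts": dict(sorted(sexes.items())),
        "sex_missing": missingness(records, "sex"),
    }


profile = demographic_profile(RECORDS)
```

Because every analyst calls the same function on the same data file, choices such as how missing values are excluded from the median are made once, which is the consistency benefit the paragraph describes.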


Solutions for data governance and data documentation on the PMP were also tested by other nonprofit groups, including the American Society of Clinical Oncology, which licenses the underlying technology of the PMP to deliver CancerLinQ Registry data to academic users for analysis. The AHA also works with the Society of Critical Care Medicine to map variables between our registries with the end goal of increasing our understanding of COVID-19 and its impact on patient lives. Both opportunities provided additional learnings and solutions from end users with respect to data use agreements and data documentation.


To overcome challenges in scaling resources for performance needs, we worked closely with researchers training neural networks for medical image segmentation and running large-scale simulations. In the cardiovascular field, diagnostic decisions can be improved using algorithms to segment coronary vessels in angiograms. The scalability, larger memory, and computing power of the PMP gave researchers access to 4 NVIDIA Tesla K80 graphics processing units to train a custom pipeline, AngioNet, a neural network for coronary segmentation.4 By doing so, the research team was able to increase the number of images used to train each iteration of the network, improving accuracy and generalizability compared with training on a single graphics processing unit. Another computationally intensive application deployed to the PMP is CRIMSON,2 an open-source hemodynamic modeling software that has been used in a wide range of applications, from cardiovascular disease research to surgical planning.2 The finite element-based flowsolver of this software has been compiled on the PMP in a Docker container, allowing researchers to perform large-scale hemodynamic simulations on patient-specific anatomic models using the high-performance computing resources of the PMP.
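One common way that several graphics processing units raise the number of images per training iteration is synchronous data parallelism: each device computes a gradient on its own shard of the batch, and the gradients are averaged. The article does not state which scheme was used for AngioNet, so the sketch below is an assumption. It uses a one-parameter linear model and toy numbers in place of a segmentation network and angiograms, and simply shows that, with equal shard sizes, the averaged per-device gradient equals the gradient of the full batch.

```python
import math


def grad_mse(w, batch):
    """Gradient of mean squared error for the model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def data_parallel_grad(w, batch, n_devices):
    """Split the batch evenly, compute one gradient per device, then average."""
    if len(batch) % n_devices:
        raise ValueError("batch size must be divisible by n_devices")
    per_device = len(batch) // n_devices
    shards = [batch[i * per_device:(i + 1) * per_device]
              for i in range(n_devices)]
    return sum(grad_mse(w, shard) for shard in shards) / n_devices


# Toy data: 8 (x, y) pairs standing in for 8 images, spread over 4 devices.
batch = [(float(x), 3.0 * x) for x in range(1, 9)]
full = grad_mse(0.5, batch)
parallel = data_parallel_grad(0.5, batch, n_devices=4)
assert math.isclose(full, parallel)
```

Under this scheme, each device only needs memory for its own shard, so four devices can process a batch four times larger than one device could hold.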


Working with researchers performing unsupervised machine learning across electronic health record data also provided innovative solutions to improve agility in workspaces for end users.7 Zhao et al7 used constrained nonnegative tensor factorization on the PMP to extract phenotypic topics across time scales in a study cohort derived from a deidentified copy of the electronic health record for patients at the Vanderbilt University Medical Center. This study identified previously known risk factors associated with cardiovascular disease, as well as new potential factors including vitamin D deficiency, depression, and urinary infection.7


AHA’s PMP has enabled secure delivery of data through agile workspaces that scale with the high-performance compute needs of researchers and allow flexibility in the ready-to-run analysis tools. By listening to and partnering with end users, we have overcome many of the hurdles facing researchers today, including outdated data governance policies, insufficient data documentation, and the inability of cloud-based environments to scale up resources for performance and to let researchers personalize their workspaces with their own tools and pipelines.


We would like to thank all of the members of the American Heart Association (AHA) COVID-19 Steering Committee for volunteering their time and expertise to this initiative. The Get With The Guidelines programs are provided by the AHA. The Precision Medicine Platform was established by the AHA, is powered by Amazon Web Services, and is supported by Hitachi Vantara.


This project was supported by the American Heart Association (AHA). AHA’s COVID-19 CVD Registry is partially supported by generous funds from the Gordon and Betty Moore Foundation.


Disclosures: L.M. Stevens, Dr Alger, Dr Hall, and C. Rutan are employees of the American Heart Association. Dr Hall is an adjunct professor at the University of Minnesota. Drs de Lemos and Elkind are unpaid officers of the American Heart Association. Dr de Lemos discloses receiving grant support from Abbott Diagnostics and Roche Diagnostics and consulting income from Abbott Diagnostics, Ortho Clinical Diagnostics, Quidel Cardiovascular, Amgen, Regeneron, Eli Lilly and Novo Nordisk, and Janssen. Dr Elkind discloses receiving study drug in-kind from the BMS-Pfizer Alliance for Eliquis and ancillary research funding from Roche for an NIH-funded trial of stroke prevention; receiving royalties from UpToDate for chapters related to stroke; and receiving funding from NINDS, NHLBI, and the Leducq Foundation. Dr Figueroa discloses he is a cofounder of AngioInsight, Inc. The other authors report no conflicts.


The opinions expressed in this article are not necessarily those of the editors or of the American Heart Association.


This manuscript was sent to Dennis T. Ko, MD, Senior Guest Editor, for review by expert referees, editorial decision, and final disposition.


For Sources of Funding and Disclosures, see page 943.




Updated: 2021-09-22