Of Papers, PDFs, and Platforms
Circulation: Cardiovascular Quality and Outcomes (IF 6.2) Pub Date: 2021-09-14, DOI: 10.1161/circoutcomes.121.008466
Brahmajee K Nallamothu

Nullius in verba


—Motto of the Royal Society of London


In 1665, the first issue of the Philosophical Transactions was published by the Royal Society of London—a seminal moment in science.1 Up until that point, existing methods for sharing research were limited to books, personal letters, and in-person gatherings. A regularly published journal that was primarily focused on sharing scholarly advances through research articles would make scientific communication more rapid, durable, and expansive. The first articles in that inaugural issue of the Philosophical Transactions covered topics ranging from optic glasses to Jupiter’s Great Red Spot to a case report of a “very odd monstrous” calf.2–4


Much has changed in scientific communication over the past 350 years. The breadth and number of research articles published annually are now staggering (estimated at >2.5 million) and grow each year.5 Meanwhile, digitization and modern information technology tools have improved access to research articles and accelerated the pace at which articles can be shared across investigators. Yet despite such fundamental changes in how we create and circulate research, at a conceptual level, scientific communication has remained remarkably stable. Research articles continue to be largely static documents that describe a distinct set of experimental observations using a highly structured format. Journals then compile these articles for widespread distribution. Although webpages and PDFs have replaced paper, these core features have stayed unchanged for decades.


But this is about to change. In this issue of Circulation: Cardiovascular Quality & Outcomes (CQO), we publish a unique article by Jaeger et al6 that represents a small step toward what I believe is a different and exciting future.


On its surface, the Jaeger et al6 article is like other research we publish in CQO. It focuses on how different strategies for imputing missing data affect risk prediction models in patients receiving mechanical circulatory support. This is an important research question to tackle, and the authors’ findings will be of general interest to many readers of CQO. Yet like much of contemporary science, the article’s results are not easily understood. Despite the authors’ clear presentation of tables and figures, it can be difficult to communicate complex analyses within a static document. In fact, many articles we publish at CQO, particularly those in our Novel Statistical Methods series, would be easier to follow if readers could directly engage with the authors’ statistical code and data. To address this, the authors include a link to a GitHub repository as well as instructions on how to access their raw data through the Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC). We have endorsed statistical code and data sharing practices at CQO for years,7 and such approaches also are encouraged in the Transparency and Openness Promotion Guidelines8 followed by other American Heart Association journals.


But there is another, less conventional and novel tool deployed in this article: a collaborative computational platform that includes self-contained data and statistical code that are readily accessible to readers in real time. Working closely with the authors and the American Heart Association Precision Medicine Platform, we established an online cloud computing workspace9 that serves as an interactive environment where readers can directly and fully replicate analytical workflows. This environment also is linked to a unique digital object identifier that can be referenced separately from the article itself and thus goes a step beyond simply sharing links to online repositories of data and statistical code. Instructions for accessing it are available here: https://www.ahajournals.org/doi/suppl/10.1161/CIRCOUTCOMES.120.007071.


Building collaborative computational platforms around research articles like this has several benefits. Most critically, easy and immediate access to data and statistical code will allow readers to better grasp complex methods like those employed by Jaeger et al.6 We are certainly not the first to notice how this could change scientific communication. Bret Victor’s redesign of the classic Duncan Watts and Steven Strogatz article on small-world networks is a wonderful example of how active engagement with data can simplify even the most advanced concepts.10 In this manner, research articles transform from static documents into dynamic tools (or as Victor beautifully calls it, “Sequential Art”). I am unaware of others applying this approach in clinical research articles, but examples like this are spreading within other scientific fields. The writer and computer programmer James Somers has brilliantly discussed the extensive history and recent progress of these advances in an article I highly recommend.11


Linking research articles to such platforms has other possible advantages as well. For instance, there will no longer be a need to write to authors to obtain statistical code or data, a process that is inefficient and too often ineffective.12 As a corollary, it will be unnecessary to download specific operating systems or software packages to repeat analyses given that such platforms can be self-contained environments.13 This would bring the very real and practical benefit of ensuring computationally reproducible results and would allow for timelier detection and correction of minor and major errors. If the use of collaborative computational platforms expands, they also could become a helpful deterrent for scientific fraud simply through their requirements for greater upfront transparency.


Will this system be perfect? No. And initially, it probably will not make sense for every research article. In setting up this system for the Jaeger et al6 article, we have already learned important lessons about the costs and barriers associated with operating such platforms. For example, readers will find that the default data provided in the American Heart Association Precision Medicine Platform for the Jaeger et al6 article are a synthetically created duplicate of the raw data. Although the raw data are publicly available, data use agreements with BioLINCC are still required before access is possible. Logistical barriers such as this will remain for the foreseeable future given extensive privacy and confidentiality concerns around protected health information. Major technical challenges will include how to ensure the durability and maintenance of these platforms over time, especially through the software upgrades that will inevitably occur. Virtualization solutions like Docker containers may be able to address some of these problems (eg, software dependencies) but will require additional investments of time and resources.13 Finally, this platform is a test environment, and as a consequence, we have created restrictions that still require registration and approval of users so that it is not overwhelmed. Admittedly, many of these barriers will make the experience clunky.


Challenges aside, one could soon imagine a future journal assembled exclusively as a collaborative computational platform. It would be composed entirely of research articles where full access to data and statistical code is possible at the time of publication. This approach also would dramatically alter the life cycle of a research article, touching everything from prepublication peer review to editorial evaluations to post-publication commentary. Rather than representing the final product, papers and PDFs of these articles would serve as summaries and secondary artifacts of the platform. Yet reaching this vision will require not only overcoming technical and logistical challenges but also addressing the daunting cultural barriers that still limit statistical code and data sharing practices in clinical research.14 Establishing a norm where such openness is expected rather than exceptional will not be easy. Large projects like the National Institutes of Health All of Us program that are building collaborative computational platforms to share data right from their initiation may help to introduce standards.15


The motto of the Royal Society is “nullius in verba”—translated loosely as “take no one’s word for it.” Publishing research articles in the Philosophical Transactions was a truly groundbreaking event in the history of science. It established the principle that no matter how authoritative, opinion alone would no longer suffice for reporting a discovery. Written proof and facts in the public domain would be needed. Technology now allows us to push that transparency even further through computational platforms. Rather than taking the authors’ word for it, we are now edging even closer toward seeing for ourselves when it comes to data and analysis.


I would like to thank Drs Jim Burke, Mike Ho, and Armando Teixeira-Pinto for their wonderful comments on earlier versions of this article. I would also like to thank Dr Byron Jaeger and his team as well as Dr Jen Hall and the American Heart Association (AHA) Precision Medicine Platform for creating and supporting a collaborative computational platform for this research article.


Disclosures: Disclosures provided by Dr Nallamothu in compliance with the American Heart Association’s annual Journal Editor Disclosure Questionnaire are available at https://www.ahajournals.org/pb-assets/policies/COI_09_2020-1600719273583.pdf.


The opinions expressed in this article are not necessarily those of the American Heart Association.






Updated: 2021-09-22