Toward Long-Term and Archivable Reproducibility
Computing in Science & Engineering (IF 1.8), Pub Date: 2021-04-13, DOI: 10.1109/mcse.2021.3072860
Mohammad Akhlaghi 1, Raul Infante-Sainz 2, Boudewijn F. Roukema 3, Mohammadreza Khellat 4, David Valls-Gabaud 5, Roberto Baena-Galle 6

Analysis pipelines commonly use high-level technologies that are popular when created, but are unlikely to be readable, executable, or sustainable in the long term. A set of criteria is introduced to address this problem: completeness (no execution requirement beyond a minimal Unix-like operating system, no administrator privileges, no network connection, and storage primarily in plain text); modular design; minimal complexity; scalability; verifiable inputs and outputs; version control; linking analysis with narrative; and free and open-source software. As a proof of concept, we introduce “Maneage” (managing data lineage), which enables cheap archiving, provenance extraction, and peer verification, and which has been tested in several research publications. We show that longevity is a realistic requirement that does not sacrifice immediate or short-term reproducibility. The caveats (with proposed solutions) are then discussed, and we conclude with the benefits for the various stakeholders. This article is itself a Maneage'd project (project commit 313db0b).

Appendices: Two comprehensive appendices that review the longevity of existing solutions are provided as supplementary “Web extras” in the IEEE Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/MCSE.2021.3072860.

Reproducibility: All products are available in zenodo.4913277. The Git history of this paper's source is at git.maneage.org/paper-concept.git, which is also archived in Software Heritage: swh:1:dir:33fea87068c1612daf011f161b97787b9a0df39fk. Clicking on the SWHIDs in the digital format will provide more “context” for the same content.

Updated: 2021-04-13