Crediting pull requests to open source research software as an academic contribution☆
Motivation
Academic research in general, and the field developing algorithms for high performance computing in particular, suffers under the publish or perish paradigm. One consequence is the exponentially increasing number of journal publications, workshop contributions, and conference proceedings [1]. Even though the development of High Performance Computing (HPC) algorithms is a relatively small research field, it is virtually impossible to keep track of all the work contributed by peers.
The
Existing efforts
Several strong efforts already exist to improve the academic peer review system for scientific software. Among the best-known examples are the Replicated Computational Results (RCR [15]) initiative of the ACM Transactions on Mathematical Software (ACM TOMS [16]), and the Journal of Open Source Software (JOSS [17]). The two have orthogonal intentions: one puts its main focus on the scientific contribution, the other on the software contribution.
ACM TOMS is a journal in the traditional sense. The RCR
The collaborative development effort of open source community software
Community software packages are typically developed in the environment of a distributed version control system such as Git [18] or Mercurial [19]. These systems not only take snapshots of source code that can be revisited or retrieved at a later point, but also provide the means to track changes and orchestrate modifications introduced by several developers, thereby enabling the efficient collaborative development of software. The underlying concept to
Software pull requests as a conference contribution
We propose to emphasize the significance of software contributions by making them a contribution format for conferences on HPC algorithms. A software contribution submitted as a conference contribution must of course satisfy not just technical but also scientific requirements: a detailed algorithm description and feature specification, as well as functionality testing and efficiency analysis. The idea is that researchers directly submit a software pull request of a legitimate
Implementing a workflow for accepting pull requests as a conference publication
In Fig. 2 we outline the peer review workflow we envision for accepting pull requests to community software as a conference publication. To make the submission of a pull request as a conference contribution as convenient as possible, we propose to provide software contribution templates. These templates tell the creator of a software contribution what needs to be provided for a successful submission, and give guidelines on how best to do so. Table 1 summarizes relevant
Example of a well-designed software contribution
We use an example to illustrate how to design a software pull request that qualifies as a conference contribution. Instead of discussing an artificial contribution, we recall an existing pull request to the Ginkgo open source library, which is publicly hosted on GitHub. We emphasize that we do not select the pull request #1592 because of its technical and scientific content (qualifying
Scope and limitations
We recognize that the contribution format proposed in this work is not suitable for all types of conference contributions. One example would be a purely theoretical exposition of a new algorithm or method that does not yet have a high performance implementation, and whose practical implementation or performance is not part of the contributions. Other examples are papers that do not aim at contributing an algorithm or software component, this paper being one of them. Thus, we do not
Summary
As in any other academic field, scientists in High Performance Computing suffer under the publish or perish paradigm. As a result, novel algorithm designs and high performance kernel implementations often remain prototype implementations and are never adopted as production-ready community code. To counteract this inefficiency, we propose to establish a new form of conference contribution that is based on software pull requests to open source community software.
The idea is to complement a
Conflict of interest
The authors declare no conflict of interest.
Declaration of Competing Interest
The authors report no declarations of interest.
Acknowledgments
The authors want to express their appreciation for the comments of the anonymous reviewers of the PDSEC’19 workshop. Acknowledging that this paper is provocative and that we surely failed to consider all aspects of this controversial topic, we are highly thankful for the valuable feedback and input. We also thank Fabian Brunk for comments and discussion on an earlier version of the paper.
References (23)
- UNESCO Science Report: Towards 2030, UNESCO Reference Works Series (2015)
- An index to quantify an individual's scientific research output, Proc. Natl. Acad. Sci. USA (2005)
- et al., Altmetrics: A Manifesto (2011)
- The TOMS Initiative and Policies for Replicated Computational Results (RCR) (2017)
- Supercomputing Conference Reproducibility Initiative (2018)
- xSDK: Extreme-scale Scientific Software Development Kit https://xsdk.info/ (accessed in August...
- Better Scientific Software (BSSw) https://bssw.io/ (accessed in August...
- MFEM: Modular finite element methods library, mfem.org....
- et al., deal.II – a general purpose object oriented finite element library, ACM Trans. Math. Softw. (2007)
- T. Trilinos Project Team, The Trilinos Project...
- PETSc Web Page
Hartwig Anzt is a Helmholtz-Young-Investigator Group leader at the Steinbuch Centre for Computing at the Karlsruhe Institute of Technology. He obtained his PhD in Mathematics at the Karlsruhe Institute of Technology, and afterwards joined Jack Dongarra's Innovative Computing Lab at the University of Tennessee in 2013. Since 2015 he has also held a Senior Research Scientist position at the University of Tennessee. Hartwig Anzt has a strong background in numerical mathematics and specializes in iterative methods and preconditioning techniques for next-generation hardware architectures. He has a long track record of high-quality software development. He is the author of the MAGMA-sparse open source software package and the managing lead and a developer of the Ginkgo numerical linear algebra library.
Eileen Kuehn received her PhD in computer science in 2017. She currently works at the Karlsruhe Institute of Technology in the domain of quantum computing. Her career includes work as a research associate and project coordinator in several EU projects. For many years, she has been working in close collaboration with diverse domains including High Performance Computing, High Energy Physics, Climatology, and Museology. Her research activities focus on scalable data analytics for highly parallel, distributed systems and on sustainable software.
Goran Flegar received his PhD from the University of Jaume I with a focus on High Performance Computing. His research interests include sparse linear algebra, accelerator computing, and software design. He also holds a bachelor's degree in mathematics and a master's degree in computer science and mathematics from the University of Zagreb. He is one of the founders of the Ginkgo software package, a modern C++ library primarily focused on the iterative solution of sparse linear systems via preconditioned Krylov subspace methods on high performance GPU and multicore architectures.
☆ This work is an extension of the position paper “Are we Doing the Right Thing? – A Critical Analysis of the Academic HPC Community” presented in the context of the PDSEC workshop at IPDPS 2019. This work was supported by the “Impuls und Vernetzungsfond” of the Helmholtz Association under grant VH-NG-1241.