Solving the cold-start problem in scientific credit allocation

https://doi.org/10.1016/j.joi.2021.101157

Highlights

  • We introduce an algorithm that can allocate credit to authors in newly published papers.

  • We validate the method by identifying the laureates of Nobel-winning papers.

  • Our method has a significantly higher accuracy and robustness than existing algorithms for papers with few citations.

  • We test the agreed-on rule relating authorship order to contribution and find that the relation differs between past and recent publications in physics.

Abstract

A nearly universal trend in science today is the prominence of ever-increasing collaborative teams. Hence, identifying the relative credit due to each collaborator of published studies is of high significance. Although numerous methods have been employed to address this issue, allocating credit to all co-authors of new papers remains challenging. To address this cold-start issue, we introduce a credit allocation algorithm based on the co-citing network that captures the co-authors’ shared credit of a multi-authored publication. Using the American Physical Society publication data, we validate the method by examining papers by Nobel laureates. Accordingly, we perform many experiments to demonstrate that the proposed method can be implemented on academic papers in any period after publication with a significantly higher degree of accuracy and robustness than the existing algorithms applied to new papers. This method enables us to explore the universal credit evolution pattern of scientific elites. Importantly, by testing the relation between an author's credit and authorship byline, we observe that the first authors of papers are currently assigned less credit than in the early days with respect to physics. With collaboration and a large team set to dominate the agenda of the current science system, our study provides a more effective method for allocating early credit to co-authors of a paper, which may be beneficial to various academic activities, including faculty hiring, funding, and promotion decisions.

Introduction

The increasing ascendancy of collaboration, along with the disappearance of solo scientific discoveries, is one of the most common trends observed in all domains of modern science and technology (Guimerà, Uzzi, Spiro, & Lus, 2005; Wu, Wang, & Evans, 2019). The synergy of collaboration is an essential component of complex scientific projects that require multidisciplinary solutions (Falk-Krzesinski, 2011; Milojević, 2014). Collaboration allows for the integration of knowledge and the mandate of research, both of which require comprehensiveness and diversity. Solitary works generally lead to lower impact publications relative to collaborative science. Moreover, high-quality published papers are the products of scientific activity and strongly affect scientists’ academic reputations and standing (Carpenter, Cone, & Sarli, 2014; Li, Fortunato, Yin, & Wang, 2020; Wuchty, Jones, & Uzzi, 2007). Because the academic community was built largely by researchers working independently in earlier eras, the current community continues to reward self-sufficient researchers based on individual scholastic achievements. In this sense, the sole author of a single-authored paper is credited with its entire contribution, which was the commonly accepted norm in science decades ago. However, that rule fails for co-authored publications, and the situation may become even worse when co-authors from varied domains apply different contribution assignment criteria in multidisciplinary projects (Lehmann, Jackson, & Lautrup, 2006). Meanwhile, switching more frequently between topics has recently become an increasing trend (Zeng et al., 2019). Nevertheless, it is expected that talented, ethical, well-prepared individuals be rewarded for their hard-earned accomplishments. This expectation is beneficial to the long-term development of the system of science (Pavlidis, Petersen, & Semendeferi, 2014). As such, identifying the relative credit of each collaborator in co-authored, domain-specific work is of much significance and is fundamental to the academic appointment and promotion process of institutions (Juhász, Tóth, & Lengyel, 2020; Shen & Barabási, 2014).

Considerable research attention has been given to assigning credit fairly for multi-authored publications, and the scientific community has recently raised concern about subjective evaluations of an author's contribution combined with assessments of their co-authors’ contributions (Herz, Dan, Censor, & Bar-Haim, 2020). On the one hand, scientific journals have developed guidelines that recognize the contributions of each author to promote more reasonable credit allocation (Herz et al., 2020; Mohammad Tariqur Rahman, J.M.R.B., & A., N.H., 2017; Radicchi, Fortunato, Markines, & Vespignani, 2009). On the other hand, quantitative algorithms for discriminating scientific and intellectual contributions between individuals or scientific institutions have been developed, ranging from the simple to the more elaborate. The simplest algorithms assign each author an equal contribution, as in full counting or fractional counting (Zeng et al., 2017). The full counting algorithm regards every author as a single author, and thus every author is awarded full credit, whereas the fractional counting algorithm sets every author's credit to the reciprocal of the total number of authors. However, since authors’ contributions to papers differ, the full counting algorithm inflates some authors’ contributions, while the fractional counting algorithm dilutes the principal contributors’ involvement in the papers (Waltman & van Eck, 2015). Thus, methods based primarily on authorship order have been proposed, such as the geometric method (Egghe, Rousseau, & Van Hooydonk, 2000), the arithmetic method (Trueba & Guerrero, 2004), the harmonic method (Hagen, 2008), and a network-based method (Kim & Diesner, 2014). However, these types of algorithms cannot be used in all research fields, as the rules of authorship bylines vary substantially. For example, in mathematics, authorship is alphabetical, whereas in biology, the first and the last authors contribute the most to the article. Another way to allocate author credit is by declaring the contributions of each author in the article, thereby clarifying all authors’ roles in the research (Foulkes & Neylon, 1996; Mohammad Tariqur Rahman et al., 2017). Currently, the collective process perspective for allocating author credit has become popular (Bao & Wang, 2020; Radicchi et al., 2009; Shen & Barabási, 2014). The main hypothesis of this method is that the citing process of the paper and other papers written by the same authors regarding the same research topic encodes the informal credit allocation, indicating that the main contributors to the paper are experienced in the research topic. The improved algorithms further consider the aging effect and the importance of citing sources during the collective process (Bao & Zhai, 2017; Wang, Fan, Zeng, & Di, 2019).
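To make the byline-based schemes described above concrete, the following is a minimal sketch of full counting, fractional counting, and harmonic counting for a paper with a given number of authors. The function names are hypothetical, and the harmonic form (credit proportional to the reciprocal of byline rank, normalised to one) is the commonly cited formulation and is assumed here rather than taken from this article. The geometric, arithmetic, and network-based variants are omitted.

```python
from fractions import Fraction


def full_counting(n_authors):
    """Treat every author as a sole author: each receives full credit."""
    return [Fraction(1)] * n_authors


def fractional_counting(n_authors):
    """Give every author the reciprocal of the number of authors."""
    return [Fraction(1, n_authors)] * n_authors


def harmonic_counting(n_authors):
    """Assumed harmonic form: weight 1/rank, normalised to sum to one."""
    weights = [Fraction(1, rank) for rank in range(1, n_authors + 1)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    for scheme in (full_counting, fractional_counting, harmonic_counting):
        shares = scheme(3)
        print(scheme.__name__, [str(s) for s in shares], "sum =", sum(shares))
```

For a three-author paper, full counting distributes a total of three credits, which illustrates the inflation noted above, whereas the other two schemes sum to one. The harmonic scheme depends entirely on byline position, so it is not meaningful where authorship is alphabetical.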

Typical state-of-the-art quantitative algorithms for allocating shared credit to the authors of a paper have been designed recently and are, in one form or another, ultimately built on the direct citations of the target paper. Nevertheless, these algorithms neglect that papers accumulate unequal numbers of citations and that a relatively high proportion of all papers have only a few citations, which makes identification less effective because the co-cited networks are extremely sparse. This problem is more prominent in newly published papers, as they have had insufficient time to accumulate citations. Although many previous studies have addressed contribution allocation through collective methods based on the scientific community, the allocation of intellectual contribution to the individual authors of a paper during its early period has not been emphasized or systematically studied in the literature, which makes this a typical cold-start problem. Hence, we consider it significant to develop a more comprehensive and universal algorithm that appropriately characterizes the scientific credit of each author of a co-authored paper, so that credit can be allocated appropriately both in the early period after a paper's publication and in its later period.
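The dependence on direct citations can be illustrated with a simplified sketch of a collective, co-citation-based allocation in the spirit of the approaches cited above. This is not the method proposed in this paper, whose details are not given in this excerpt. The function name, the dictionary inputs, and the toy data are all hypothetical. The sketch weights each co-cited paper by how many citing papers cite it together with the target and splits that weight equally among the co-cited paper's authors.

```python
from collections import Counter


def collective_credit(target, authors_of, references_of):
    """Return normalised credit shares for the authors of `target`.

    authors_of:    dict mapping paper id -> list of author names
    references_of: dict mapping citing paper id -> list of cited paper ids
    Returns None when the target has no citations (the cold-start case).
    """
    citing = [d for d, refs in references_of.items() if target in refs]

    # Co-citation strength: number of citing papers that cite both the
    # target and paper p (the target itself gets strength len(citing)).
    strength = Counter()
    for d in citing:
        for p in set(references_of[d]):
            strength[p] += 1

    credit = {a: 0.0 for a in authors_of[target]}
    for p, s in strength.items():
        team = authors_of.get(p, [])
        for a in team:
            if a in credit:
                credit[a] += s / len(team)  # equal split within paper p

    total = sum(credit.values())
    if total == 0:
        return None  # nothing to build the co-cited network from
    return {a: c / total for a, c in credit.items()}


if __name__ == "__main__":
    authors_of = {"p0": ["A", "B"], "p1": ["A"], "p2": ["B", "C"]}
    references_of = {"d1": ["p0", "p1"], "d2": ["p0", "p1"], "d3": ["p0", "p2"]}
    print(collective_credit("p0", authors_of, references_of))
    print(collective_credit("p0", authors_of, {}))  # newly published paper
```

In the toy data, author A receives the larger share because A's other paper is co-cited with the target more often. With an empty citation record the function returns None, which is the cold-start situation described above: a scheme built only on direct citations has no input for a newly published paper.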

This paper is organized into four distinct sections. The first section is the introduction. This section is followed by a brief description of the dataset used in the article and statistical analyses of the dataset to demonstrate the various limitations of the existing quantitative algorithms of credit allocation in the second section. Next, we propose a new method based on referenced studies. In the third section, we select papers by Nobel laureates to validate the proposed algorithm's effectiveness and then apply the algorithm to ordinary papers in the early period following their publication to test the robustness of the proposed algorithm. This analysis is followed by an illustration of the credit share evolution of co-authors and an exploration of the universal credit share evolution pattern of scientific elites. Finally, we discuss the relation between credit share and position in the authorship bylines in the field of physics. Section 4 presents a discussion of the results and outlines the paper's conclusions.

Section snippets

Data

The database used in this study is obtained from the American Physical Society (APS) journals for the period 1893 to 2009 and includes the journals of the Physical Review series and Reviews of Modern Physics. To avoid the problem of author name ambiguity, we use the author name dataset obtained from Sinatra et al., which has been processed using a comprehensive disambiguation method in the APS dataset (Sinatra, Wang, Deville, Song, & Barabási, 2016). The dataset is comprised of 458,584 papers

Validation

To quantitatively validate the effectiveness of COCD, we first test it by examining Nobel Prize-winning papers, for which the Nobel committee has decided to whom the Nobel Prize is awarded (Turki, Hadj Taieb, & Aouicha, 2020). A widely accepted consensus is that the Nobel winner is the author who contributes most to the Nobel Prize-winning paper. Hence, he/she should be allocated a greater credit share than other collaborators. As the Nobel committee decides to whom the Nobel Prize

Conclusions and discussion

In many research situations, such as the promotion and funding of research, researchers are usually evaluated based on their independent contributions to the academic community to which they belong. However, with today's rapid development of collaborative and multidisciplinary science, how to allocate the relative credit share of researchers is an increasing and challenging problem, as scientific works tend to involve a remarkable collection of researchers from various groups of different

Authors’ contribution

Yanmeng Xing: Software, Validation, Writing - original draft, Writing - review & editing, Formal analysis.

Fenghua Wang: Software, Writing - original draft, Writing - review & editing.

An Zeng: Conceptualization, Methodology, Writing - review & editing, Formal analysis, Data curation.

Ying Fan: Conceptualization, Methodology, Supervision.

Acknowledgement

This work is supported by the National Natural Science Foundation of China (Nos. 71843005 and 71731002).

References (32)

  • P. Bao et al.

    Dynamic credit allocation in scientific literature

    Scientometrics

    (2017)
  • P. Bao et al.

    Metapath-guided credit allocation for identifying representative works

    International world wide web conference committee

    (2020)
  • C.R. Carpenter et al.

    Using publication metrics to highlight academic productivity and research impact

    Academic Emergency Medicine

    (2014)
  • L. Egghe et al.

    Methods for accrediting publications to authors or countries: Consequences for evaluation studies

    Journal of the Association for Information Science and Technology

    (2000)
  • H.J. Falk-Krzesinski

    Mapping a research agenda for the science of team science

    Research Evaluation

    (2011)
  • W. Foulkes et al.

    Redefining authorship. Relative contribution should be given after each author’s name

    British Medical Journal

    (1996)
  • R. Guimerà et al.

    Team assembly mechanisms determine collaboration network structure and team performance

    United States: American Association for the Advancement of Science

    (2005)
  • N.T. Hagen

    Harmonic allocation of authorship credit: Source-level correction of bibliometric bias assures accurate publication and citation analysis

    PLoS One

    (2008)
  • N. Herz et al.

    Opinion: Authors overestimate their contribution to scientific work, demonstrating a strong bias

    Proceedings of the National Academy of Sciences

    (2020)
  • I. Pavlidis et al.

    Together we stand

    Nature Physics

    (2014)
  • S. Juhász et al.

    Brokering the core and the periphery: Creative success and collaboration networks in the film industry

    PLoS One

    (2020)
  • S. Jung et al.

    Citation-based author contribution measure for byline-independency

    (2019)
  • J. Kim et al.

    A network-based approach to coauthorship credit allocation

    Scientometrics

    (2014)
  • J. Kim et al.

    Distortive effects of initial-based name disambiguation on measurements of large-scale coauthorship networks

    Journal of the Association for Information Science and Technology

    (2016)
  • S. Lehmann et al.

    Measures for measures

    Nature

    (2006)
  • J. Li et al.

    A dataset of publication records for Nobel laureates

    Scientific Data

    (2019)