Solving the cold-start problem in scientific credit allocation

doi:10.1016/j.joi.2021.101157

Journal of Informetrics

Volume 15, Issue 3, August 2021, 101157

https://doi.org/10.1016/j.joi.2021.101157

Highlights

• We introduce an algorithm that can allocate credit to authors in newly published papers.
• We validate the method by identifying the laureates of Nobel-winning papers.
• Our method has a significantly higher accuracy and robustness than existing algorithms for papers with few citations.
• We test the commonly accepted rule linking authorship order to contribution and find that the relation differs between past and recent publications in physics.

Abstract

A nearly universal trend in science today is the prominence of ever-larger collaborative teams. Hence, identifying the relative credit due to each collaborator on a published study is of high significance. Although numerous methods have been employed to address this issue, allocating credit to all co-authors of new papers remains challenging. To address this cold-start issue, we introduce a credit allocation algorithm based on the co-citing network that captures the co-authors’ shared credit for a multi-authored publication. Using the American Physical Society publication data, we validate the method by examining papers by Nobel laureates. We then perform extensive experiments to demonstrate that the proposed method can be applied to academic papers at any time after publication, with significantly higher accuracy and robustness than the existing algorithms when applied to new papers. This method enables us to explore the universal credit evolution pattern of scientific elites. Importantly, by testing the relation between an author's credit and position in the authorship byline, we observe that first authors in physics are currently assigned less credit than in earlier periods. With collaboration and large teams set to dominate the current science system, our study provides a more effective method for allocating early credit to co-authors of a paper, which may benefit various academic activities, including faculty hiring, funding, and promotion decisions.

Introduction

The growing dominance of collaboration, together with the decline of solo scientific discoveries, is one of the most common trends across all domains of modern science and technology (Guimerà, Uzzi, Spiro, & Lus, 2005; Wu, Wang, & Evans, 2019). The synergy of collaboration is an essential component of complex scientific projects that require multidisciplinary solutions (Falk-Krzesinski, 2011; Milojević, 2014). Collaboration allows for the integration of knowledge and meets the demands of research that requires comprehensiveness and diversity. Solitary work generally leads to lower-impact publications than collaborative science. Moreover, high-quality published papers are the products of scientific activity and strongly affect scientists’ academic reputations and standing (Carpenter, Cone, & Sarli, 2014; Li, Fortunato, Yin, & Wang, 2020; Wuchty, Jones, & Uzzi, 2007). Because the academic community was historically built by researchers working independently, it continues to reward self-sufficient researchers based on individual scholarly achievements. Under this norm, which was commonly accepted in science decades ago, the sole author of a single-author paper is credited with all of its contributions. However, that rule fails for co-authored publications, and the situation may become even worse when co-authors from varied domains apply different contribution assignment criteria in multidisciplinary projects (Lehmann, Jackson, & Lautrup, 2006). Meanwhile, researchers increasingly switch between topics (Zeng et al., 2019). Nevertheless, it is expected that talented, ethical, well-prepared individuals be rewarded for their hard-earned accomplishments. This expectation is beneficial to the long-term development of the system of science (Pavlidis, Petersen, & Semendeferi, 2014). As such, identifying the relative credit due to each collaborator on a co-authored, domain-specific work is highly significant and is fundamental to the academic appointment and promotion processes of institutions (Juhász, Tóth, & Lengyel, 2020; Shen & Barabási, 2014).

Considerable research attention has been given to assigning credit fairly for multi-authored publications, and the scientific community has recently called for greater attention to subjective evaluations of an author's contribution combined with assessments of their co-authors’ contributions (Herz, Dan, Censor, & Bar-Haim, 2020). On the one hand, scientific journals have developed guidelines that recognize the contributions of each author to promote more reasonable credit allocation (Herz et al., 2020; Mohammad Tariqur Rahman, J.M.R.B., & A., N.H., 2017; Radicchi, Fortunato, Markines, & Vespignani, 2009). On the other hand, quantitative algorithms for discriminating scientific and intellectual contributions between individuals or scientific institutions have been developed, ranging from the simple to the more elaborate. The simplest algorithms assign every author equal recognition, as in full counting or fractional counting (Zeng et al., 2017). The full counting algorithm regards every author as a single author and thus awards every author full credit, whereas the fractional counting algorithm calculates every author's credit as the reciprocal of the total number of authors. However, since authors’ contributions to papers differ, the full counting algorithm inflates some authors’ contributions, while the fractional counting algorithm dilutes the principal contributors’ involvement in the papers (Waltman & van Eck, 2015). Thus, methods based primarily on the authorship byline have been proposed, such as the geometric method (Egghe, Rousseau, & Van Hooydonk, 2000), the arithmetic method (Trueba & Guerrero, 2004), the harmonic method (Hagen, 2008), and a network-based method (Kim & Diesner, 2014). However, these types of algorithms cannot be used in all research fields, as the rules of authorship bylines vary substantially. For example, in mathematics, authors are listed alphabetically, whereas in biology, the first and the last authors contribute the most to the article. Another way to allocate author credit is by declaring the contributions of each author in the article, thereby clarifying all authors’ roles in the research (Foulkes & Neylon, 1996; Mohammad Tariqur Rahman et al., 2017). Currently, methods that allocate author credit from a collective-process perspective have become popular (Bao & Wang, 2020; Radicchi et al., 2009; Shen & Barabási, 2014). The main hypothesis of these methods is that the citing process of the paper and of other papers written by the same authors on the same research topic encodes the informal credit allocation, indicating that the main contributors to the paper are experienced in the research topic. The improved algorithms further consider the aging effect and the importance of citing sources during the collective process (Bao & Zhai, 2017; Wang, Fan, Zeng, & Di, 2019).
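To make the byline-based counting schemes named above concrete, the following is a minimal sketch in Python. It uses the formulas as they are commonly stated in the bibliometrics literature (fractional: 1/n; harmonic: weight 1/i; arithmetic: weight n+1-i; geometric: weight 2^(n-i), each normalized to sum to one). These forms are an assumption on our part and may differ in detail from the cited papers; the function names are hypothetical.

```python
from fractions import Fraction


def fractional(n):
    """Every one of the n authors receives the same share 1/n."""
    return [Fraction(1, n) for _ in range(n)]


def harmonic(n):
    """Author at byline position i (1-based) gets weight 1/i, normalized."""
    total = sum(Fraction(1, k) for k in range(1, n + 1))
    return [Fraction(1, i) / total for i in range(1, n + 1)]


def arithmetic(n):
    """Author at position i gets weight n + 1 - i, normalized."""
    total = n * (n + 1) // 2
    return [Fraction(n + 1 - i, total) for i in range(1, n + 1)]


def geometric(n):
    """Author at position i gets weight 2**(n - i), normalized."""
    total = 2 ** n - 1
    return [Fraction(2 ** (n - i), total) for i in range(1, n + 1)]


if __name__ == "__main__":
    # Compare the four schemes on a three-author paper.
    for scheme in (fractional, harmonic, arithmetic, geometric):
        shares = scheme(3)
        print(scheme.__name__, [str(s) for s in shares])
```

All four schemes depend only on the number of authors and the byline position, which is why they break down in fields where the byline is alphabetical or where the last author is a principal contributor.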

Typical state-of-the-art quantitative algorithms for allocating shared credit to the authors of a paper are, in one form or another, ultimately built on the direct citations of the target papers. Nevertheless, these algorithms neglect that papers accumulate unequal numbers of citations and that a relatively high proportion of all papers have only a few citations, which results in less effective identification because the co-cited networks are extremely sparse. This problem is more prominent in newly published papers, as they have had insufficient time to accumulate citations. Although many previous studies have focused on credit allocation with collective methods, the allocation of intellectual contribution to the individual authors of a paper during its early period has not been emphasized or systematically studied in the literature, which makes this a typical cold-start problem. Hence, we consider it significant to develop a more comprehensive and universal algorithm that appropriately characterizes the scientific credit of each author of a co-authored paper, so that credit can be allocated appropriately both in the early period after publication and in later periods.
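The cold-start issue can be illustrated with a minimal sketch of a citation-based collective allocation, written in the spirit of the co-citation approach of Shen & Barabási (2014). It is a simplified illustration under our own assumptions, not the algorithm proposed in this paper and not an exact reproduction of any cited method. The function name, the toy paper identifiers, and the data layout (`authors_of`, `references_of`) are hypothetical.

```python
from collections import Counter


def collective_credit(target, authors_of, references_of):
    """Share credit for `target` among its authors using co-citations.

    authors_of:    dict mapping paper id -> list of author names
    references_of: dict mapping paper id -> list of cited paper ids
    Returns a dict author -> share, or None when the target has no
    citations yet (the cold-start case).
    """
    target_authors = authors_of[target]
    # Papers that cite the target.
    citing = [p for p, refs in references_of.items() if target in refs]

    # Co-citation strength: how often each paper is cited together
    # with the target (the target itself counts once per citing paper).
    strength = Counter()
    for p in citing:
        for ref in set(references_of[p]):
            strength[ref] += 1

    # Each co-cited paper passes its strength to the target's authors
    # who also wrote it, split equally among that paper's authors.
    credit = {a: 0.0 for a in target_authors}
    for paper, s in strength.items():
        paper_authors = authors_of.get(paper, [])
        for a in target_authors:
            if a in paper_authors:
                credit[a] += s / len(paper_authors)

    total = sum(credit.values())
    if total == 0:
        return None  # no citing papers, so nothing to allocate from
    return {a: c / total for a, c in credit.items()}


# Toy data (hypothetical): p0 is cited twice, "new" has no citations.
authors_of = {"p0": ["A", "B"], "p1": ["A"], "p2": ["B", "C"],
              "d1": ["X"], "d2": ["Y"], "new": ["A", "B"]}
references_of = {"d1": ["p0", "p1"], "d2": ["p0", "p1", "p2"]}

shares = collective_credit("p0", authors_of, references_of)
cold = collective_credit("new", authors_of, references_of)
```

In this toy setup author A receives the larger share for `p0` because A's other work is co-cited with it more often, while the uncited paper `new` yields no allocation at all. A scheme built only on direct citations therefore has nothing to work with for newly published papers.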

This paper is organized into four sections. The first section is the introduction. The second section briefly describes the dataset used in the article and presents statistical analyses of the dataset that demonstrate the limitations of the existing quantitative credit allocation algorithms; it then proposes a new method based on referenced studies. In the third section, we select papers by Nobel laureates to validate the proposed algorithm's effectiveness and then apply the algorithm to ordinary papers in the early period following their publication to test its robustness. This analysis is followed by an illustration of the credit share evolution of co-authors and an exploration of the universal credit share evolution pattern of scientific elites. Finally, we discuss the relation between credit share and position in the authorship byline in the field of physics. Section 4 presents a discussion of the results and outlines the paper's conclusions.

Section snippets

Data

The database used in this study is obtained from the American Physical Society (APS) journals for the period 1893 to 2009 and includes the journals of the Physical Review series and Reviews of Modern Physics. To avoid the problem of author name ambiguity, we use the author name dataset obtained from Sinatra et al., which was produced by applying a comprehensive disambiguation method to the APS dataset (Sinatra, Wang, Deville, Song, & Barabási, 2016). The dataset comprises 458,584 papers

Validation

To quantitatively validate the effectiveness of $COCD$, we first test it on Nobel Prize-winning papers, for which the Nobel committee has decided to whom the Nobel Prize is awarded (Turki, Hadj Taieb, & Aouicha, 2020). A widely accepted consensus is that the Nobel laureate is the author who contributed most to the Nobel Prize-winning paper and should therefore be allocated a greater credit share than the other collaborators. As the Nobel committee decides to whom the Nobel Prize

Conclusions and discussion

In many research situations, such as promotion and funding decisions, researchers are usually evaluated based on their independent contributions to the academic community to which they belong. However, with today's rapid development of collaborative and multidisciplinary science, how to allocate the relative credit share of researchers is an increasingly challenging problem, as scientific works tend to involve a remarkable collection of researchers from various groups of different

Authors’ contribution

Yanmeng Xing: Software, Validation, Writing - original draft, Writing - review & editing, Formal analysis.

Fenghua Wang: Software, Writing - original draft, Writing - review & editing.

An Zeng: Conceptualization, Methodology, Writing - review & editing, Formal analysis, Data curation.

Ying Fan: Conceptualization, Methodology, Supervision.

Acknowledgement

This work is supported by the National Natural Science Foundation of China (Nos. 71843005 and 71731002).

References (32)

P. Bao et al., Dynamic credit allocation in scientific literature, Scientometrics (2017)
P. Bao et al., Metapath-guided credit allocation for identifying representative works, International world wide web conference committee (2020)
C.R. Carpenter et al., Using publication metrics to highlight academic productivity and research impact, Academic Emergency Medicine (2014)
L. Egghe et al., Methods for accrediting publications to authors or countries: Consequences for evaluation studies, Journal of the Association for Information Science and Technology (2000)
H.J. Falk-Krzesinski, Mapping a research agenda for the science of team science, Research Evaluation (2011)
W. Foulkes et al., Redefining authorship. Relative contribution should be given after each author’s name, British Medical Journal (1996)
R. Guimerà et al., Team assembly mechanisms determine collaboration network structure and team performance, United States: American Association for the Advancement of Science (2005)
N.T. Hagen, Harmonic allocation of authorship credit: Source-level correction of bibliometric bias assures accurate publication and citation analysis, PLoS One (2008)
N. Herz et al., Opinion: Authors overestimate their contribution to scientific work, demonstrating a strong bias, Proceedings of the National Academy of Sciences (2020)
I. Pavlidis et al., Together we stand, Nature Physics (2014)
S. Juhász et al., Brokering the core and the periphery: Creative success and collaboration networks in the film industry, PLoS One (2020)
S. Jung et al., Citation-based author contribution measure for byline-independency (2019)
J. Kim et al., A network-based approach to coauthorship credit allocation, Scientometrics (2014)
J. Kim et al., Distortive effects of initial-based name disambiguation on measurements of large-scale coauthorship networks, Journal of the Association for Information Science and Technology (2016)
S. Lehmann et al., Measures for measures, Nature (2006)
J. Li et al., A dataset of publication records for Nobel laureates, Scientific Data (2019)
