Solving the cold-start problem in scientific credit allocation
Introduction
Collaboration is increasingly dominant across all domains of modern science and technology, while solo scientific discoveries are becoming rare (Guimerà, Uzzi, Spiro, & Amaral, 2005; Wu, Wang, & Evans, 2019). The synergy of collaboration is an essential component of complex scientific projects that require multidisciplinary solutions (Falk-Krzesinski, 2011; Milojević, 2014). Collaboration enables the integration of knowledge and the fulfilment of research mandates, both of which require comprehensiveness and diversity. Solo work generally yields lower-impact publications than collaborative science. Moreover, high-quality papers are the products of scientific activity and strongly affect scientists’ academic reputations and standing (Carpenter, Cone, & Sarli, 2014; Li, Fortunato, Yin, & Wang, 2020; Wuchty, Jones, & Uzzi, 2007). Because much early science was done by researchers working independently, the academic community still rewards researchers on the basis of individual scholarly achievement. For a single-author paper, the sole author receives all of the credit, which was the accepted norm in science decades ago. That rule fails for co-authored publications, however, and the problem worsens in multidisciplinary projects, where co-authors from different domains apply different criteria for assigning contributions (Lehmann, Jackson, & Lautrup, 2006). Meanwhile, researchers switch between topics increasingly often (Zeng et al., 2019). Nevertheless, talented, ethical, well-prepared individuals are expected to be rewarded for their hard-earned accomplishments, and meeting this expectation benefits the long-term development of the scientific system (Pavlidis, Petersen, & Semendeferi, 2014).
As such, identifying the relative credit of each collaborator in a co-authored, domain-specific work is of great significance and is fundamental to the academic appointment and promotion processes of institutions (Juhász, Tóth, & Lengyel, 2020; Shen & Barabási, 2014).
Considerable research attention has been given to assigning credit fairly for multi-authored publications, and the scientific community has recently called for greater attention to authors’ subjective evaluations of their own contributions combined with their assessments of their co-authors’ contributions (Herz, Dan, Censor, & Bar-Haim, 2020). On the one hand, scientific journals have developed guidelines that recognize the contributions of each author to promote more reasonable credit allocation (Herz et al., 2020; Mohammad Tariqur Rahman, J.M.R.B., & A., N.H., 2017; Radicchi, Fortunato, Markines, & Vespignani, 2009). On the other hand, quantitative algorithms for discriminating scientific and intellectual contributions between individuals or scientific institutions have been devised, ranging from the simple to the more elaborate. The simplest algorithms give each author equal recognition, as in full counting or fractional counting (Zeng et al., 2017). The full counting algorithm treats every author as a sole author and thus awards every author full credit, whereas the fractional counting algorithm sets every author's credit to the reciprocal of the total number of authors. However, since authors’ contributions to papers differ, the full counting algorithm inflates some authors’ contributions, while the fractional counting algorithm dilutes the principal contributors’ involvement in the papers (Waltman & van Eck, 2015). Methods based primarily on authorship order have therefore been proposed, such as the geometric method (Egghe, Rousseau, & Van Hooydonk, 2000), the arithmetic method (Trueba & Guerrero, 2004), the harmonic method (Hagen, 2008), and a network-based method (Kim & Diesner, 2014). However, these types of algorithms cannot be used in all research fields, as the rules of authorship bylines vary substantially.
For example, in mathematics, authors are listed alphabetically, whereas in biology, the first and the last authors contribute the most to the article. Another way to allocate author credit is to declare the contributions of each author in the article, thereby clarifying all authors’ roles in the research (Foulkes & Neylon, 1996; Mohammad Tariqur Rahman et al., 2017). More recently, methods that take a collective-process perspective on author credit have become popular (Bao & Wang, 2020; Radicchi et al., 2009; Shen & Barabási, 2014). Their main hypothesis is that the way the community cites the paper together with other papers written by the same authors on the same research topic encodes an informal credit allocation, on the premise that the main contributors to the paper are experienced in that topic. The improved algorithms further consider the aging effect and the importance of citing sources during the collective process (Bao & Zhai, 2017; Wang, Fan, Zeng, & Di, 2019).
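The byline-based counting schemes mentioned above can be compared on a small example. The sketch below uses the forms in which these schemes are commonly quoted in the bibliometrics literature (harmonic weight 1/i, arithmetic weight n + 1 - i, geometric weight 2^(n - i), each normalized over the byline). This is an assumption, since the cited papers may define variants, and the function names are hypothetical.

```python
from fractions import Fraction


def full_counting(n):
    """Every one of the n authors receives full credit (shares sum to n)."""
    return [Fraction(1)] * n


def fractional_counting(n):
    """Every author receives the reciprocal of the number of authors."""
    return [Fraction(1, n)] * n


def harmonic_counting(n):
    """Author in byline position i gets a share proportional to 1/i."""
    total = sum(Fraction(1, j) for j in range(1, n + 1))
    return [Fraction(1, i) / total for i in range(1, n + 1)]


def arithmetic_counting(n):
    """Author in byline position i gets a share proportional to n + 1 - i."""
    total = Fraction(n * (n + 1), 2)
    return [Fraction(n + 1 - i) / total for i in range(1, n + 1)]


def geometric_counting(n):
    """Author in byline position i gets a share proportional to 2**(n - i)."""
    total = 2**n - 1
    return [Fraction(2 ** (n - i), total) for i in range(1, n + 1)]


if __name__ == "__main__":
    # Compare the schemes for a hypothetical three-author paper.
    schemes = (full_counting, fractional_counting, harmonic_counting,
               arithmetic_counting, geometric_counting)
    for scheme in schemes:
        print(scheme.__name__, [str(share) for share in scheme(3)])
```

All of the position-weighted schemes assume that byline order reflects contribution, which is exactly the assumption that fails for alphabetically ordered bylines.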
State-of-the-art quantitative algorithms for allocating shared credit among the authors of a paper are, in one form or another, ultimately built on the direct citations of the target paper. These algorithms neglect the fact that papers accumulate unequal numbers of citations and that a relatively high proportion of all papers receive only a few. For such papers the co-citation network is extremely sparse, which makes the identification of contributors less effective. The problem is most prominent for newly published papers, which have not had time to accumulate citations. Although many previous studies have addressed credit allocation through collective methods, the allocation of intellectual contribution among the authors of a paper in the early period after publication has not been emphasized or systematically studied in the literature; it is a typical cold-start problem. Hence, we consider it significant to develop a more comprehensive and universal algorithm that appropriately allocates scientific credit to each author of a co-authored paper, both in the early period after the paper's publication and later on.
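To make the cold-start problem concrete, the following is a minimal sketch of a citation-based collective allocation in the spirit of Shen & Barabási (2014). It is not the algorithm proposed in this paper. Each paper co-cited with the target is weighted by the number of citing papers it shares with the target, and that weight is split equally among the co-cited paper's authors. The function name, the data layout, and the toy data are all hypothetical.

```python
from collections import Counter


def collective_credit(target, authors_of, references_of):
    """Return {author: credit share} for the target paper, or None if uncited.

    authors_of:    dict mapping paper id -> list of author names
    references_of: dict mapping citing paper id -> set of cited paper ids
    """
    # Papers that cite the target.
    citing = [d for d, refs in references_of.items() if target in refs]
    if not citing:
        # Cold start: no citing papers, so there is no co-citation network.
        return None

    # Co-citation strength: how many citing papers also cite each paper.
    # The target's own strength equals its citation count.
    strength = Counter()
    for d in citing:
        for p in references_of[d]:
            strength[p] += 1

    # Each co-cited paper passes its strength, split equally among its
    # authors, to those authors who are also authors of the target.
    credit = dict.fromkeys(authors_of[target], 0.0)
    for p, s in strength.items():
        coauthors = authors_of.get(p, [])
        for a in coauthors:
            if a in credit:
                credit[a] += s / len(coauthors)

    total = sum(credit.values())
    return {a: c / total for a, c in credit.items()}


# Toy data, invented purely for illustration.
authors_of = {"p0": ["A", "B"], "p1": ["A"], "p2": ["B", "C"]}
references_of = {"d1": {"p0", "p1"}, "d2": {"p0", "p1", "p2"}, "d3": {"p0"}}

if __name__ == "__main__":
    print(collective_credit("p0", authors_of, references_of))
    print(collective_credit("p0", authors_of, {}))  # newly published paper
```

With an empty citation record the function has nothing to work with and returns `None`, which is the situation a newly published paper is in.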
This paper is organized into four sections. The first section is the introduction. The second section briefly describes the dataset used in the article and presents statistical analyses of the dataset that demonstrate the limitations of existing quantitative credit allocation algorithms. Next, we propose a new method based on referenced studies. In the third section, we select papers by Nobel laureates to validate the proposed algorithm's effectiveness and then apply the algorithm to ordinary papers in the early period following their publication to test its robustness. This analysis is followed by an illustration of the credit share evolution of co-authors and an exploration of the universal credit share evolution pattern of scientific elites. Finally, we discuss the relation between credit share and position in the authorship byline in the field of physics. Section 4 presents a discussion of the results and outlines the paper's conclusions.
Data
The database used in this study is obtained from the American Physical Society (APS) journals for the period 1893 to 2009 and includes the journals of the Physical Review series and Reviews of Modern Physics. To avoid the problem of author name ambiguity, we use the author name dataset obtained from Sinatra et al., which was produced by applying a comprehensive disambiguation method to the APS dataset (Sinatra, Wang, Deville, Song, & Barabási, 2016). The dataset comprises 458,584 papers
Validation
To quantitatively validate the effectiveness of the proposed algorithm, we first test it on Nobel Prize-winning papers, for which the Nobel committee has decided to whom the Nobel Prize is awarded (Turki, Hadj Taieb, & Aouicha, 2020). A widely accepted consensus is that the Nobel winner is the author who contributes most to the Nobel Prize-winning paper. Hence, he/she should be allocated a greater credit share than the other collaborators. As the Nobel committee decides to whom the Nobel Prize
Conclusions and discussion
In many research situations, such as promotion and research funding decisions, researchers are usually evaluated based on their independent contributions to the academic community to which they belong. However, with today's rapid development of collaborative and multidisciplinary science, how to allocate the relative credit share of researchers is a growing and challenging problem, as scientific works tend to involve a remarkable collection of researchers from various groups of different
Authors’ contribution
Yanmeng Xing: Software, Validation, Writing - original draft, Writing - review & editing, Formal analysis.
Fenghua Wang: Software, Writing - original draft, Writing - review & editing.
An Zeng: Conceptualization, Methodology, Writing - review & editing, Formal analysis, Data curation.
Ying Fan: Conceptualization, Methodology, Supervision.
Acknowledgement
This work is supported by the National Natural Science Foundation of China (Nos. 71843005 and 71731002).
References (32)
- et al. Dynamic credit allocation in scientific literature. Scientometrics (2017)
- et al. Metapath-guided credit allocation for identifying representative works. International World Wide Web Conference Committee (2020)
- et al. Using publication metrics to highlight academic productivity and research impact. Academic Emergency Medicine (2014)
- et al. Methods for accrediting publications to authors or countries: Consequences for evaluation studies. Journal of the Association for Information Science and Technology (2000)
- Mapping a research agenda for the science of team science. Research Evaluation (2011)
- et al. Redefining authorship. Relative contribution should be given after each author’s name. British Medical Journal (1996)
- et al. Team assembly mechanisms determine collaboration network structure and team performance. United States: American Association for the Advancement of Science (2005)
- Harmonic allocation of authorship credit: Source-level correction of bibliometric bias assures accurate publication and citation analysis. PLoS One (2008)
- et al. Opinion: Authors overestimate their contribution to scientific work, demonstrating a strong bias. Proceedings of the National Academy of Sciences (2020)
- et al. Together we stand. Nature Physics (2014)
- Brokering the core and the periphery: Creative success and collaboration networks in the film industry. PLoS One
- Citation-based author contribution measure for byline-independency
- A network-based approach to coauthorship credit allocation. Scientometrics
- Distortive effects of initial-based name disambiguation on measurements of large-scale coauthorship networks. Journal of the Association for Information Science and Technology
- Measures for measures. Nature
- A dataset of publication records for Nobel laureates. Scientific Data