How many preprints have actually been printed and why: a case study of computer science preprints on arXiv
Scientometrics (IF 3.5), Pub Date: 2020-04-24, DOI: 10.1007/s11192-020-03430-8
Jialiang Lin, Yao Yu, Yu Zhou, Zhiyang Zhou, Xiaodong Shi

Preprints play an increasingly critical role in academic communities. There are many reasons driving researchers to post their manuscripts to preprint servers before formal submission to journals or conferences, but the use of preprints has also sparked considerable controversy, especially surrounding the claim of priority. In this paper, a case study of computer science preprints submitted to arXiv from 2008 to 2017 is conducted to quantify how many preprints have eventually been printed in peer-reviewed venues. Among those published manuscripts, some are published under different titles and without an update to their preprints on arXiv. In the case of these manuscripts, the traditional fuzzy matching method is incapable of mapping the preprint to the final published version. In view of this issue, we introduce a semantics-based mapping method with the employment of Bidirectional Encoder Representations from Transformers (BERT). With this new mapping method and a plurality of data sources, we find that 66% of all sampled preprints are published under unchanged titles and 11% are published under different titles and with other modifications. A further analysis was then performed to investigate why these preprints but not others were accepted for publication. Our comparison reveals that in the field of computer science, published preprints feature adequate revisions, multiple authorship, detailed abstract and introduction, extensive and authoritative references and available source code.
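
The limitation of fuzzy matching noted in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes character-level similarity from Python's standard `difflib` module, and the function names, the 0.9 threshold, and the example titles are all hypothetical. It only shows why a string-similarity approach finds a preprint whose title is unchanged but misses one that was retitled, which is the gap the paper's BERT-based semantic mapping is meant to close.

```python
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(title.lower().split())


def title_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] from difflib's SequenceMatcher."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_fuzzy_match(preprint_title, candidates, threshold=0.9):
    """Return (title, score) for the best candidate at or above threshold.

    Returns None when no candidate is similar enough. The threshold is an
    illustrative assumption, not a value taken from the paper.
    """
    scored = [(c, title_similarity(preprint_title, c)) for c in candidates]
    best = max(scored, key=lambda pair: pair[1], default=None)
    if best is None or best[1] < threshold:
        return None
    return best


if __name__ == "__main__":
    # Hypothetical titles, for illustration only.
    preprint = "A Fast Algorithm for Graph Clustering"
    same_title = "A fast algorithm for graph clustering"
    retitled = "Scalable Community Detection in Large Networks"

    print(best_fuzzy_match(preprint, [same_title]))  # matched: titles agree
    print(best_fuzzy_match(preprint, [retitled]))    # None: title was changed
```

A semantic approach would compare learned sentence representations of the two titles rather than their characters, so that a retitled paper on the same topic could still score as a likely match.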

Updated: 2020-04-24