Prevalence of nonsensical algorithmically generated papers in the scientific literature
Journal of the Association for Information Science and Technology (IF 3.5). Pub Date: 2021-05-25. DOI: 10.1002/asi.24495
Guillaume Cabanac, Cyril Labbé

In 2014, leading publishers withdrew more than 120 nonsensical publications that had been automatically generated with the SCIgen program. Casual observations suggested that similar problematic papers are still published and sold, without follow-up retractions. No systematic screening has been performed, and the prevalence of such nonsensical publications in the scientific literature is unknown. Our contribution is twofold. First, we designed a detector that combs the scientific literature for grammar-based computer-generated papers. Applied to SCIgen, it achieves a precision of 83.6%. Second, we performed a scientometric study of the 243 detected SCIgen papers from 19 publishers. We estimate the prevalence of SCIgen papers at 75 per million papers in Information and Computing Sciences. Only 19% of the 243 problematic papers were dealt with: 12 were formally retracted and 34 were silently removed. Publishers still serve, and sometimes sell, the remaining 197 papers without any caveat. We found evidence of citation manipulation via edited SCIgen bibliographies. This work reveals metric gaming to the point of absurdity: fraudsters publish nonsensical algorithmically generated papers featuring genuine references. It stresses the need to screen papers for nonsense before peer review and to chase citation manipulation in published papers. Overall, this is yet another illustration of the harmful effects of the pressure to publish or perish.
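As a quick consistency check on the handling figures in the abstract, the short Python sketch below recomputes them from the stated counts: 12 formal retractions and 34 silent removals out of 243 detected papers. The function and variable names are illustrative only. The script performs nothing beyond this arithmetic and makes no claim about how the authors computed their figures.

```python
def handling_summary(detected: int, retracted: int, silently_removed: int):
    """Recompute the share of detected papers that publishers dealt with.

    Illustrative helper (hypothetical name); inputs are the counts
    reported in the abstract.
    """
    handled = retracted + silently_removed  # papers retracted or removed
    share_handled = handled / detected      # fraction of all detected papers
    still_served = detected - handled       # papers still online without caveat
    return handled, share_handled, still_served


handled, share, remaining = handling_summary(detected=243, retracted=12, silently_removed=34)
print(f"handled: {handled} ({share:.1%})")
print(f"still served: {remaining}")
```

Running it prints `handled: 46 (18.9%)` and `still served: 197`. This agrees with the abstract's "19%" after rounding and with its count of 197 remaining papers.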

Updated: 2021-05-25