Flagging incorrect nucleotide sequence reagents in biomedical papers: To what extent does the leading publication format impede automatic error detection?
Scientometrics (IF 3.5), Pub Date: 2020-05-22, DOI: 10.1007/s11192-020-03463-z
Cyril Labbé, Guillaume Cabanac, Rachael A. West, Thierry Gautier, Bertrand Favier, Jennifer A. Byrne

In an idealised vision of science, the scientific literature is error-free. Errors reported during peer review are supposed to be corrected prior to publication, as further research establishes new knowledge based on the body of literature. It happens, however, that errors pass through peer review, and in a minority of cases errata and retractions follow. Automated screening software can be applied to detect errors in manuscripts and publications. The contribution of this paper is twofold. First, we designed the erroneous reagent checking (ERC) benchmark to assess the accuracy of fact-checkers screening biomedical publications for dubious mentions of nucleotide sequence reagents. It comes with a test collection of 1679 nucleotide sequence reagents that were curated by biomedical experts. Second, we benchmarked our own screening software, Seek&Blastn, with three input formats to assess the extent of performance loss when operating on various publication formats. Our findings stress the superiority of markup formats (a 79% detection rate on XML and HTML) over the prominent PDF format (a 69% detection rate at most) on an error flagging task. This is the first published baseline on error detection involving reagents reported in biomedical scientific publications. The ERC benchmark is designed to facilitate the development and validation of software bricks to enhance the reliability of the peer review process.

Updated: 2020-05-22