Text mining meets community curation: a newly designed curation platform to improve author experience and participation at WormBase.
Database: The Journal of Biological Databases and Curation (IF 3.4), Pub Date: 2020-01-01, DOI: 10.1093/database/baaa006
Valerio Arnaboldi, Daniela Raciti, Kimberly Van Auken, Juancarlos N Chan, Hans-Michael Müller, Paul W Sternberg

Biological knowledgebases rely on expert biocuration of the research literature to maintain up-to-date collections of data organized in machine-readable form. To enter information into knowledgebases, curators need to follow three steps: (i) identify papers containing relevant data, a process called triaging; (ii) recognize named entities; and (iii) extract and curate data in accordance with the underlying data models. WormBase (WB), the authoritative repository for research data on Caenorhabditis elegans and other nematodes, uses text mining (TM) to semi-automate its curation pipeline. In addition, WB engages its community, via an Author First Pass (AFP) system, to help recognize entities and classify data types in their recently published papers. In this paper, we present a new WB AFP system that combines TM and AFP into a single application to enhance community curation. The system employs string-searching algorithms and statistical methods (e.g. support vector machines (SVMs)) to extract biological entities and classify data types, and it presents the results to authors in a web form where they validate the extracted information, rather than enter it de novo as the previous form required. With this new system, we lessen the burden for authors while at the same time receiving valuable feedback on the performance of our TM tools. The new user interface also links out to specific structured data submission forms, e.g. for phenotype or expression pattern data, giving authors the opportunity to contribute more detailed curation that can be incorporated into WB with minimal curator review. Our approach is generalizable and could be applied to additional knowledgebases that would like to engage their user community in assisting with curation. In the five months following the launch of the new system, the response rate has been comparable with that of the previous AFP version, but the quality and quantity of the data received have greatly improved.
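
As a rough illustration of the string-searching step mentioned in the abstract, the following minimal sketch counts whole-token mentions of known entity names in a paper's text and keeps those mentioned often enough to be worth showing to an author for validation. It is not the WormBase implementation: the function name `extract_entities`, the `min_mentions` threshold, the sample sentence and the small gene lexicon are all assumptions made for the example, and the SVM-based data type classification is not sketched here.

```python
import re
from collections import Counter


def extract_entities(text, lexicon, min_mentions=2):
    """Return lexicon entries mentioned at least `min_mentions` times in `text`.

    Hypothetical helper for illustration only; the threshold is an assumption.
    """
    counts = Counter()
    for name in lexicon:
        # Match whole tokens only, so "lin-3" is not counted inside "lin-39".
        pattern = r"(?<![\w-])" + re.escape(name) + r"(?![\w-])"
        counts[name] = len(re.findall(pattern, text))
    # Keep only entities mentioned often enough to propose to the author.
    return {name: n for name, n in counts.items() if n >= min_mentions}


# Made-up example text and lexicon (not taken from any real paper).
sample_text = (
    "We show that lin-3 acts upstream of let-23. "
    "Loss of lin-3 reduces let-23 signaling, whereas lin-39 is unaffected."
)
gene_lexicon = ["lin-3", "let-23", "lin-39", "unc-54"]

found = extract_entities(sample_text, gene_lexicon)
print(found)
```

In a validation workflow of the kind the abstract describes, a pre-filled list like `found` would be shown to the author to confirm or correct, instead of asking the author to type the entities from scratch.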

Updated: 2020-04-17