Data storage and data re-use in taxonomy—the need for improved storage and accessibility of heterogeneous data
Organisms Diversity & Evolution (IF 1.9), Pub Date: 2020-01-20, DOI: 10.1007/s13127-019-00428-w
Birgit Gemeinholzer, Miguel Vences, Bank Beszteri, Teddy Bruy, Janine Felden, Ivaylo Kostadinov, Aurélien Miralles, Tim W. Nattkemper, Christian Printzen, Jasmin Renz, Nataliya Rybalka, Tanja Schuster, Tanja Weibulat, Thomas Wilke, Susanne S. Renner

The ability to rapidly generate and share molecular, visual, and acoustic data, and to compare them with existing information, and thereby to detect and name biological entities is fundamentally changing our understanding of evolutionary relationships among organisms and is also impacting taxonomy. Harnessing taxonomic data for rapid, automated species identification by machine learning tools or DNA metabarcoding techniques has great potential but will require their review, accessible storage, comprehensive comparison, and integration with prior knowledge and information. Currently, data production, management, and sharing in taxonomic studies are not keeping pace with these needs. Indeed, a survey of recent taxonomic publications provides evidence that few species descriptions in zoology and botany incorporate DNA sequence data. The use of modern high-throughput (-omics) data is so far the exception in alpha-taxonomy, although they are easily stored in GenBank and similar databases. By contrast, for the more routinely used image data, the problem is that they are rarely made available in openly accessible repositories. Improved sharing and re-using of both types of data requires institutions that maintain long-term data storage and capacity with workable, user-friendly but highly automated pipelines. Top priority should be given to standardization and pipeline development for the easy submission and storage of machine-readable data (e.g., images, audio files, videos, tables of measurements). The taxonomic community in Germany and the German Federation for Biological Data are researching options for a higher level of automation, improved linking among data submission and storage platforms, and for making existing taxonomic information more readily accessible.

Updated: 2020-01-20