The arc of Mass Spectrometry Exchange Formats is long, but it bends toward HDF5
Mass Spectrometry Reviews (IF 6.9), Pub Date: 2016-10-14, DOI: 10.1002/mas.21522
Manor Askenazi¹, Hisham Ben Hamidane², Johannes Graumann²

The evolution of data exchange in mass spectrometry spans decades. It has ranged from human‐readable text files representing individual scans or collections of scans (McDonald et al., 2004), through the official XML‐based (Harold, Means, & Udemadu, 2005) data interchange standard (Deutsch, 2012), to increasingly compressed (Teleman et al., 2014) variants of this standard that sometimes require purely binary adjunct files (Römpp et al., 2011). The desire to maintain at least partial human readability is understandable. However, XML's textual, irregular format is inherently mismatched with the numeric, highly regular nature of actual spectral data. Together with the explosive growth in dataset sizes and the resulting need for efficient (binary and indexed) access, this mismatch has led to a phenomenon referred to as "technical drift" (Davis, 2013). Although this drift is continuously being corrected through adjunct formats, compression schemes, and programs (Röst et al., 2015), we propose that the future of mass spectrometry exchange formats lies in two things. The first is continued reliance on, and development of, the PSI‐MS controlled vocabulary (Mayer et al., 2014). The second is an expedited shift to an alternative, thriving, and well‐supported ecosystem for scientific data exchange, storage, and access in binary form, namely HDF5 (Koranne, 2011). Pioneering efforts to leverage this universal, binary, and hierarchical data format have already been published (Wilhelm et al., 2012; Rübel et al., 2013). These efforts, however, have under‐utilized self‐description, a key property shared by HDF5 and XML. We demonstrate that a straightforward use of plain ("vanilla") HDF5 yields immediate returns. These include, but are not limited to, highly efficient data access, platform‐independent data viewers, a variety of libraries (Collette, 2014) for data retrieval and manipulation in many programming languages, and remote data access through comprehensive RESTful data servers.
© 2016 The Authors. Mass Spectrometry Reviews published by Wiley Periodicals, Inc. Mass Spec Rev 36:668–673, 2017
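
The following minimal Python sketch, which uses only the standard library, makes the text/numeric mismatch concrete. It mimics how an XML format such as mzML typically embeds a peak array. The values are packed as little-endian 64-bit floats (PSI‐MS term "64-bit float", MS:1000523). They are optionally zlib-compressed (PSI‐MS term "zlib compression", MS:1000574). Finally they are base64-encoded so the bytes can sit inside an XML text node. The function names and the toy m/z values are hypothetical and chosen only for illustration. This is not code from the paper, and a real mzML reader must also honor the cvParams attached to each binary data array.

```python
import base64
import struct
import zlib

# Hypothetical toy m/z values, exactly representable as doubles (illustration only).
MZ = [100.0, 150.5, 200.25, 250.125]


def encode_mzml_style(values, compress=False):
    """Encode floats the way mzML-style XML commonly embeds a peak array."""
    # Little-endian 64-bit IEEE doubles (cf. PSI-MS "64-bit float", MS:1000523).
    raw = struct.pack(f"<{len(values)}d", *values)
    if compress:
        # Optional zlib step (cf. PSI-MS "zlib compression", MS:1000574).
        raw = zlib.compress(raw)
    # Base64 turns the bytes into ASCII text that can live inside an XML element.
    return base64.b64encode(raw).decode("ascii")


def decode_mzml_style(text, compressed=False):
    """Reverse the steps: base64 text -> (optionally inflated) bytes -> floats."""
    raw = base64.b64decode(text)
    if compressed:
        raw = zlib.decompress(raw)
    count = len(raw) // 8  # 8 bytes per 64-bit double
    return list(struct.unpack(f"<{count}d", raw))


raw_size = len(struct.pack(f"<{len(MZ)}d", *MZ))  # 4 doubles x 8 bytes = 32 bytes
text = encode_mzml_style(MZ)
print(raw_size, len(text))  # → 32 44 (base64 emits 4 characters per 3 input bytes)
print(decode_mzml_style(text) == MZ)  # → True
```

In this example, base64 turns the 32 bytes of raw doubles into 44 characters of text before any XML markup is added, and every reader has to reverse that encoding to reach the numbers. A binary container such as HDF5 stores the typed array directly. With h5py, for instance, `f.create_dataset("mz", data=values)` writes the array without any text-encoding step. Dataset attributes could then carry the same PSI‐MS accessions for self-description.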

Updated: 2016-10-14