How Much Storage Precision Can Be Lost: Guidance for Near-Lossless Compression of Untargeted Metabolomics Mass Spectrometry Data
Journal of Proteome Research (IF 4.4), Pub Date: 2024-04-19, DOI: 10.1021/acs.jproteome.3c00851
Junjie Tong 1, 2 , Miaoshan Lu 1 , Ruimin Wang 1, 3, 4 , Shaowei An 3, 4 , Jinyin Wang 4, 5 , Tong Wang 1 , Cong Xie 1, 2 , Changbin Yu 1

Several lossy compressors have achieved superior compression rates for mass spectrometry (MS) data at the cost of storage precision. Currently, the impacts of precision losses on MS data processing have not been thoroughly evaluated, which is critical for the future development of lossy compressors. We first evaluated different storage precision (32 bit and 64 bit) in lossless mzML files. We then applied 10 truncation transformations to generate precision-lossy files: five relative errors for intensities and five absolute errors for m/z values. MZmine3 and XCMS were used for feature detection and GNPS for compound annotation. Lastly, we compared Precision, Recall, F1-score, and file sizes between lossy files and lossless files under different conditions. Overall, we revealed that the discrepancy between 32 and 64 bit precision was under 1%. We proposed an absolute m/z error of 10⁻⁴ and a relative intensity error of 2 × 10⁻², adhering to a 5% error threshold (F1-scores above 95%). For a stricter 1% error threshold (F1-scores above 99%), an absolute m/z error of 2 × 10⁻⁵ and a relative intensity error of 2 × 10⁻³ were advised. This guidance aims to help researchers improve lossy compression algorithms and minimize the negative effects of precision losses on downstream data processing.
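
To make the two error models concrete, the following is a minimal Python sketch and not the authors' implementation. The abstract does not say how the truncation transformations were carried out, so this sketch assumes rounding to a grid: a uniform grid for m/z (bounded absolute error) and a logarithmic grid for intensity (bounded relative error). The function names and the representation of detected features as sets of hashable keys are hypothetical and chosen for illustration only.

```python
import math


def truncate_mz(mz: float, abs_err: float) -> float:
    """Snap an m/z value to a uniform grid of width 2 * abs_err.

    Rounding to the nearest grid point keeps |result - mz| <= abs_err
    (up to floating-point rounding). This is an assumed scheme.
    """
    step = 2.0 * abs_err
    return round(mz / step) * step


def truncate_intensity(intensity: float, rel_err: float) -> float:
    """Snap an intensity to a logarithmic grid with ratio (1 + rel_err)**2.

    Rounding in the log domain keeps |result - x| / x <= rel_err
    (up to floating-point rounding). Non-positive values map to 0.
    """
    if intensity <= 0.0:
        return 0.0
    step = 2.0 * math.log1p(rel_err)
    return math.exp(round(math.log(intensity) / step) * step)


def precision_recall_f1(reference: set, candidate: set) -> tuple[float, float, float]:
    """Compare features from a lossy file (candidate) with the lossless ones (reference).

    Features are assumed to be hashable keys that match exactly; real
    feature matching would need m/z and retention-time tolerances.
    """
    tp = len(reference & candidate)  # features found in both files
    precision = tp / len(candidate) if candidate else 0.0
    recall = tp / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Error levels proposed in the abstract for the 5% threshold.
    print(truncate_mz(445.120025, 1e-4))
    print(truncate_intensity(12345.678, 2e-2))
```

Under this reading, a lossy file meets the 5% threshold when the F1-score returned by `precision_recall_f1` stays above 0.95 relative to the lossless file, and the 1% threshold when it stays above 0.99.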

Updated: 2024-04-19