A negative storage model for precise but compact storage of genetic variation data.
Database: The Journal of Biological Databases and Curation (IF 5.8), Pub Date: 2020-01-01, DOI: 10.1093/database/baz158
Guillermo Gonzalez-Calderon¹, Ruizheng Liu¹, Rodrigo Carvajal¹, Jamie K Teer¹,²

Falling sequencing costs and large initiatives are resulting in increasing amounts of data available for investigator use. However, accessing genomic data poses informatics challenges. Performance and storage are well-appreciated issues, but precision is critical for meaningful analysis and interpretation of genomic data. Existing solutions face an inherent trade-off between accuracy and performance. The most common approach (Variant-only Storage Model, VOSM) stores only variant data. Systems must therefore assume that every position not recorded as variant is reference, sacrificing precision and potentially accuracy. A more complete model (Full Storage Model, FSM) would store the state of every base in the genome (variant, reference or missing), thereby sacrificing performance. A compressed variation of the FSM can store the state of contiguous regions of the genome as blocks (Block Storage Model, BLSM), much like the file-based gVCF model. We propose a novel approach that encodes this state so that both performance and accuracy are maintained. The Negative Storage Model (NSM) can store and retrieve precise genomic state from different sequencing sources, including clinical sequencing panels and whole exome sequencing. Storage requirements are reduced by storing only the variant and missing states and inferring the reference state. We evaluate the performance characteristics of FSM, BLSM and NSM and demonstrate dramatic improvements in storage and performance using the NSM approach.
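The four storage models can be compared with a toy, single-sample sketch. This is not the authors' implementation. Several choices here are assumptions made only for illustration: the names (`State`, `full_storage`, `block_storage`, `NegativeStore`, `variant_only_state`), integer coordinates over one short region, half-open intervals, and storing missing data as disjoint intervals that never overlap a variant.

```python
from bisect import bisect_right
from enum import Enum
from typing import Dict, List, Tuple


class State(Enum):
    REFERENCE = "ref"
    VARIANT = "var"
    MISSING = "missing"


def full_storage(length: int, variants: Dict[int, str],
                 missing: List[Tuple[int, int]]) -> List[State]:
    """FSM-style: one explicit state per base (hypothetical helper)."""
    states = [State.REFERENCE] * length
    for start, end in missing:          # half-open [start, end)
        for pos in range(start, end):
            states[pos] = State.MISSING
    for pos in variants:                # assumed not to overlap missing intervals
        states[pos] = State.VARIANT
    return states


def block_storage(states: List[State]) -> List[Tuple[int, int, State]]:
    """BLSM-style: run-length blocks (start, end, state), similar in spirit to gVCF."""
    blocks = []
    start = 0
    for pos in range(1, len(states) + 1):
        # Close the current block at the end of the region or on a state change.
        if pos == len(states) or states[pos] != states[start]:
            blocks.append((start, pos, states[start]))
            start = pos
    return blocks


def variant_only_state(variants: Dict[int, str], pos: int) -> State:
    """VOSM-style lookup: anything not variant is assumed reference."""
    return State.VARIANT if pos in variants else State.REFERENCE


class NegativeStore:
    """NSM-style sketch: keep variants and missing intervals; infer reference."""

    def __init__(self, variants: Dict[int, str], missing: List[Tuple[int, int]]):
        self.variants = dict(variants)
        self.missing = sorted(missing)                 # assumed disjoint
        self._starts = [start for start, _ in self.missing]

    def state(self, pos: int) -> State:
        if pos in self.variants:
            return State.VARIANT
        # Find the last missing interval starting at or before pos.
        i = bisect_right(self._starts, pos) - 1
        if i >= 0 and pos < self.missing[i][1]:
            return State.MISSING
        return State.REFERENCE                         # not stored, so inferred reference

    def record_count(self) -> int:
        return len(self.variants) + len(self.missing)


if __name__ == "__main__":
    variants = {3: "A>G", 12: "C>T"}                   # made-up toy data
    missing = [(7, 10), (15, 18)]
    fsm = full_storage(20, variants, missing)
    nsm = NegativeStore(variants, missing)
    assert all(nsm.state(p) == fsm[p] for p in range(20))
    print(f"records: FSM={len(fsm)}, BLSM={len(block_storage(fsm))}, "
          f"NSM={nsm.record_count()}")
    # prints "records: FSM=20, BLSM=9, NSM=4"
```

In this made-up 20-base region, the per-base model keeps 20 records and the block model keeps 9. The negative store keeps only 4 (two variants and two missing intervals) while answering every position exactly as the per-base model does. The variant-only lookup, by contrast, reports position 8 as reference even though that base was never observed, which is the precision loss the abstract attributes to VOSM.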

Updated: 2020-04-17