Naught all zeros in sequence count data are the same
Computational and Structural Biotechnology Journal (IF 6), Pub Date: 2020-09-28, DOI: 10.1016/j.csbj.2020.09.014
Justin D Silverman, Kimberly Roche, Sayan Mukherjee, Lawrence A David

Genomic studies feature multivariate count data from high-throughput DNA sequencing experiments, which often contain many zero values. These zeros can cause artifacts in statistical analyses, and multiple modeling approaches have been developed in response. Here, we apply different zero-handling models to gene-expression and microbiome datasets and show that models can disagree substantially in identifying the most differentially expressed sequences. Next, to rationally examine how different zero-handling models behave, we developed a conceptual framework outlining four types of processes that may give rise to zero values in sequence count data. Last, we performed simulations to test how zero-handling models behave in the presence of these different zero-generating processes. Our simulations showed that simple count models are sufficient across multiple processes, even when the true underlying process is unknown. On the other hand, a common zero-handling technique known as “zero-inflation” was only suitable under a zero-generating process associated with an unlikely set of biological and experimental conditions. In concert, our work here suggests several specific guidelines for developing and choosing state-of-the-art models for analyzing sparse sequence count data.




Updated: 2020-09-29