Personalized and graph genomes reveal missing signal in epigenomic data
Genome Biology (IF 10.1), Pub Date: 2020-05-25, DOI: 10.1186/s13059-020-02038-8
Cristian Groza (1), Tony Kwan (1, 2), Nicole Soranzo (3, 4, 5, 6), Tomi Pastinen (1, 2, 7), Guillaume Bourque (1, 2, 8, 9)

Background: Epigenomic studies that use next-generation sequencing experiments typically rely on the alignment of reads to a reference sequence. However, because of genetic diversity and the diploid nature of the human genome, we hypothesize that using a generic reference could lead to incorrectly mapped reads and bias downstream results.

Results: We show that accounting for genetic variation using a modified reference genome or a de novo assembled genome can alter histone H3K4me1 and H3K27ac ChIP-seq peak calls, either by creating new personal peaks or by causing the loss of reference peaks. Using permissive cutoffs, modified reference genomes are found to alter approximately 1% of peak calls, while de novo assembled genomes alter up to 5% of peaks. We also show statistically significant differences in the number of reads observed in regions associated with the new, altered, and unchanged peaks. We report that short insertions and deletions (indels), followed by single nucleotide variants (SNVs), have the highest probability of modifying peak calls. We show that using a graph personalized genome represents a reasonable compromise between modified reference genomes and de novo assembled genomes. We demonstrate that altered peaks have a genomic distribution typical of other peaks.

Conclusions: Analyzing epigenomic datasets with personalized and graph genomes allows the recovery of new peaks enriched for indels and SNVs. These altered peaks are more likely to differ between individuals and, as such, could be relevant in the study of various human phenotypes.
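The abstract counts peaks as altered when a personalized genome either yields a new personal peak or loses a reference peak. The sketch below illustrates that bookkeeping only; it is not the authors' pipeline. It assumes both peak sets are already expressed in the same (reference) coordinates, uses BED-style half-open intervals, and applies a hypothetical rule that any overlap of at least 1 bp means a peak is "unchanged". The names `classify_peaks` and `altered_fraction` are illustrative.

```python
from typing import List, Tuple

# A peak is (chromosome, start, end), half-open [start, end) as in BED files.
Peak = Tuple[str, int, int]


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two half-open intervals on the same chromosome share >= 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def classify_peaks(reference: List[Peak], personal: List[Peak]):
    """Split peaks into unchanged, lost-reference and new-personal lists.

    Hypothetical criterion: a peak is 'unchanged' if it overlaps any peak
    called against the other genome; otherwise it is lost (reference side)
    or new (personal side).
    """
    unchanged = [p for p in reference if any(overlaps(p, q) for q in personal)]
    lost = [p for p in reference if not any(overlaps(p, q) for q in personal)]
    new = [q for q in personal if not any(overlaps(q, p) for p in reference)]
    return unchanged, lost, new


def altered_fraction(reference: List[Peak], personal: List[Peak]) -> float:
    """Fraction of all distinct peaks that are either lost or new."""
    unchanged, lost, new = classify_peaks(reference, personal)
    total = len(unchanged) + len(lost) + len(new)
    return (len(lost) + len(new)) / total if total else 0.0


if __name__ == "__main__":
    ref = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
    pers = [("chr1", 150, 250), ("chr2", 50, 150), ("chr3", 10, 90)]
    print(classify_peaks(ref, pers))
    print(altered_fraction(ref, pers))  # 0.5 on this toy input
```

The pairwise comparison is quadratic and meant only for clarity; genome-scale peak sets would normally be intersected with an interval index or a dedicated tool such as bedtools.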

Updated: 2020-05-25