The Biological Significance of Multi-copy Regions and Their Impact on Variant Discovery
Genomics, Proteomics & Bioinformatics (IF 11.5), Pub Date: 2020-08-19, DOI: 10.1016/j.gpb.2019.05.004
Jing Sun 1 , Yanfang Zhang 2 , Minhui Wang 3 , Qian Guan 3 , Xiujia Yang 4 , Jin Xia Ou 5 , Mingchen Yan 6 , Chengrui Wang 6 , Yan Zhang 6 , Zhi-Hao Li 7 , Chunhong Lan 1 , Chen Mao 7 , Hong-Wei Zhou 5 , Bingtao Hao 8 , Zhenhai Zhang 1

Identification of genetic variants via high-throughput sequencing (HTS) technologies has been essential for both fundamental and clinical studies. However, to what extent the genome sequence composition affects variant calling remains unclear. In this study, we identified 63,897 multi-copy sequences (MCSs) with a minimum length of 300 bp, each of which occurs at least twice in the human genome. The 151,749 genomic loci (multi-copy regions, or MCRs) harboring these MCSs account for 1.98% of the genome and are distributed unevenly across chromosomes. MCRs containing the same MCS tend to be located on the same chromosome. Gene Ontology (GO) analyses revealed that 3800 genes whose UTRs or exons overlap with MCRs are enriched for Golgi-related cellular component terms and various enzymatic activities in the GO biological function category. MCRs are also enriched for loci that are sensitive to neocarzinostatin-induced double-strand breaks. Moreover, genetic variants discovered by genome-wide association studies and recorded in dbSNP are significantly underrepresented in MCRs. Using simulated HTS datasets, we show that false variant discovery rates are significantly higher in MCRs than in other genomic regions. These results suggest that extra caution must be taken when identifying genetic variants in the MCRs via HTS technologies.
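The abstract does not describe how the MCSs were detected, so the following is only a toy sketch of the underlying idea: find sequences of a fixed length that occur at two or more loci, then measure what fraction of the genome those loci cover. It assumes exact forward-strand matches of fixed-length k-mers on a tiny made-up genome. The names `find_multicopy_kmers`, `covered_fraction`, and `toy_genome` are hypothetical, and `k=6` stands in for the 300 bp minimum length used in the study. A realistic pipeline would also have to handle reverse complements, extension to maximal repeats, and genome-scale memory use.

```python
from collections import defaultdict


def find_multicopy_kmers(genome, k):
    """Map each exact k-mer seen at two or more loci to its (chrom, start) list."""
    positions = defaultdict(list)
    for chrom, seq in genome.items():
        seq = seq.upper()
        for start in range(len(seq) - k + 1):
            positions[seq[start:start + k]].append((chrom, start))
    # Keep only k-mers that occur at least twice (the "multi-copy" criterion).
    return {kmer: loci for kmer, loci in positions.items() if len(loci) >= 2}


def covered_fraction(genome, multicopy, k):
    """Fraction of bases lying inside at least one multi-copy locus."""
    covered = set()
    for loci in multicopy.values():
        for chrom, start in loci:
            covered.update((chrom, pos) for pos in range(start, start + k))
    total = sum(len(seq) for seq in genome.values())
    return len(covered) / total if total else 0.0


# Hypothetical toy genome: two short "chromosomes" sharing one 6-mer.
toy_genome = {
    "chrA": "ACGTACGTTTGCA",
    "chrB": "GGACGTACCA",
}

multicopy = find_multicopy_kmers(toy_genome, k=6)
fraction = covered_fraction(toy_genome, multicopy, k=6)
print(multicopy)
print(f"{fraction:.3f}")
```

In this toy case the only shared 6-mer is `ACGTAC`, found once on each chromosome. Reads drawn from either copy are indistinguishable over those bases, which is the kind of ambiguity the abstract links to higher false variant discovery rates in MCRs.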




Updated: 2020-08-19