Detection and inference of interspersed duplicated insertions from paired-end reads
Digital Signal Processing (IF 2.9), Pub Date: 2021-01-07, DOI: 10.1016/j.dsp.2020.102959
Xiguo Yuan, Wenlu Xie, Hongzhi Yang, Jun Bai, Ruwu Yang, Guojun Liu, Haque A.K. Alvi

Interspersed duplicated insertion (idINS) is a common type of genomic insertion that plays an important role in genomic instability during carcinogenesis. Detecting this type of insertion is nevertheless challenging, because reads originating from idINS regions in the donor sample are most likely to map perfectly to other regions of the reference. Most existing approaches use paired-end mapping to detect idINSs, but characterizing idINSs larger than the mean insert size remains difficult because sequencing reads are short. There is therefore still a need for practical algorithms that detect and infer idINSs regardless of their length. Here, we present a new algorithm, DIPins, which can accurately detect idINSs and infer their contents from paired-end reads. DIPins can detect breakpoint positions and infer idINS contents even when the length of the variant exceeds the paired-end insert size. Its core principle is twofold: it extracts multiple signatures from split reads and integrates them to determine idINS positions, and it constructs idINS contents through a dynamic process that iteratively generates unobserved split reads from the restricted area around the idINS breakpoint. We test the performance of DIPins on both simulated and real data. The results demonstrate its advantages over other methods and its potential for accurate characterization of idINSs in the human genome.
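
The abstract does not describe how DIPins implements its split-read signatures, so the following is only a minimal sketch of the general idea of locating a breakpoint from split (soft-clipped) reads. It assumes alignments are given as (POS, CIGAR) pairs taken from SAM records, and it simply counts reads whose clipped end abuts the same reference coordinate. The function names `clip_position` and `candidate_breakpoints`, and the thresholds `min_clip` and `min_support`, are hypothetical choices for illustration and are not part of DIPins.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def clip_position(pos, cigar, min_clip=5):
    """Return the last reference base before a soft-clip junction, or None.

    pos is the 1-based leftmost aligned position (the SAM POS field).
    Clips shorter than min_clip are ignored as likely noise.
    If both ends are clipped, only the leading clip is reported.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    if not ops:
        return None
    # Leading soft clip: the junction lies just before the first aligned base.
    if ops[0][1] == "S" and ops[0][0] >= min_clip:
        return pos - 1
    # Trailing soft clip: the junction lies just after the last aligned base.
    if ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        ref_len = sum(n for n, op in ops if op in "MDN=X")
        return pos + ref_len - 1
    return None


def candidate_breakpoints(alignments, min_support=2, min_clip=5):
    """Count clipped reads per junction and keep well-supported positions."""
    votes = Counter()
    for pos, cigar in alignments:
        bp = clip_position(pos, cigar, min_clip)
        if bp is not None:
            votes[bp] += 1
    return sorted((bp, n) for bp, n in votes.items() if n >= min_support)


if __name__ == "__main__":
    # Toy input: three reads clipped at the same junction, two uninformative reads.
    alignments = [
        (951, "50M30S"),
        (961, "40M40S"),
        (1001, "25S55M"),
        (500, "80M"),
        (700, "3S77M"),
    ]
    print(candidate_breakpoints(alignments))
```

On the toy input, the three clipped reads all vote for the junction after reference base 1000, while the fully aligned read and the read with a 3-base clip are ignored. A real caller would also need to handle supplementary alignments, mapping quality, and nearby positions that differ by a base or two, none of which this sketch attempts.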




Updated: 2021-01-25