Unsupervised contrastive peak caller for ATAC-seq
Genome Research (IF 7), Pub Date: 2023-07-01, DOI: 10.1101/gr.277677.123
Ha T H Vu 1,2, Yudi Zhang 3, Geetu Tuteja 1,2, Karin S Dorman 2,3,4

The assay for transposase-accessible chromatin with sequencing (ATAC-seq) is a common assay to identify chromatin accessible regions by using a Tn5 transposase that can access, cut, and ligate adapters to DNA fragments for subsequent amplification and sequencing. These sequenced regions are quantified and tested for enrichment in a process referred to as “peak calling.” Most unsupervised peak calling methods are based on simple statistical models and suffer from elevated false positive rates. Newly developed supervised deep learning methods can be successful, but they rely on high quality labeled data for training, which can be difficult to obtain. Moreover, though biological replicates are recognized to be important, there are no established approaches for using replicates in the deep learning tools, and the approaches available for traditional methods either cannot be applied to ATAC-seq, where control samples may be unavailable, or are post hoc and do not capitalize on potentially complex, but reproducible signal in the read enrichment data. Here, we propose a novel peak caller that uses unsupervised contrastive learning to extract shared signals from multiple replicates. Raw coverage data are encoded to obtain low-dimensional embeddings and optimized to minimize a contrastive loss over biological replicates. These embeddings are passed to another contrastive loss for learning and predicting peaks and decoded to denoised data under an autoencoder loss. We compared our replicative contrastive learner (RCL) method with other existing methods on ATAC-seq data, using annotations from ChromHMM genomic labels and transcription factor ChIP-seq as noisy truth. RCL consistently achieved the best performance.
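
The replicate-level contrastive objective described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the encoder, the exact loss, or any hyperparameters, so the InfoNCE-style form below, the function name `replicate_contrastive_loss`, and the `temperature` parameter are all assumptions chosen for illustration. The sketch treats the embeddings of the same genomic region in two replicates as a positive pair and all other regions as negatives.

```python
import numpy as np


def replicate_contrastive_loss(z_a, z_b, temperature=0.5):
    """InfoNCE-style loss between two replicates (illustrative assumption).

    z_a, z_b: arrays of shape (n_regions, dim). Row i of each array is the
    embedding of the same genomic region in replicate A and replicate B.
    """
    # L2-normalize so that dot products are cosine similarities.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)

    # Similarity of every region in A to every region in B.
    logits = (z_a @ z_b.T) / temperature

    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Positives sit on the diagonal: same region, different replicate.
    return float(-np.mean(np.diag(log_prob)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rep_a = rng.normal(size=(8, 4))                      # hypothetical embeddings
    rep_b = rep_a + 0.01 * rng.normal(size=rep_a.shape)  # near-identical replicate
    print("aligned replicates: ", replicate_contrastive_loss(rep_a, rep_b))
    print("mismatched regions: ", replicate_contrastive_loss(rep_a, rep_b[::-1]))
```

In this toy setup the loss is lower when the rows of the two replicates refer to the same regions than when they are mismatched, which is the property a replicate-aware encoder would be trained to exploit. The peak-prediction loss and the autoencoder reconstruction loss mentioned in the abstract are not covered by this sketch.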

Updated: 2023-07-01