An Empirical Bayes approach for the identification of long-range chromosomal interaction from Hi-C data
Statistical Applications in Genetics and Molecular Biology (IF 0.9), Pub Date: 2021-02-01, DOI: 10.1515/sagmb-2020-0026
Qi Zhang, Zheng Xu, Yutong Lai

Hi-C experiments have become very popular in recent years for studying 3D genome structure. Identification of long-range chromosomal interactions, i.e., peak detection, is crucial for Hi-C data analysis. However, it remains a challenging task because of the inherent high dimensionality, sparsity, and over-dispersion of the Hi-C count data matrix. We propose EBHiC, an empirical Bayes approach for peak detection from Hi-C data. The proposed framework provides flexible over-dispersion modeling by explicitly including the “true” interaction intensities as latent variables. To implement the proposed peak identification method (via the empirical Bayes test), we estimate the overall distributions of the observed counts semiparametrically using a Smoothed Expectation Maximization algorithm, and the empirical null based on the zero assumption. We conducted extensive simulations to validate and evaluate the performance of our proposed approach and applied it to real datasets. Our results suggest that EBHiC identifies better peaks in terms of accuracy, biological interpretability, and consistency across biological replicates. The source code is available on GitHub (https://github.com/QiZhangStat/EBHiC).
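
To make the idea of an empirical Bayes test concrete, the following is a minimal sketch of a local false discovery rate computed under a two-component mixture of counts. It is not the EBHiC implementation: the abstract describes a semiparametric marginal fitted by a Smoothed EM algorithm and an empirical null derived from the zero assumption, whereas this toy assumes both components are plain Poisson distributions with known parameters. The function names, the values `PI0`, `LAM_NULL`, `LAM_ALT`, and the 0.05 cutoff are all hypothetical and chosen only for illustration.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability mass at count k with mean lam (computed in log space)."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def local_fdr(k, pi0, lam_null, lam_alt):
    """Posterior probability that count k was generated by the null component."""
    f0 = poisson_pmf(k, lam_null)          # null (background) density
    f1 = poisson_pmf(k, lam_alt)           # alternative (peak) density
    f = pi0 * f0 + (1.0 - pi0) * f1        # marginal density of the observed count
    return pi0 * f0 / f


# Hypothetical parameters: 95% of bin pairs are background with mean count 2,
# the remaining 5% are true interactions with mean count 15.
PI0, LAM_NULL, LAM_ALT = 0.95, 2.0, 15.0

counts = [0, 2, 5, 10, 20]
fdr = [local_fdr(k, PI0, LAM_NULL, LAM_ALT) for k in counts]

# Call a peak wherever the local fdr falls below an (assumed) 0.05 cutoff.
peaks = [k for k, q in zip(counts, fdr) if q < 0.05]
print(peaks)
```

In a real analysis neither the null proportion nor the two densities would be known in advance; estimating them from the data is exactly what the method described above addresses.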

Updated: 2021-02-01