Predicting unrecognized enhancer-mediated genome topology by an ensemble machine learning model
Genome Research (IF 7), Pub Date: 2020-12-01, DOI: 10.1101/gr.264606.120
Li Tang 1,2; Matthew C Hill 3; Jun Wang 4; Jianxin Wang 1; James F Martin 2,3,5,6; Min Li 1

Transcriptional enhancers commonly act over long genomic distances to precisely regulate spatiotemporal gene expression patterns. Identifying the promoters physically contacted by these distal regulatory elements is essential for understanding developmental processes as well as the role of disease-associated risk variants. Modern proximity-ligation assays, such as HiChIP and ChIA-PET, facilitate the accurate identification of long-range contacts between enhancers and promoters. However, these assays are technically challenging, expensive, and time-consuming, making it difficult to investigate enhancer topologies, especially in uncharacterized cell types. To overcome these shortcomings, we designed LoopPredictor, an ensemble machine learning model, to predict genome topology for cell types that lack long-range contact maps. To enrich for functional enhancer-promoter loops over common structural genomic contacts, we trained LoopPredictor with both H3K27ac and YY1 HiChIP data. Moreover, the integration of several related multi-omics features facilitated the identification and annotation of the predicted loops. LoopPredictor efficiently identifies cell type–specific enhancer-mediated loops and promoter–promoter interactions with a modest feature input requirement. We found that LoopPredictor identified functional enhancer loops comparably to experimentally generated H3K27ac HiChIP data. Furthermore, to explore the cross-species prediction capability of LoopPredictor, we fed mouse multi-omics features into a model trained on human data and found that the predicted enhancer loops were highly conserved. LoopPredictor enables the dissection of cell type–specific long-range gene regulation and can accelerate the identification of distal disease-associated risk variants.
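
To make the idea of an ensemble classifier over candidate loops more concrete, the following is a minimal sketch and not LoopPredictor's actual architecture, which the abstract does not specify. It assumes each candidate anchor pair is described by two hypothetical normalized scores (an H3K27ac-like signal and a YY1-like signal), uses purely synthetic labels, and substitutes a simple bagging ensemble of decision stumps with majority voting for whatever learners the real model combines.

```python
import random

def make_synthetic_pairs(n=200, seed=1):
    """Synthetic candidate anchor pairs: (h3k27ac_score, yy1_score), both hypothetical.
    Labels are an assumption for illustration: 1 ("loop") when h3k27ac_score > 0.5."""
    rng = random.Random(seed)
    rows = [(rng.random(), rng.random()) for _ in range(n)]
    labels = [1 if r[0] > 0.5 else 0 for r in rows]
    return rows, labels

def fit_stump(rows, labels):
    """Exhaustively pick the feature, threshold, and direction with the lowest training error."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            for sign in (1, -1):
                preds = [1 if sign * (r[f] - t) >= 0 else 0 for r in rows]
                err = sum(p != y for p, y in zip(preds, labels))
                if best is None or err < best[0]:
                    best = (err, f, t, sign)
    _, f, t, sign = best
    return f, t, sign

def stump_predict(stump, row):
    """Apply one stump: predict 1 when the row falls on the stump's positive side."""
    f, t, sign = stump
    return 1 if sign * (row[f] - t) >= 0 else 0

def fit_ensemble(rows, labels, n_models=25, seed=0):
    """Bagging: train each stump on a bootstrap resample of the training pairs."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return models

def predict_loop(models, row):
    """Majority vote across the ensemble: 1 means the pair is predicted to form a loop."""
    votes = sum(stump_predict(m, row) for m in models)
    return 1 if 2 * votes > len(models) else 0

rows, labels = make_synthetic_pairs()
models = fit_ensemble(rows, labels)
print(predict_loop(models, (0.95, 0.2)), predict_loop(models, (0.05, 0.8)))
```

In this toy setting only the first score carries information, so the ensemble learns a threshold on it. A realistic model would need real multi-omics features and HiChIP-derived labels, neither of which is reproduced here.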

Updated: 2020-12-01