Enhancer prediction in the human genome by probabilistic modelling of the chromatin feature patterns
bioRxiv - Bioinformatics. Pub Date: 2020-06-02, DOI: 10.1101/804625
Maria Osmala, Harri Lähdesmäki

Background: The binding sites of transcription factors (TFs) and the localisation of histone modifications in the human genome can be quantified by chromatin immunoprecipitation coupled with next-generation sequencing (ChIP-seq). The resulting chromatin feature data have been successfully adopted for genome-wide enhancer identification by several unsupervised and supervised machine learning methods. However, current methods predict different numbers and different sets of enhancers for the same cell type and do not utilise the pattern of the ChIP-seq coverage profiles efficiently.

Results: In this work, we propose a PRobabilistic Enhancer PRedictIoN Tool (PREPRINT) that assumes characteristic coverage patterns of chromatin features at enhancers and employs a statistical model to account for their variability. PREPRINT defines probabilistic distance measures to quantify the similarity between genomic query regions and the characteristic coverage patterns. The distance measures, i.e., scores, are computed using either the maximum likelihood (ML) approach or the Bayesian approach. The probabilistic scores of the enhancer and non-enhancer samples are used to train a kernel-based classifier. The performance of the method is demonstrated on ENCODE data for two cell lines. The predicted enhancers are computationally validated against the binding sites of transcriptional regulatory proteins and compared with the predictions obtained by state-of-the-art methods.

Conclusion: PREPRINT performs favourably compared with the state-of-the-art methods, especially when the methods are required to predict a larger set of enhancers. PREPRINT generalises successfully to data from a cell type not utilised for training, and it often performs better than the previous methods. The PREPRINT enhancer predictions are less sensitive to the choice of prediction threshold. PREPRINT identifies biologically validated enhancers not predicted by the competing methods. The enhancers predicted by PREPRINT can aid genome interpretation in functional genomics and clinical studies.
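
The scoring idea in the abstract (compare a query region's coverage profile with a characteristic enhancer pattern through a likelihood) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Poisson model for binned read counts, the function names `mean_pattern`, `poisson_log_likelihood`, and `ml_score`, and the toy profiles. The kernel-based classifier that the abstract trains on such scores is not implemented.

```python
import math

def mean_pattern(profiles):
    """Bin-wise mean of training coverage profiles.

    Under the assumed Poisson model this is the maximum likelihood
    estimate of each bin's rate.
    """
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]

def poisson_log_likelihood(counts, rates, eps=1e-9):
    """Log-likelihood of integer bin counts under independent Poisson rates."""
    total = 0.0
    for k, lam in zip(counts, rates):
        lam = max(lam, eps)  # guard against log(0) for empty bins
        total += k * math.log(lam) - lam - math.lgamma(k + 1)
    return total

def ml_score(query, pattern):
    """Score a query profile against a pattern; higher means more similar."""
    return poisson_log_likelihood(query, pattern)

# Toy binned coverage of one chromatin feature at three hypothetical enhancers.
enhancer_train = [
    [1, 4, 9, 4, 1],
    [2, 5, 8, 5, 2],
    [0, 6, 10, 3, 2],
]
pattern = mean_pattern(enhancer_train)

peaked_query = [1, 5, 9, 4, 2]  # resembles the enhancer pattern
flat_query = [4, 4, 4, 4, 4]    # same depth range, no peak

peaked_score = ml_score(peaked_query, pattern)
flat_score = ml_score(flat_query, pattern)
print(peaked_score > flat_score)
```

In a fuller pipeline, one such score per chromatin feature (and per reference pattern) would form the feature vector of a region, and those vectors would be the input to the kernel-based classifier.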

Updated: 2020-06-02