An explainable artificial intelligence approach for decoding the enhancer histone modifications code and identification of novel enhancers in Drosophila
Genome Biology (IF 10.1), Pub Date: 2021-11-08, DOI: 10.1186/s13059-021-02532-7
Jareth C Wolfe, Liudmila A Mikheeva, Hani Hagras, Nicolae Radu Zabet

Enhancers are non-coding regions of the genome that control the activity of target genes. Recent efforts to identify active enhancers experimentally and in silico have proven effective. While these tools can predict the locations of enhancers with a high degree of accuracy, the mechanisms underpinning the activity of enhancers are often unclear. Using machine learning (ML) and a rule-based explainable artificial intelligence (XAI) model, we demonstrate that we can predict the location of known enhancers in Drosophila with a high degree of accuracy. Most importantly, we use the rules of the XAI model to provide insight into the underlying combinatorial histone modifications code of enhancers. In addition, we identified a large set of putative enhancers that display the same epigenetic signature as enhancers identified experimentally. These putative enhancers are enriched in nascent and divergent transcription and have 3D contacts with promoters of transcribed genes. However, they display only intermediate enrichment of mediator and cohesin complexes compared to previously characterised active enhancers. We also found that 10–15% of the predicted enhancers display characteristics similar to those of super enhancers observed in other species. Here, we applied an explainable AI model to predict enhancers with high accuracy. Notably, we found that different combinations of epigenetic marks characterise different groups of enhancers. Finally, we discovered a large set of putative enhancers that display characteristics similar to those of previously characterised active enhancers.

Updated: 2021-11-08