Deep neural networks identify sequence context features predictive of transcription factor binding
Nature Machine Intelligence ( IF 18.8 ) Pub Date : 2021-01-18 , DOI: 10.1038/s42256-020-00282-y
An Zheng 1 , Michael Lamkin 2 , Hanqing Zhao 3 , Cynthia Wu 4 , Hao Su 1 , Melissa Gymrek 1, 5

Transcription factors bind DNA by recognizing specific sequence motifs, which are typically 6–12 bp long. A motif can occur many thousands of times in the human genome, but only a subset of those sites are actually bound. Here we present a machine-learning framework leveraging existing convolutional neural network architectures and model interpretation techniques to identify and interpret sequence context features most important for predicting whether a particular motif instance will be bound. We apply our framework to predict binding at motifs for 38 transcription factors in a lymphoblastoid cell line, score the importance of context sequences at base-pair resolution and characterize context features most predictive of binding. We find that the choice of training data heavily influences classification accuracy and the relative importance of features such as open chromatin. Overall, our framework enables novel insights into features predictive of transcription factor binding and is likely to inform future deep learning applications to interpret non-coding genetic variants.
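The abstract describes the framework only at a high level. The sketch below illustrates the kind of data preparation and interpretation steps it involves. It locates motif instances, extracts a fixed-width sequence context window around each one, one-hot encodes the window as CNN input, and scores each base's importance. Several choices here are assumptions made for illustration and are not the authors' implementation: the exact-match motif scan, the `flank` width, the example motif `GATAA`, and the toy `gc_score` function, which is a hypothetical stand-in for a trained convolutional network. The sketch also uses in-silico mutagenesis as the interpretation method. This is one common technique, swapped in here for simplicity.

```python
from typing import Callable, List

BASES = "ACGT"
OneHot = List[List[int]]


def one_hot(seq: str) -> OneHot:
    """Encode DNA as an L x 4 matrix (columns A, C, G, T); 'N' encodes as all zeros."""
    return [[1 if b == base else 0 for base in BASES] for b in seq.upper()]


def find_motif_instances(genome: str, motif: str) -> List[int]:
    """Return 0-based start positions of exact, possibly overlapping, motif matches.

    Assumption: real pipelines usually scan with a position weight matrix instead.
    """
    hits: List[int] = []
    start = genome.find(motif)
    while start != -1:
        hits.append(start)
        start = genome.find(motif, start + 1)
    return hits


def context_window(genome: str, pos: int, motif_len: int, flank: int) -> str:
    """Motif instance plus `flank` bp on each side, padded with 'N' past genome edges."""
    left, right = pos - flank, pos + motif_len + flank
    pad_left = "N" * max(0, -left)
    pad_right = "N" * max(0, right - len(genome))
    return pad_left + genome[max(0, left):min(len(genome), right)] + pad_right


def ism_importance(seq: str, score: Callable[[OneHot], float]) -> List[float]:
    """In-silico mutagenesis: for each position, the reference score minus the
    mean score over every single-base substitution at that position."""
    ref = score(one_hot(seq))
    importance: List[float] = []
    for i, b in enumerate(seq):
        alts = [a for a in BASES if a != b.upper()]  # all 4 bases if b is 'N'
        deltas = [ref - score(one_hot(seq[:i] + a + seq[i + 1:])) for a in alts]
        importance.append(sum(deltas) / len(deltas))
    return importance


def gc_score(x: OneHot) -> float:
    """Hypothetical toy model: counts C and G bases (NOT a trained CNN)."""
    return float(sum(row[1] + row[2] for row in x))


# Usage with a made-up sequence and an illustrative GATA-like motif.
genome = "TTGCAGATAAGGCATTTGATAAGCC"
hits = find_motif_instances(genome, "GATAA")
windows = [context_window(genome, p, 5, flank=4) for p in hits]
scores = [ism_importance(w, gc_score) for w in windows]
```

Under this toy scorer, a C or G receives positive importance because mutating it lowers the score. An A or T receives negative importance. A trained network would instead learn which context bases matter. A design note: in-silico mutagenesis needs 3L forward passes per window of length L. This cost is one reason gradient- or attribution-based interpretation methods are often preferred for large models.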




Updated: 2021-01-18