Leveraging the attention mechanism to improve the identification of DNA N6-methyladenine sites
Briefings in Bioinformatics (IF 9.5), Pub Date: 2021-08-09, DOI: 10.1093/bib/bbab351
Ying Zhang, Yan Liu, Jian Xu, Xiaoyu Wang, Xinxin Peng, Jiangning Song, Dong-Jun Yu

DNA N6-methyladenine (6mA) is an important DNA modification that plays key roles in multiple biological processes. Despite recent progress in developing DNA 6mA site prediction methods, several challenges remain. For example, although hand-crafted features are interpretable, they contain redundant information that may bias model training and degrade the trained model. Furthermore, although deep learning (DL)-based models can perform feature extraction and classification automatically, the crucial features they learn are difficult to interpret. As such, considerable research effort has focused on balancing the interpretability and the straightforwardness of DL neural networks. In this study, we develop two new DL-based models for improving the prediction of N6-methyladenine sites, termed LA6mA and AL6mA, which use bidirectional long short-term memory (BiLSTM) to capture long-range information and a self-attention mechanism to extract key positional information from DNA sequences. The performance of the two proposed methods is benchmarked on datasets from two model organisms, Arabidopsis thaliana and Drosophila melanogaster. On these two benchmark datasets, LA6mA achieves area under the receiver operating characteristic curve (AUROC) values of 0.962 and 0.966, and AL6mA achieves AUROC values of 0.945 and 0.941, respectively. Moreover, an in-depth analysis of the attention matrix is conducted to interpret the important information hidden in the sequences that is relevant to 6mA site prediction. The two novel pipelines developed in this work for DNA 6mA site prediction will facilitate a better understanding of the underlying principles of DL-based DNA methylation site prediction and its future applications.
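The abstract does not describe the architectures in detail. To illustrate what an "attention matrix" over a DNA sequence is, here is a minimal sketch that assumes one-hot encoding of nucleotides and a single-head scaled dot-product self-attention layer with random, untrained projection weights. It is not the LA6mA or AL6mA implementation: it omits the BiLSTM component entirely, and the function names (`one_hot`, `self_attention`, `position_importance`) are hypothetical.

```python
import math
import random

NUCLEOTIDES = "ACGT"


def one_hot(seq):
    """One-hot encode a DNA sequence as a list of 4-dim vectors (A, C, G, T)."""
    vecs = []
    for base in seq.upper():
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected base: {base!r}")
        vecs.append([1.0 if base == n else 0.0 for n in NUCLEOTIDES])
    return vecs


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both stored as nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Gaussian random weights; a stand-in (assumption) for trained projections."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def self_attention(x, d_k=8, seed=0):
    """Single-head scaled dot-product self-attention (hypothetical sketch).

    Returns (output, attention), where attention[i][j] is the weight that
    position i places on position j; every row of attention sums to 1.
    """
    rng = random.Random(seed)
    d_in = len(x[0])
    w_q = random_matrix(d_in, d_k, rng)
    w_k = random_matrix(d_in, d_k, rng)
    w_v = random_matrix(d_in, d_k, rng)
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    k_t = [list(col) for col in zip(*k)]           # transpose K: d_k x n
    scale = math.sqrt(d_k)
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]  # n x n
    attention = [softmax(row) for row in scores]
    output = matmul(attention, v)                  # n x d_k
    return output, attention


def position_importance(attention):
    """Mean attention each position receives (column mean), one simple way
    to inspect which sequence positions the layer focuses on."""
    n = len(attention)
    return [sum(attention[i][j] for i in range(n)) / n for j in range(n)]


if __name__ == "__main__":
    seq = "GAGGATCCAT"  # hypothetical short window, for illustration only
    out, attn = self_attention(one_hot(seq))
    scores = position_importance(attn)
    top = max(range(len(scores)), key=scores.__getitem__)
    print(f"most-attended position: {top} ({seq[top]})")
```

Because the weights here are random, the most-attended position carries no biological meaning; in a trained model, the same column-wise inspection of the attention matrix is one plausible way to look for sequence positions relevant to 6mA prediction.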

Updated: 2021-08-09