Improving representations of genomic sequence motifs in convolutional networks with exponential activations, Nature Machine Intelligence


Nature Machine Intelligence (IF 18.8), Pub Date: 2021-02-08, DOI: 10.1038/s42256-020-00291-x
Peter K Koo, Matt Ploenzke

Deep convolutional neural networks (CNNs) trained on regulatory genomic sequences tend to build representations in a distributed manner, making it a challenge to extract learned features that are biologically meaningful, such as sequence motifs. Here we perform a comprehensive analysis of synthetic sequences to investigate the role that CNN activations play in model interpretability. We show that employing an exponential activation in the first-layer filters consistently leads to interpretable and robust representations of motifs compared with other commonly used activations. Strikingly, we demonstrate that CNNs with better test performance do not necessarily yield more interpretable representations with attribution methods. We find that CNNs with exponential activations significantly improve the efficacy of recovering biologically meaningful representations with attribution methods. We demonstrate that these results generalize to real DNA sequences across several in vivo datasets. Together, this work demonstrates how a small modification to existing CNNs (that is, setting exponential activations in the first layer) can substantially improve the robustness and interpretability of learned representations directly in convolutional filters and indirectly with attribution methods.
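To make the "small modification" concrete, here is a minimal pure-Python sketch, not the authors' implementation. It scans a one-hot encoded DNA sequence with a single first-layer filter and compares a ReLU against an exponential activation on the same pre-activation scores. The motif `TGA`, the +1/-1 filter weights, the example sequence, and all helper names are hypothetical choices made for illustration. The intuition it shows (an assumption offered here, not a mechanism stated in the abstract) is that an exponential widens the gap between a full motif match and a partial one.

```python
import math

ALPHABET = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot rows (A, C, G, T)."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq]

def conv1d_scan(x, kernel, bias=0.0):
    """Valid 1D cross-correlation of a one-hot sequence with one filter."""
    width = len(kernel)
    scores = []
    for start in range(len(x) - width + 1):
        total = bias
        for offset in range(width):
            for channel in range(4):
                total += x[start + offset][channel] * kernel[offset][channel]
        scores.append(total)
    return scores

def relu(values):
    """Standard rectifier: negative scores are clipped to zero."""
    return [max(0.0, v) for v in values]

def exponential(values):
    """Exponential activation: exp() applied elementwise."""
    return [math.exp(v) for v in values]

# Hypothetical filter for the motif "TGA": +1 for the matching base, -1 otherwise.
motif = "TGA"
kernel = [[1.0 if letter == base else -1.0 for letter in ALPHABET] for base in motif]

# Toy sequence with one full match (TGA at index 2) and one partial match (TGC at index 6).
x = one_hot("ACTGACTGC")
pre = conv1d_scan(x, kernel)   # pre-activation scores per window
relu_out = relu(pre)
exp_out = exponential(pre)

full, partial = 2, 6
print("ReLU ratio full/partial:", relu_out[full] / relu_out[partial])
print("exp  ratio full/partial:", exp_out[full] / exp_out[partial])
```

In this toy setting the full match scores 3 and the partial match scores 1 before activation, so the ReLU preserves a 3-fold gap while the exponential stretches it to a factor of e^2. In a deep learning framework the same change would amount to swapping the activation applied after the first convolutional layer and leaving the rest of the network unchanged.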


Updated: 2021-02-08
