fastISM: Performant in-silico saturation mutagenesis for convolutional neural networks
bioRxiv - Bioinformatics Pub Date : 2020-10-13 , DOI: 10.1101/2020.10.13.337147
Surag Nair , Avanti Shrikumar , Anshul Kundaje

Deep learning models such as convolutional neural networks are able to accurately map biological sequences to associated functional readouts and properties by learning predictive de novo representations. In-silico saturation mutagenesis (ISM) is a popular feature attribution technique for inferring contributions of all characters in an input sequence to the model's predicted output. The main drawback of ISM is its runtime, as it involves multiple forward propagations of all possible mutations of each character in the input sequence through the trained model to predict the effects on the output. We present fastISM, an algorithm that speeds up ISM by a factor of over 10x for commonly used convolutional neural network architectures. fastISM is based on the observations that the majority of computation in ISM is spent in convolutional layers, and a single mutation only disrupts a limited region of intermediate layers, rendering most computation redundant. fastISM reduces the gap between backpropagation-based feature attribution methods and ISM. It far surpasses the runtime of backpropagation-based methods on multi-output architectures, making it feasible to run ISM on a large number of sequences. An easy-to-use Keras/TensorFlow 2 implementation of fastISM is available at https://github.com/kundajelab/fastISM, and a hands-on tutorial at https://colab.research.google.com/github/kundajelab/fastISM/blob/master/notebooks/colab/DeepSEA.ipynb.
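The costly procedure the abstract describes can be sketched as a naive ISM loop: one forward pass per possible point mutation, which is exactly the work fastISM avoids by reusing unchanged intermediate convolutional activations. This is a minimal illustrative sketch, not the fastISM algorithm itself; `model_fn` and the GC-counting `toy_model` are hypothetical stand-ins for a trained CNN.

```python
import numpy as np

def ism_scores(model_fn, onehot_seq):
    """Naive in-silico saturation mutagenesis (ISM).

    For every position in a one-hot encoded sequence, substitute each
    alternative character, re-run the model, and record the change in
    output relative to the reference prediction. This costs O(L * A)
    forward passes for a sequence of length L over an alphabet of
    size A, which is the runtime bottleneck fastISM targets.
    """
    seq_len, alphabet = onehot_seq.shape
    ref_pred = model_fn(onehot_seq)
    scores = np.zeros((seq_len, alphabet))
    for pos in range(seq_len):
        for base in range(alphabet):
            if onehot_seq[pos, base] == 1:
                continue  # reference character: delta is zero by definition
            mutated = onehot_seq.copy()
            mutated[pos] = 0
            mutated[pos, base] = 1
            scores[pos, base] = model_fn(mutated) - ref_pred
    return scores

# Hypothetical stand-in for a trained CNN: scores GC content.
def toy_model(seq):
    return float(seq[:, 1].sum() + seq[:, 2].sum())  # counts C and G columns

seq = np.eye(4)[[0, 1, 2, 3]]  # one-hot encoding of "ACGT" (A=0, C=1, G=2, T=3)
scores = ism_scores(toy_model, seq)
```

fastISM's key observation is that a single point mutation only perturbs a limited receptive-field-sized region of each convolutional layer's output, so most of the per-mutation forward pass in the loop above recomputes values that are identical to the reference pass.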

Updated: 2020-10-15