Incremental and Approximate Computations for Accelerating Deep CNN Inference
ACM Transactions on Database Systems (IF 2.2). Pub Date: 2020-07-07. DOI: 10.1145/3397461
Supun Chathuranga Nakandala, Kabir Nagrecha, Arun Kumar, Yannis Papakonstantinou

Deep learning now offers state-of-the-art accuracy for many prediction tasks. A form of deep learning called deep convolutional neural networks (CNNs) is especially popular on image, video, and time series data. Due to its high computational cost, CNN inference is often a bottleneck in analytics tasks on such data. Thus, much work in the computer architecture, systems, and compilers communities studies how to make CNN inference faster. In this work, we show that by elevating the abstraction level and re-imagining CNN inference as queries, we can bring to bear database-style query optimization techniques to improve CNN inference efficiency. We focus on tasks that perform CNN inference repeatedly on inputs that are only slightly different. We identify two popular CNN tasks with this behavior: occlusion-based explanations (OBE) and object recognition in videos (ORV). OBE is a popular method for "explaining" CNN predictions. It outputs a heatmap over the input to show which regions (e.g., image pixels) mattered most for a given prediction. It leads to many re-inference requests on locally modified inputs. ORV uses CNNs to identify and track objects across video frames. It also leads to many re-inference requests. We cast such tasks in a unified manner as a novel instance of the incremental view maintenance problem and create a comprehensive algebraic framework for incremental CNN inference that reduces computational costs. We produce materialized views of features produced inside a CNN and connect them with a novel multi-query optimization scheme for CNN re-inference. Finally, we also devise novel OBE-specific and ORV-specific approximate inference optimizations exploiting their semantics. We prototype our ideas in Python to create a tool called Krypton that supports both CPUs and GPUs.
Experiments with real data and CNNs show that Krypton reduces runtimes by up to 5× (respectively, 35×) to produce exact (respectively, high-quality approximate) results without raising resource requirements.
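The core idea of incremental inference can be illustrated on a single convolution layer: when an OBE-style occlusion modifies only a small patch of the input, only the outputs whose receptive fields overlap the patch can change, so the rest of a materialized layer output can be reused. The toy single-channel NumPy sketch below is an illustration of that principle under simplifying assumptions (one layer, valid convolution, no padding or strides); it is not Krypton's actual implementation.

```python
import numpy as np

def conv2d(x, k):
    """Valid 2-D cross-correlation, single channel (illustrative, not optimized)."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))
kernel = rng.standard_normal((3, 3))

base = conv2d(img, kernel)  # "materialized view" of the layer output

# Occlude a p x p patch at (r, c), as OBE does at each heatmap position.
r, c, p = 10, 10, 4
occ = img.copy()
occ[r:r + p, c:c + p] = 0.0

# Output (i, j) reads input rows i..i+kh-1, so only outputs with
# i in [r-kh+1, r+p-1] (clamped to bounds) can be affected; same for columns.
kh, kw = kernel.shape
r0, c0 = max(0, r - kh + 1), max(0, c - kw + 1)
r1, c1 = min(base.shape[0], r + p), min(base.shape[1], c + p)

# Incremental re-inference: patch only the affected region of the view.
inc = base.copy()
inc[r0:r1, c0:c1] = conv2d(occ[r0:r1 + kh - 1, c0:c1 + kw - 1], kernel)

# Matches recomputing the whole layer from scratch.
full = conv2d(occ, kernel)
assert np.allclose(inc, full)
```

Here the incremental update recomputes a region of roughly (p + kh - 1)² outputs instead of the full feature map; stacking layers dilates the affected region by each layer's receptive field, which is the cost/benefit trade-off an incremental inference framework must track per layer.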
