DLBricks: Composable Benchmark Generation to Reduce Deep Learning Benchmarking Effort on CPUs (Extended)
arXiv - CS - Performance. Pub Date: 2019-11-18, DOI: arXiv:1911.07967
Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-mei Hwu

The past few years have seen a surge of applying Deep Learning (DL) models for a wide array of tasks such as image classification, object detection, machine translation, etc. While DL models provide an opportunity to solve otherwise intractable tasks, their adoption relies on them being optimized to meet latency and resource requirements. Benchmarking is a key step in this process but has been hampered in part due to the lack of representative and up-to-date benchmarking suites. This is exacerbated by the fast-evolving pace of DL models. This paper proposes DLBricks, a composable benchmark generation design that reduces the effort of developing, maintaining, and running DL benchmarks on CPUs. DLBricks decomposes DL models into a set of unique runnable networks and constructs the original model's performance using the performance of the generated benchmarks. DLBricks leverages two key observations: DL layers are the performance building blocks of DL models and layers are extensively repeated within and across DL models. Since benchmarks are generated automatically and the benchmarking time is minimized, DLBricks can keep up-to-date with the latest proposed models, relieving the pressure of selecting representative DL models. Moreover, DLBricks allows users to represent proprietary models within benchmark suites. We evaluate DLBricks using $50$ MXNet models spanning $5$ DL tasks on $4$ representative CPU systems. We show that DLBricks provides an accurate performance estimate for the DL models and reduces the benchmarking time across systems (e.g. within $95\%$ accuracy and up to $4.4\times$ benchmarking time speedup on Amazon EC2 c5.xlarge).
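The core idea above, decomposing models into unique layer-level benchmarks and reconstructing each model's performance from those measurements, can be sketched in a few lines. This is a hedged illustration, not the authors' implementation: the layer names and latency numbers below are hypothetical, and the estimator assumes a simple sequential sum of per-layer latencies.

```python
# Sketch of DLBricks' key observations: layers are the performance building
# blocks of DL models, and layers repeat within and across models, so each
# unique layer needs to be benchmarked only once.
from collections import Counter

# Hypothetical per-layer latencies (ms), each measured by one generated
# runnable benchmark for that unique layer.
LAYER_LATENCY_MS = {
    "conv3x3_64": 1.2,
    "relu": 0.1,
    "fc_1000": 0.8,
}

def unique_benchmarks(models):
    """Collect the unique layers across all models; each becomes one
    generated benchmark, no matter how often it is repeated."""
    return sorted({layer for layers in models.values() for layer in layers})

def estimate_latency(layers, measured):
    """Construct a model's performance from the generated benchmarks:
    repeated layers reuse the same single measurement."""
    counts = Counter(layers)
    return sum(n * measured[layer] for layer, n in counts.items())

models = {
    "net_a": ["conv3x3_64", "relu", "conv3x3_64", "relu", "fc_1000"],
    "net_b": ["conv3x3_64", "relu", "fc_1000"],
}

benchmarks = unique_benchmarks(models)  # 3 benchmarks cover 8 layer instances
est_a = estimate_latency(models["net_a"], LAYER_LATENCY_MS)  # 3.4 ms
```

Because the two models share all three layer types, only three benchmarks are run instead of eight, which is the source of the benchmarking-time speedup the paper reports; the real system operates on runnable subnetworks extracted from MXNet models rather than on string labels.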

Updated: 2020-03-12