The Design and Implementation of a Scalable DL Benchmarking Platform
arXiv - CS - General Literature. Pub Date: 2019-11-19, DOI: arxiv-1911.08031
Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-mei Hwu

The current Deep Learning (DL) landscape is fast-paced and rife with non-uniform models and hardware/software (HW/SW) stacks, but it lacks a DL benchmarking platform to facilitate the evaluation and comparison of DL innovations, be they models, frameworks, libraries, or hardware. Due to the lack of a benchmarking platform, the current practice of evaluating the benefits of proposed DL innovations is both arduous and error-prone, which stifles the adoption of those innovations. In this work, we first identify 10 design features that are desirable in a DL benchmarking platform. These features include performing the evaluation in a consistent, reproducible, and scalable manner; being framework and hardware agnostic; supporting real-world benchmarking workloads; and providing in-depth inspection of model execution across the levels of the HW/SW stack. We then propose MLModelScope, a DL benchmarking platform design that realizes the 10 objectives. MLModelScope proposes a specification to define DL model evaluations, along with techniques to provision the evaluation workflow using the user-specified HW/SW stack. MLModelScope defines abstractions for frameworks and supports a broad range of DL models and evaluation scenarios. We implement MLModelScope as an open-source project with support for all major frameworks and hardware architectures. Through MLModelScope's evaluation and automated analysis workflows, we perform case-study analyses of 37 models across 4 systems and show how model, hardware, and framework selection affects model accuracy and performance under different benchmarking scenarios. We further demonstrate how MLModelScope's tracing capability gives a holistic view of model execution and helps pinpoint bottlenecks.

Updated: 2019-11-20
