The Design and Implementation of a Scalable DL Benchmarking Platform
arXiv - CS - General Literature. Pub Date: 2019-11-19, DOI: arxiv-1911.08031
Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-mei Hwu

The current Deep Learning (DL) landscape is fast-paced and rife with non-uniform models and hardware/software (HW/SW) stacks, yet it lacks a DL benchmarking platform to facilitate the evaluation and comparison of DL innovations, be they models, frameworks, libraries, or hardware. Due to the lack of such a platform, the current practice of evaluating the benefits of proposed DL innovations is both arduous and error-prone - stifling the adoption of the innovations. In this work, we first identify 10 design features which are desirable within a DL benchmarking platform. These features include: performing the evaluation in a consistent, reproducible, and scalable manner; being framework and hardware agnostic; supporting real-world benchmarking workloads; providing in-depth model execution inspection across the HW/SW stack levels; etc. We then propose MLModelScope, a DL benchmarking platform design that realizes the 10 objectives. MLModelScope proposes a specification to define DL model evaluations and techniques to provision the evaluation workflow using the user-specified HW/SW stack. MLModelScope defines abstractions for frameworks and supports a broad range of DL models and evaluation scenarios. We implement MLModelScope as an open-source project with support for all major frameworks and hardware architectures. Through MLModelScope's evaluation and automated analysis workflows, we perform case-study analyses of 37 models across 4 systems and show how model, hardware, and framework selection affects model accuracy and performance under different benchmarking scenarios. We further demonstrate how MLModelScope's tracing capability gives a holistic view of model execution and helps pinpoint bottlenecks.
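The abstract mentions a specification for defining DL model evaluations together with provisioning of the user-specified HW/SW stack. The Python sketch below is purely illustrative of that idea; the field names (model_name, framework_version, etc.) and the provision_and_run stub are assumptions for illustration, not MLModelScope's actual manifest schema or API.

```python
# Illustrative sketch only: field names and structure are assumptions,
# not MLModelScope's actual evaluation-manifest schema.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EvaluationSpec:
    """A hypothetical declarative spec for one DL model evaluation."""
    model_name: str                      # e.g. "ResNet50_v1"
    framework: str                       # e.g. "TensorFlow"
    framework_version: str               # pins the SW stack for reproducibility
    hardware: str                        # target HW, e.g. "Tesla V100"
    dataset: str                         # benchmarking workload
    batch_size: int = 1
    preprocessing: Dict[str, str] = field(default_factory=dict)
    metrics: List[str] = field(default_factory=lambda: ["accuracy", "latency"])


def provision_and_run(spec: EvaluationSpec) -> Dict[str, float]:
    """Stand-in for a provisioning + evaluation workflow.

    A real platform would resolve the pinned HW/SW stack (e.g. select a
    matching container image), stage the dataset, run inference, and
    collect traces; here we only echo the spec to show the control flow.
    """
    print(f"Provisioning {spec.framework} {spec.framework_version} on {spec.hardware}")
    print(f"Evaluating {spec.model_name} on {spec.dataset} (batch={spec.batch_size})")
    return {metric: 0.0 for metric in spec.metrics}  # placeholder results


if __name__ == "__main__":
    spec = EvaluationSpec(
        model_name="ResNet50_v1",
        framework="TensorFlow",
        framework_version="1.14",
        hardware="Tesla V100",
        dataset="ImageNet-val",
        preprocessing={"resize": "224x224", "normalize": "imagenet"},
    )
    print(provision_and_run(spec))
```

A declarative spec of this kind is what would let the same evaluation be re-provisioned consistently across different frameworks and hardware, which is the reproducibility goal the abstract emphasizes.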

Updated: 2019-11-20