HPC AI500: The Methodology, Tools, Roofline Performance Models, and Metrics for Benchmarking HPC AI Systems
arXiv - CS - Performance. Pub Date: 2020-07-01. DOI: arxiv-2007.00279. Authors: Zihan Jiang, Lei Wang, Xingwang Xiong, Wanling Gao, Chunjie Luo, Fei Tang, Chuanxin Lan, Hongxiao Li, and Jianfeng Zhan
Recent years have witnessed a trend of applying large-scale distributed deep learning in both business and scientific computing, with the goal of reducing training time while achieving state-of-the-art quality. The HPC community has shown great interest in building HPC AI systems dedicated to running these workloads, and HPC AI benchmarks accelerate that process. Unfortunately, benchmarking HPC AI systems at scale raises serious challenges: none of the previous HPC AI benchmarks achieves the goals of being equivalent, relevant, representative, affordable, and repeatable. This paper presents a comprehensive methodology, tools, Roofline performance models, and innovative metrics for benchmarking, optimizing, and ranking HPC AI systems, which we call HPC AI500 V2.0. We abstract the HPC AI system into nine independent layers, and present explicit benchmarking rules and procedures to assure the equivalence of each layer, repeatability, and replicability. On the basis of AIBench, by far the most comprehensive AI benchmark suite, we present and build two HPC AI benchmarks drawn from business and scientific computing: Image Classification and Extreme Weather Analytics, achieving both representativeness and affordability. To rank the performance and energy efficiency of HPC AI systems, we propose Valid FLOPS and Valid FLOPS per watt, which impose a penalty for failing to achieve the target quality. We propose using convolution and GEMM, the two most intensively used kernel functions, to measure the upper-bound performance of HPC AI systems, and present HPC AI Roofline models for guiding performance optimizations. The evaluations show that our methodology, benchmarks, performance models, and metrics can measure, optimize, and rank HPC AI systems in a scalable, simple, and affordable way. HPC AI500 V2.0 is publicly available from http://www.benchcouncil.org/benchhub/hpc-ai500-benchmark.
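The two ranking ideas in the abstract can be illustrated with a minimal sketch. Note the exact penalty function and exponent used by HPC AI500 are not given in the abstract, so the quality-penalty form below (a power of the achieved/target quality ratio) and all numeric values are assumptions for illustration only; the Roofline bound is the standard min(compute peak, bandwidth × arithmetic intensity) formulation.

```python
def valid_flops(measured_flops, achieved_quality, target_quality, penalty_exp=5):
    """Valid FLOPS sketch: scale measured FLOPS by a penalty when the achieved
    model quality falls short of the target quality. The power-law penalty and
    the exponent are assumptions, not the paper's definition."""
    ratio = min(achieved_quality / target_quality, 1.0)  # no bonus above target
    return measured_flops * ratio ** penalty_exp

def roofline_bound(peak_flops, peak_bandwidth, arithmetic_intensity):
    """Classic Roofline upper bound: attainable FLOP/s is limited either by
    compute (peak_flops) or by memory traffic (bandwidth * intensity)."""
    return min(peak_flops, peak_bandwidth * arithmetic_intensity)

# A run that reaches only 74.0% accuracy against a hypothetical 75.9% target
# keeps just part of its measured FLOPS in the ranking.
vf = valid_flops(100e12, achieved_quality=0.740, target_quality=0.759)

# A GEMM with arithmetic intensity 40 FLOP/byte on a hypothetical 900 GB/s,
# 120 TFLOP/s accelerator is bandwidth-bound under the Roofline model.
rb = roofline_bound(120e12, 900e9, 40)
```

This is why convolution and GEMM serve as upper-bound probes: their arithmetic intensity is high and well characterized, so the Roofline bound tells you whether an HPC AI system's measured training FLOPS are limited by compute or by memory traffic.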
Updated: 2020-07-02