HPC AI500: The Methodology, Tools, Roofline Performance Models, and Metrics for Benchmarking HPC AI Systems
arXiv - CS - Performance. Pub Date: 2020-07-01. DOI: arxiv-2007.00279. Authors: Zihan Jiang, Lei Wang, Xingwang Xiong, Wanling Gao, Chunjie Luo, Fei Tang, Chuanxin Lan, Hongxiao Li, and Jianfeng Zhan
Recent years have witnessed a trend of applying large-scale distributed deep learning in both business and scientific computing, with the goal of reducing training time while achieving state-of-the-art quality. The HPC community has shown great interest in building HPC AI systems dedicated to running these workloads, and HPC AI benchmarks accelerate that process. Unfortunately, benchmarking HPC AI systems at scale raises serious challenges: none of the previous HPC AI benchmarks achieves the goals of being equivalent, relevant, representative, affordable, and repeatable. This paper presents a comprehensive methodology, tools, Roofline performance models, and innovative metrics for benchmarking, optimizing, and ranking HPC AI systems, which we call HPC AI500 V2.0. We abstract the HPC AI system into nine independent layers, and present explicit benchmarking rules and procedures to assure the equivalence of each layer, repeatability, and replicability. On the basis of AIBench, by far the most comprehensive AI benchmark suite, we present and build two HPC AI benchmarks drawn from business and scientific computing: Image Classification and Extreme Weather Analytics, achieving both representativeness and affordability. To rank the performance and energy efficiency of HPC AI systems, we propose Valid FLOPS and Valid FLOPS per watt, which impose a penalty for failing to achieve the target quality. We propose using convolution and GEMM, the two most intensively used kernel functions, to measure the upper-bound performance of HPC AI systems, and present HPC AI Roofline models for guiding performance optimizations. The evaluations show that our methodology, benchmarks, performance models, and metrics can measure, optimize, and rank HPC AI systems in a scalable, simple, and affordable way. HPC AI500 V2.0 is publicly available from http://www.benchcouncil.org/benchhub/hpc-ai500-benchmark.
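The two ranking ideas in the abstract can be illustrated with a minimal sketch. Note the exact penalty function and exponent used by HPC AI500 are not given in the abstract, so the quality-penalty form below (a power of the achieved/target quality ratio) and all numeric values are assumptions for illustration only; the Roofline bound is the standard min(compute peak, bandwidth × arithmetic intensity) formulation.

```python
def valid_flops(measured_flops, achieved_quality, target_quality, penalty_exp=5):
    """Valid FLOPS sketch: scale measured FLOPS by a penalty when the achieved
    model quality falls short of the target quality. The power-law penalty and
    the exponent are assumptions, not the paper's definition."""
    ratio = min(achieved_quality / target_quality, 1.0)  # no bonus above target
    return measured_flops * ratio ** penalty_exp

def roofline_bound(peak_flops, peak_bandwidth, arithmetic_intensity):
    """Classic Roofline upper bound: attainable FLOP/s is limited either by
    compute (peak_flops) or by memory traffic (bandwidth * intensity)."""
    return min(peak_flops, peak_bandwidth * arithmetic_intensity)

# A run that reaches only 74.0% accuracy against a hypothetical 75.9% target
# keeps just part of its measured FLOPS in the ranking.
vf = valid_flops(100e12, achieved_quality=0.740, target_quality=0.759)

# A GEMM with arithmetic intensity 40 FLOP/byte on a hypothetical 900 GB/s,
# 120 TFLOP/s accelerator is bandwidth-bound under the Roofline model.
rb = roofline_bound(120e12, 900e9, 40)
```

This is why convolution and GEMM serve as upper-bound probes: their arithmetic intensity is high and well characterized, so the Roofline bound tells you whether an HPC AI system's measured training FLOPS are limited by compute or by memory traffic.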
Updated: 2020-07-02