HPC AI500: Representative, Repeatable and Simple HPC AI Benchmarking,arXiv - CS - Performance



arXiv - CS - Performance, Pub Date: 2021-02-25, DOI: arxiv-2102.12848
Zihan Jiang, Wanling Gao, Fei Tang, Xingwang Xiong, Lei Wang, Chuanxin Lan, Chunjie Luo, Hongxiao Li, Jianfeng Zhan

Recent years have witnessed a trend of applying large-scale distributed deep learning algorithms (HPC AI) in both business and scientific computing, with the goal of reducing training time while reaching state-of-the-art quality. HPC AI benchmarks accelerate this process. Unfortunately, benchmarking HPC AI systems at scale raises serious challenges. This paper presents a representative, repeatable and simple HPC AI benchmarking methodology. From the seventeen AI workloads of AIBench Training, by far the most comprehensive AI training benchmark suite, we choose two representative and repeatable AI workloads. The selected HPC AI benchmarks cover both business and scientific computing: Image Classification and Extreme Weather Analytics. To rank HPC AI systems, we present a new metric named Valid FLOPS, which emphasizes both throughput performance and a target quality. The specification, source code, datasets, and HPC AI500 ranking numbers are publicly available at https://www.benchcouncil.org/HPCAI500/.
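The abstract does not spell out how Valid FLOPS combines throughput with a target quality. As an illustration only, the sketch below assumes a hypothetical form in which measured FLOPS is scaled by a quality penalty (achieved / target)^n, with the ratio capped at 1 so that exceeding the target earns no bonus. The function name `valid_flops`, the exponent `penalty_exponent`, and the cap are assumptions of this sketch, not the definition given in the HPC AI500 specification.

```python
def valid_flops(measured_flops: float, achieved_quality: float,
                target_quality: float, penalty_exponent: float) -> float:
    """Hypothetical quality-penalized throughput (not the official formula).

    Scales measured FLOPS by (achieved / target) ** penalty_exponent.
    The ratio is capped at 1.0 (an assumption of this sketch), so a run
    that misses the target quality is penalized and a run that meets or
    exceeds it keeps its raw FLOPS.
    """
    if measured_flops < 0:
        raise ValueError("measured_flops must be non-negative")
    if target_quality <= 0 or achieved_quality < 0:
        raise ValueError("qualities must be non-negative, target positive")
    # Fraction of the target quality actually reached, capped at 1.0.
    ratio = min(achieved_quality / target_quality, 1.0)
    # A larger exponent punishes missing the target more harshly.
    return measured_flops * ratio ** penalty_exponent


if __name__ == "__main__":
    # A run reaching half the target quality with exponent 2 keeps 25% of its FLOPS.
    print(valid_flops(100.0, 0.5, 1.0, 2))
```

The exponent is the knob that trades off raw speed against quality in this sketch: with a large exponent, a fast run that falls short of the target quality ranks well below a slower run that reaches it.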


Updated: 2021-02-26
