Gem5-X
ACM Transactions on Architecture and Code Optimization (IF 1.6), Pub Date: 2021-07-17, DOI: 10.1145/3461662
Yasir Mahmood Qureshi¹, William Andrew Simon¹, Marina Zapater¹, Katzalin Olcoz², David Atienza¹

The increasing adoption of smart systems in our daily life has led to the development of new applications with varying performance and energy constraints, and suitable computing architectures need to be developed for these new applications. In this article, we present gem5-X, a system-level simulation framework based on gem5 for the architectural exploration of heterogeneous many-core systems. To demonstrate the capabilities of gem5-X, we use real-time video analytics as a case study. This application is composed of two kernels: video encoding, and image classification using convolutional neural networks (CNNs). First, we use gem5-X to explore the benefits of the latest 3D high-bandwidth memory (HBM2) in different architectural configurations. Then, using a two-step exploration methodology, we develop in gem5-X a new optimized clustered-heterogeneous architecture with HBM2 for the video analytics application. In this proposed clustered-heterogeneous architecture, an ARMv8 in-order cluster with an in-cache computing engine executes the video encoding kernel, giving 20% performance and 54% energy benefits compared to baseline ARM in-order and Out-of-Order systems, respectively. Furthermore, thanks to gem5-X, we conclude that ARM Out-of-Order clusters with HBM2 are the best choice for running visual recognition using CNNs, as they outperform DDR4-based systems by up to 30% in terms of both performance and energy savings.

Updated: 2021-07-17