Approximate Cache in GPGPUs
ACM Transactions on Embedded Computing Systems (IF 2), Pub Date: 2020-09-26, DOI: 10.1145/3407904
Ehsan Atoofian

There is a growing number of application domains, ranging from multimedia to machine learning, in which a certain level of inexactness can be tolerated. For these applications, approximate computing is an effective technique that trades off some loss in output data integrity for energy and/or performance gains. In this article, we present the approximate cache, which approximates similar values to save energy in the L2 cache of general-purpose graphics processing units (GPGPUs). The L2 cache is a critical component in the memory hierarchy of GPGPUs, as it accommodates data for thousands of simultaneously executing threads. Simply increasing the size of the L2 cache is not a viable way to keep up with the growing data footprint of many-core applications. This work is motivated by the observation that threads within a warp write values into memory that are arithmetically similar. We exploit this property and propose low-cost, implementation-efficient hardware that trades off accuracy for energy. The approximate cache identifies similar values at runtime and, in the event of similarity, allows only one thread to write into the cache. Since the approximate cache is able to pack more data into a smaller space, it enables downsizing of the data array with negligible impact on cache misses and lower-level memory. The approximate cache reduces both dynamic and static energy. By storing a thread's data in a cache block, each memory instruction accesses fewer cache cells, reducing dynamic energy. In addition, the approximate cache increases the frequency of bank idleness; by power-gating idle banks, static energy is reduced. Our evaluations reveal that the approximate cache reduces energy by 52% with minimal quality degradation while maintaining the performance of a diverse set of GPGPU applications.
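The abstract does not specify how "arithmetically similar" values are detected; a common approach in value-approximation hardware is to compare only the high-order bits of a value's binary representation (sign, exponent, and leading mantissa bits for floats). The sketch below illustrates that idea in Python under assumed parameters: `SIM_BITS` is a hypothetical similarity threshold, and `pack_warp_store` mimics keeping one representative value per group of similar warp writes — it is not the paper's actual hardware mechanism.

```python
import struct

SIM_BITS = 12  # hypothetical: number of high-order bits that must match

def float_bits(x: float) -> int:
    """Reinterpret a 32-bit float as its raw IEEE-754 bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def similar(a: float, b: float, sim_bits: int = SIM_BITS) -> bool:
    """Two values are 'similar' if their top sim_bits bits match
    (sign, exponent, and leading mantissa bits)."""
    shift = 32 - sim_bits
    return (float_bits(a) >> shift) == (float_bits(b) >> shift)

def pack_warp_store(values):
    """Keep one representative per group of similar values, mimicking
    the idea that only one thread writes to the cache on similarity."""
    reps = []
    for v in values:
        if not any(similar(v, r) for r in reps):
            reps.append(v)
    return reps
```

With this threshold, near-equal values like `1.0` and `1.0001` collapse to one stored representative, while a clearly different value such as `8.0` is stored separately — fewer stored words means fewer cache cells touched per warp store, which is the dynamic-energy argument made above.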
