Using Multicore Reuse Distance to Study Coherence Directories
ACM Transactions on Computer Systems (IF 2.0), Pub Date: 2017-07-31, DOI: 10.1145/3092702
Minshu Zhao, Donald Yeung

Researchers have proposed numerous techniques to improve the scalability of coherence directories. The effectiveness of these techniques depends not only on application behavior but also on the CPU's configuration, for example, its core count and cache size. As CPUs continue to scale, it is essential to explore the directory's application and architecture dependencies. However, this is challenging given the slow speed of simulators. While it is common practice to simulate different applications, previous research on directory designs has explored only a few CPU configurations (and in most cases, only one), which can lead to an incomplete and inaccurate view of the directory's behavior. This article proposes to use multicore reuse distance analysis to study coherence directories. We develop a framework to extract the directory access stream from parallel least recently used (LRU) stacks, enabling rapid analysis of the directory's accesses and contents across both core count and cache size scaling. A key part of our framework is the notion of relative reuse distance between sharers, which defines sharing in a capacity-dependent fashion and facilitates our analyses along the data cache size dimension. We implement our framework in a profiler and then apply it to gain insights into the impact of multicore CPU scaling on directory behavior. Our profiling results show that directory accesses decrease by 3.3× when scaling the data cache size from 16KB to 1MB, despite an increase in sharing-based directory accesses. We also show that the increased sharing caused by data cache scaling allows the portion of on-chip memory occupied by the directory to be reduced by 43.3%, compared to a reduction of only 2.6% when scaling the number of cores. We further show that certain directory entries exhibit high temporal reuse. In addition to gaining these insights, we validate our profile-based results and find that they are within 2–10% of cache simulations on average, across different validation experiments. Finally, we conduct four case studies that illustrate our insights on existing directory techniques. In particular, we demonstrate our directory occupancy insights on a Cuckoo directory; we apply our sharing insights to provide bounds on the size of Scalable Coherence Directories (SCD) and Dual-Grain Directories (DGD); and we demonstrate our directory entry reuse insights on a multilevel directory design.
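
The abstract does not detail the framework's algorithms, but the basic mechanics of LRU-stack reuse-distance profiling it builds on can be illustrated with a short sketch. The Python snippet below is a minimal, simplified illustration rather than the paper's implementation: it maintains one LRU stack per core, reports each reference's reuse distance, and counts references that would consult the directory under a simple last-toucher notion of sharing. The paper's relative reuse distance between sharers is capacity-dependent and more precise than this approximation; all names here (`LRUStack`, `count_directory_accesses`, `trace`) are hypothetical.

```python
from collections import OrderedDict

class LRUStack:
    """Per-core LRU stack: records blocks in recency order and reports
    the reuse distance (stack depth) of each access."""

    def __init__(self):
        self.stack = OrderedDict()  # oldest entry first, most recent last

    def access(self, block):
        if block in self.stack:
            # Reuse distance = number of distinct blocks referenced since
            # the previous access to this block (its depth from the top).
            depth = len(self.stack) - 1 - list(self.stack).index(block)
            del self.stack[block]
        else:
            depth = float("inf")  # cold miss: infinite reuse distance
        self.stack[block] = True  # push to the top (most recent position)
        return depth


def count_directory_accesses(trace, num_cores, cache_blocks):
    """Replay a trace of (core_id, block_addr) pairs and count references
    that would reach the directory: capacity misses plus cross-core
    (sharing) references, under a simplified last-toucher notion of sharing."""
    stacks = [LRUStack() for _ in range(num_cores)]
    last_toucher = {}        # block -> core that most recently accessed it
    directory_accesses = 0

    for core, block in trace:
        distance = stacks[core].access(block)
        miss = distance >= cache_blocks   # would miss in a cache of this capacity
        shared = last_toucher.get(block, core) != core
        if miss or shared:
            directory_accesses += 1
        last_toucher[block] = core

    return directory_accesses


# Tiny usage example with a hand-written trace of (core, block) references.
trace = [(0, 0xA0), (0, 0xB0), (1, 0xA0), (0, 0xA0), (1, 0xC0)]
print(count_directory_accesses(trace, num_cores=2, cache_blocks=64))
```

Because the per-core stacks are maintained independently of any particular cache capacity, a single profiling pass can be re-evaluated at many cache sizes by changing the `cache_blocks` threshold, which is what makes this style of analysis attractive compared to running one simulation per configuration.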

Updated: 2017-07-31