sRSP: An efficient and scalable implementation of remote scope promotion, Concurrency and Computation: Practice and Experience



Concurrency and Computation: Practice and Experience (IF 2), Pub Date: 2021-07-11, DOI: 10.1002/cpe.6483
Ayse Yilmazer‐Metin

GPUs support only simple coherence operations, so heavyweight synchronization operations must be used explicitly to ensure data integrity. Scoped synchronization (Hower et al., 2014; Gaster et al., 2015) was introduced to enable low-overhead communication when that communication is local to a subset of threads. Scoped synchronization can be used when the sharing pattern is static; when sharing is dynamic, however, heavyweight global-scoped synchronization cannot be avoided. Asymmetric sharing is one such dynamic pattern: shared data is accessed frequently by a single agent and only rarely by remote agents. Without any special support, it requires global-scoped synchronization that encompasses every possible synchronizing agent on the GPU device. Remote scope promotion (RSP) (Orr et al., 2015) allows lightweight local-scoped synchronization for the frequent accesses and defers most of the synchronization overhead to the rare remote accesses. We propose sRSP, an efficient and scalable implementation of RSP semantics. sRSP tracks local-scoped synchronizations in the local caches. When a remote-scoped synchronization is performed, sRSP applies the costly cache operations selectively. A thorough evaluation shows that sRSP significantly reduces remote synchronization overhead and scales better. On average, it improves performance by around 25% on a GPU device with 64 compute units.
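To make the idea of "applying costly cache operations selectively" concrete, the following is a toy counting model, not the hardware design from the paper. It assumes each compute unit (CU) has a private L1 cache and keeps a hypothetical `local_syncs` set of synchronization-variable addresses it has released with local scope since its last flush. A remote-scoped acquire then either broadcasts a flush to every L1 (the naive way to honor promotion) or flushes only the L1s that recorded a local synchronization on that variable. All class, method, and field names here are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ComputeUnit:
    """One compute unit with a private L1 cache (hypothetical toy model)."""
    cu_id: int
    # Assumed per-cache tracking state: addresses of sync variables this CU
    # has released with local scope since its L1 was last flushed.
    local_syncs: set = field(default_factory=set)
    flushes: int = 0

    def local_release(self, addr: int) -> None:
        # Cheap local-scoped release: no L1 flush, just remember that this
        # cache may hold data guarded by `addr` that is not globally visible.
        self.local_syncs.add(addr)

    def flush_l1(self) -> None:
        # The costly operation (write back / invalidate the L1). After it,
        # nothing in this cache is pending, so all tracking is cleared.
        self.flushes += 1
        self.local_syncs.clear()


class Gpu:
    def __init__(self, num_cus: int):
        self.cus = [ComputeUnit(i) for i in range(num_cus)]

    def remote_acquire_broadcast(self, addr: int) -> int:
        """Promote by flushing every L1 on the device; returns flush count.

        Only counts the flushes needed to promote other CUs' local
        releases; the requester's own invalidation is ignored for brevity.
        """
        for cu in self.cus:
            cu.flush_l1()
        return len(self.cus)

    def remote_acquire_selective(self, addr: int) -> int:
        """Promote by flushing only L1s that tracked a local sync on addr."""
        targets = [cu for cu in self.cus if addr in cu.local_syncs]
        for cu in targets:
            cu.flush_l1()
        return len(targets)


if __name__ == "__main__":
    gpu = Gpu(num_cus=64)
    LOCK = 0x100  # hypothetical address of a shared lock variable
    # The owning CU synchronizes often, using cheap local scope.
    for _ in range(1000):
        gpu.cus[3].local_release(LOCK)
    # A rare remote access promotes the scope.
    print(gpu.remote_acquire_selective(LOCK))  # prints 1
    print(gpu.remote_acquire_broadcast(LOCK))  # prints 64
```

In this model, the 1,000 frequent local releases cost no flushes at all, and the rare remote acquire touches a single L1 instead of all 64. Real hardware must also handle tracking-structure capacity and the requester's own cache, which this sketch deliberately omits.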

Updated: 2021-07-11