Fast and Portable Locking for Multicore Architectures, ACM Transactions on Computer Systems


ACM Transactions on Computer Systems (IF 1.5), Pub Date: 2020-04-04, DOI: 10.1145/2845079
Jean-Pierre Lozi, Florian David, Gaël Thomas, Julia Lawall, Gilles Muller


The scalability of multithreaded applications on current multicore systems is hampered by the performance of lock algorithms, due to the costs of access contention and cache misses. The main contribution of this article is a new locking technique, Remote Core Locking (RCL), that aims to accelerate the execution of critical sections in legacy applications on multicore architectures. The idea of RCL is to replace lock acquisitions with optimized remote procedure calls to a dedicated server hardware thread. RCL limits the performance collapse observed with other lock algorithms when many threads try to acquire a lock concurrently, and it removes the need to transfer lock-protected shared data to the hardware thread acquiring the lock, because such data can typically remain in the server's cache. Other contributions of this article include a profiler that identifies the locks that are bottlenecks in multithreaded applications and can thus benefit from RCL, and a reengineering tool that transforms POSIX lock acquisitions into RCL locks. Eighteen applications were used to evaluate RCL: the nine applications of the SPLASH-2 benchmark suite, the seven applications of the Phoenix 2 benchmark suite, Memcached, and Berkeley DB with a TPC-C client. Eight of these applications are unable to scale because of locks and benefit from RCL on an x86 machine with four AMD Opteron processors and 48 hardware threads. Using RCL instead of Linux POSIX locks improves performance by up to 2.5 times on Memcached and by up to 11.6 times on Berkeley DB with the TPC-C client. On a SPARC machine with two Sun UltraSPARC T2+ processors and 128 hardware threads, three applications benefit from RCL. In particular, performance improves by up to 1.3 times over Solaris POSIX locks on Memcached and by up to 7.9 times on Berkeley DB with the TPC-C client.
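The core idea in the abstract, replacing a lock acquisition with a call that a dedicated server thread executes, can be illustrated with a minimal sketch. This is not the authors' implementation: RCL as described targets a dedicated hardware thread with optimized remote procedure calls, whereas the sketch below uses an ordinary Python thread and a queue, so it shows only the delegation structure and says nothing about performance. The names `DelegationServer`, `execute`, and `run_demo` are hypothetical and chosen for illustration.

```python
import queue
import threading
from concurrent.futures import Future


class DelegationServer:
    """Hypothetical helper: runs every submitted critical section on one server thread."""

    def __init__(self):
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self):
        # Server loop: take one request at a time and run it to completion,
        # so critical sections never overlap and no client-side lock is needed.
        while True:
            item = self._requests.get()
            if item is None:  # shutdown sentinel
                break
            func, args, result = item
            try:
                result.set_result(func(*args))
            except Exception as exc:
                result.set_exception(exc)

    def execute(self, func, *args):
        """Client side: ship the critical section to the server and wait for its result."""
        result = Future()
        self._requests.put((func, args, result))
        return result.result()

    def shutdown(self):
        self._requests.put(None)
        self._thread.join()


def run_demo(num_clients=8, increments=1000):
    # Shared state that is only ever touched by the server thread.
    counter = {"value": 0}

    def increment():
        counter["value"] += 1
        return counter["value"]

    server = DelegationServer()

    def client():
        for _ in range(increments):
            # Where lock-based code would acquire a lock, update, and release,
            # the client delegates the whole critical section instead.
            server.execute(increment)

    threads = [threading.Thread(target=client) for _ in range(num_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    server.shutdown()
    return counter["value"]


if __name__ == "__main__":
    print(run_demo())
```

Because only the server thread reads and writes `counter`, the shared data never moves between client threads, which mirrors the abstract's point that lock-protected data can stay in the server's cache.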


Updated: 2020-04-04
