Analyzing the Performance Trade-Off in Implementing User-Level Threads, IEEE Transactions on Parallel and Distributed Systems

IEEE Transactions on Parallel and Distributed Systems (IF 5.6), Pub Date: 2020-08-01, DOI: 10.1109/tpds.2020.2976057
Shintaro Iwasaki, Abdelhalim Amer, Kenjiro Taura, Pavan Balaji

User-level threads have been widely adopted as a means of achieving lightweight concurrent execution without the costs of OS-level threads. Nevertheless, the costs of managing user-level threads represent a performance barrier that dictates how fine-grained the concurrency exposed by an application can be without incurring significant overheads; this in turn may translate into insufficient parallelism to exploit highly parallel systems. This article is a deep dive into the fundamental costs of implementing user-level threads. We first identify that one of the largest sources of fork-join overhead stems from deviations: events that incur context switching during the execution of a thread and disrupt run-to-completion execution. We then conduct an in-depth investigation of a wide spectrum of methods with respect to how they handle deviations, covering both parent- and child-first scheduling policies. Our methodology involves a comprehensive instruction- and cache-level analysis of all methods on several modern CPU architectures. The primary finding of our evaluation is that dynamic promotion methods, which assume the absence of deviation and provide context-switching support dynamically, offer the best trade-off between performance and capability when the likelihood of deviation is low.
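The scheduling ideas in the abstract can be illustrated with a toy model. What follows is a minimal sketch, not the article's implementation or any library's API. It assumes Python generators stand in for user-level threads, with a single worker and no join. `Fork`, `ToyScheduler`, `leaf`, `parent`, and `simulate` are hypothetical names introduced for illustration. A thread yields `Fork(child)` to spawn a child and yields `None` to model a deviation. With `child_first=True`, each child runs inline like a plain function call and is "promoted" to a queued thread only if it deviates, loosely mirroring dynamic promotion. With `child_first=False`, children are queued while the parent keeps running (parent-first).

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterator, List, Set, Tuple


@dataclass
class Fork:
    """Hypothetical event yielded by a thread body to spawn a child thread."""
    child: Iterator


class ToyScheduler:
    """Hypothetical single-worker scheduler; generators stand in for threads.

    A thread body yields Fork(child) to spawn a child, or None to signal a
    deviation (it cannot run to completion right now and must be suspended).
    """

    def __init__(self, child_first: bool) -> None:
        self.child_first = child_first
        self.ready: Deque[Iterator] = deque()
        self.promotions = 0  # children that deviated while running inline

    def _run_until_block(self, thread: Iterator) -> bool:
        """Resume `thread`; return True if it finished, False if it deviated."""
        while True:
            try:
                event = next(thread)
            except StopIteration:
                return True
            if isinstance(event, Fork):
                self._fork(event.child)
            else:
                return False

    def _fork(self, child: Iterator) -> None:
        if not self.child_first:
            # Parent-first: queue the child; the parent keeps running.
            self.ready.append(child)
        elif not self._run_until_block(child):
            # Child-first fast path: the child ran inline like a function call
            # but deviated, so it is "promoted" to a schedulable thread.
            self.promotions += 1
            self.ready.append(child)

    def run(self, root: Iterator) -> None:
        self.ready.append(root)
        while self.ready:
            thread = self.ready.popleft()
            if not self._run_until_block(thread):
                self.ready.append(thread)  # resume the suspended thread later


def leaf(name: str, trace: List[str], deviate: bool) -> Iterator:
    trace.append(f"{name}:start")
    if deviate:
        yield None  # deviation, e.g. waiting on a resource that is not ready
    trace.append(f"{name}:end")


def parent(trace: List[str], deviating: Set[int], n: int = 3) -> Iterator:
    trace.append("P:start")
    for i in range(n):
        yield Fork(leaf(f"C{i}", trace, i in deviating))
    trace.append("P:end")


def simulate(child_first: bool, deviating: Set[int]) -> Tuple[List[str], int]:
    """Run one parent with three children; return (event trace, promotions)."""
    trace: List[str] = []
    sched = ToyScheduler(child_first)
    sched.run(parent(trace, deviating))
    return trace, sched.promotions
```

For example, `simulate(child_first=True, deviating={1})` runs `C0` and `C2` inline to completion and promotes only `C1`, so the promotion cost is paid only by the child that actually deviates. With `deviating=set()`, nothing is promoted. The analogy is loose: Python generator frames already live on the heap, so this model captures scheduling order and when promotion happens, not the stack-management and cache costs the article measures.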

Updated: 2020-08-01
