Dynamic concurrency throttling on NUMA systems and data migration impacts
Design Automation for Embedded Systems (IF 0.9), Pub Date: 2020-11-04, DOI: 10.1007/s10617-020-09243-5
Janaina Schwarzrock, Michael Guilherme Jordan, Guilherme Korol, Charles C. de Oliveira, Arthur F. Lorenzon, Mateus Beck Rutzig, Antonio Carlos S. Beck

Many parallel applications do not scale as the number of threads increases, which means that using the maximum number of threads will not always deliver the best outcome in performance or energy consumption. Therefore, many works have already proposed strategies for tuning the number of threads to optimize for performance or energy. Since parallel applications may have more than one parallel region, these tuning strategies can either determine a specific number of threads for each parallel region of the application or determine a fixed number of threads for the whole application execution. In the former case, strategies apply Dynamic Concurrency Throttling (DCT), which enables adapting the number of threads at runtime. However, the use of DCT incurs overheads, such as creating/destroying threads and cache warm-up. DCT’s overhead can be further aggravated in Non-Uniform Memory Access (NUMA) systems, where changing the number of threads may cause remote memory accesses or, more importantly, data migration between nodes. Tuning strategies should therefore not only determine the best number of threads locally, for each parallel region, but also be aware of the impacts of applying DCT. This work investigates how parallel regions may influence each other when DCT is employed, showing that data migration may represent a considerable overhead. Effectively, those overheads affect the strategy’s solution, impacting the overall application performance and energy consumption. We demonstrate why many approaches will very likely fail when applied to simulated environments or will hardly reach a near-optimum solution when executed on real hardware.
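
The interaction between parallel regions described above can be illustrated with a toy cost model. The sketch below is not the authors' method: the per-region times, the flat `SWITCH_PENALTY` (a stand-in for thread creation, cache warm-up, and data migration costs), and all function names are hypothetical, chosen only to show how picking the best thread count for each region in isolation can lose to a plan that accounts for the cost of changing the thread count.

```python
from itertools import product

# Hypothetical execution times (seconds) per parallel region, keyed by
# thread count. These numbers are invented for illustration only.
REGION_TIMES = [
    {4: 10.0, 8: 7.0, 16: 6.5},
    {4: 5.0, 8: 5.5, 16: 8.0},
    {4: 9.0, 8: 6.0, 16: 5.8},
]

# Assumed flat cost paid whenever consecutive regions use different
# thread counts (a crude proxy for DCT overheads such as data migration).
SWITCH_PENALTY = 1.0


def total_time(plan, times=REGION_TIMES, penalty=SWITCH_PENALTY):
    """Run time of a plan (one thread count per region) plus switch costs."""
    run = sum(region[n] for region, n in zip(times, plan))
    switches = sum(1 for a, b in zip(plan, plan[1:]) if a != b)
    return run + penalty * switches


def local_plan(times=REGION_TIMES):
    """Pick the fastest thread count for each region in isolation."""
    return tuple(min(region, key=region.get) for region in times)


def global_plan(times=REGION_TIMES, penalty=SWITCH_PENALTY):
    """Exhaustively search all plans, including switch costs."""
    candidates = product(*(region.keys() for region in times))
    return min(candidates, key=lambda p: total_time(p, times, penalty))


if __name__ == "__main__":
    for name, plan in (("local", local_plan()), ("global", global_plan())):
        print(f"{name}: threads={plan} total={total_time(plan):.1f}s")
```

With these made-up inputs, the per-region choice switches the thread count twice and ends up slower overall than keeping a single, locally suboptimal count. A real NUMA system would not have a flat penalty; the overhead would depend on which pages move between nodes, which is the kind of effect the paper sets out to measure.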




Updated: 2020-11-04