When parallel speedups hit the memory wall
arXiv - CS - Distributed, Parallel, and Cluster Computing | Pub Date: 2019-05-03 | DOI: arxiv-1905.01234
Alex F. A. Furtunato, Kyriakos Georgiou, Kerstin Eder, Samuel Xavier-de-Souza

After Amdahl's trailblazing work, many other authors proposed analytical speedup models, but none considered the limiting effect of the memory wall. These models exploit aspects such as problem-size variation, memory size, communication overhead, and synchronization overhead, but they assume data-access delays to be constant. Nevertheless, such delays can vary, for example, with the number of cores used and with the ratio between processor and memory frequencies. Given the large number of configurations of operating frequency and number of cores that current architectures offer, speedup models that describe the variation across these configurations are highly desirable for off-line or on-line scheduling decisions. This work proposes new parallel speedup models that account for variations of the average data-access delay in order to describe the limiting effect of the memory wall on parallel speedups. Analytical results indicate that the proposed models capture the desired behavior, and experimental results on real hardware validate them. Additionally, we show that, by accounting for parameters that reflect intrinsic characteristics of the applications, such as the degree of parallelism and the susceptibility to the memory wall, our proposal has significant advantages over machine-learning-based modeling. Moreover, besides being a black-box approach, conventional machine-learning modeling needs, in our experiments, about one order of magnitude more measurements to reach the same level of accuracy that our models achieve.
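For intuition only, here is a minimal illustrative sketch of the idea described in the abstract; it is not the authors' actual model. Classical Amdahl's law assumes a fixed per-operation cost, whereas a memory-wall-aware variant lets the average data-access delay grow with the number of active cores. With f the parallel fraction, p the number of cores, t_c the compute time per unit of work, and a hypothetical core-dependent access delay \delta(p):

S_{\mathrm{Amdahl}}(p) = \frac{1}{(1 - f) + f/p}

S_{\mathrm{mem}}(p) = \frac{T(1)}{T(p)}, \qquad T(p) = (1 - f)\,\bigl(t_c + \delta(1)\bigr) + \frac{f}{p}\,\bigl(t_c + \delta(p)\bigr)

If \delta(p) increases with p, for instance because more cores contend for the same memory bandwidth, S_{\mathrm{mem}}(p) saturates below the Amdahl bound, which is the limiting effect the abstract attributes to the memory wall.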

Updated: 2020-05-11