A Machine-Learning-Based Framework for Productive Locality Exploitation
IEEE Transactions on Parallel and Distributed Systems (IF 5.3), Pub Date: 2021-01-13, DOI: 10.1109/tpds.2021.3051348
Engin Kayraklioglu, Erwan Favry, Tarek El-Ghazawi

Data locality is of extreme importance in programming distributed-memory architectures because of its implications for latency and energy consumption. Automated compiler and runtime system optimization studies have attempted to improve data locality exploitation without burdening the programmer. However, the efficacy of automated optimizations is limited by the difficulty of static code analysis, the conservatism of compiler optimizations that must avoid errors, and the cost of dynamic analysis. As a result, programmers need to spend significant effort optimizing locality when creating applications for distributed-memory parallel systems. We present a machine-learning-based framework to automatically exploit locality in distributed-memory applications. The framework takes application source code whose time-critical blocks are marked by pragmas and produces optimized source code that uses a regressor for efficient data movement. The regressor is trained on automatically collected application profiles gathered with very small input data sizes. We integrate our prototype into the Chapel language stack. In our experiments, we show that the Elastic Net model is the ideal regressor for our case and that applications using Elastic Net can perform very similarly to programmer-optimized versions. We also show that such regressors can be trained within a few minutes on a cluster or within 30 minutes on a workstation, including data collection.
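
The central idea in the abstract, training a regressor on profiles from very small inputs and then using it to predict data movement for larger runs, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the single feature (problem size), the target (last remote index a task touches), and the toy profile values are all hypothetical, and the Elastic Net is fitted here with a plain cyclic coordinate descent rather than with whatever library the paper used.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(rows, targets, alpha=1e-3, l1_ratio=0.5, sweeps=200):
    """Fit an Elastic Net by cyclic coordinate descent on centered data.

    Minimizes (1/2n)*||y - Xw - b||^2 + alpha*l1_ratio*||w||_1
              + 0.5*alpha*(1 - l1_ratio)*||w||^2.
    """
    n, d = len(rows), len(rows[0])
    x_mean = [sum(r[j] for r in rows) / n for j in range(d)]
    y_mean = sum(targets) / n
    # Centering removes the intercept from the inner optimization.
    xc = [[r[j] - x_mean[j] for j in range(d)] for r in rows]
    residual = [t - y_mean for t in targets]  # residual when all weights are 0
    w = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            z = sum(xc[i][j] ** 2 for i in range(n)) / n
            if z == 0.0:
                continue  # a constant feature carries no information
            # Correlation of feature j with the residual that excludes j itself.
            rho = sum(xc[i][j] * (residual[i] + xc[i][j] * w[j])
                      for i in range(n)) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (
                z + alpha * (1.0 - l1_ratio))
            delta = new_w - w[j]
            for i in range(n):
                residual[i] -= xc[i][j] * delta
            w[j] = new_w
    intercept = y_mean - sum(w[j] * x_mean[j] for j in range(d))
    return w, intercept


def predict(w, intercept, row):
    """Evaluate the fitted linear model on one feature row."""
    return intercept + sum(wj * xj for wj, xj in zip(w, row))


# Hypothetical profile collected with tiny inputs: for each problem size,
# the last remote index the task accessed (here exactly size/4 + 1).
small_sizes = [[8.0], [16.0], [32.0], [64.0]]
last_remote_index = [0.25 * s[0] + 1.0 for s in small_sizes]

weights, intercept = fit_elastic_net(small_sizes, last_remote_index)

# Extrapolate to a much larger run to decide how much data to prefetch.
predicted = predict(weights, intercept, [4096.0])
```

Because the toy access pattern is linear in the problem size, the model fitted on sizes 8 through 64 extrapolates closely to size 4096; real access patterns may need more features or may not extrapolate this cleanly.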

Updated: 2021-02-05