An efficient hardware supported and parallelization architecture for intelligent systems to overcome speculative overheads
International Journal of Intelligent Systems (IF 5.0), Pub Date: 2022-09-08, DOI: 10.1002/int.23062
Sudhakar Kumar, Sunil K. Singh, Naveen Aggarwal, Brij B. Gupta, Wadee Alhalabi, Shahab S. Band

Over the last few decades, advances in technology have enabled intelligent and autonomous systems that rely on complex calculations which are both time-consuming and central processing unit (CPU) intensive. As a consequence, parallel processing systems are gaining popularity as a way to improve overall computer performance. Ideally, programmers would be able to use the available hardware resources efficiently through parallelization. Automatic parallelization of sequential code would allow multithreaded execution without extra supervision, but a wide range of software dependencies prevents this from being feasible in general. This paper proposes an architectural framework for speculative parallelization, together with efficient memory analysis and computational algorithms for code generation, aimed at optimal performance. It also presents hardware design support, exposed as a runtime library for the proposed framework, that recovers misspeculated results during execution in order to minimize the overhead of speculative parallelism. The implementation builds on the Low-Level Virtual Machine (LLVM) compiler infrastructure and is tested on numerous benchmarks, which makes it highly scalable in terms of programming languages and architectures. The experimental results indicate significant potential for speedup: on the given benchmarks, which are written in C/C++, the proposed architecture achieves a geometric-mean (geomean) overall function speedup of approximately 5.2× without hardware support and approximately 7.0× with hardware support.
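
The abstract does not describe how misspeculated results are detected or recovered, so the following is only a generic software sketch of speculative loop parallelization, not the authors' framework or runtime library. It assumes a chunked loop, a private read log and write buffer per chunk, in-order commit, and sequential re-execution as the recovery step. All names (`run_chunk`, `speculative_loop`, `load`, `store`) are hypothetical. Python threads are used only to show the control flow and will not reproduce the reported speedups.

```python
from concurrent.futures import ThreadPoolExecutor


def run_chunk(snapshot, chunk, body):
    """Run one chunk of iterations against a read-only view of the data.

    Reads served from the view are logged; writes are buffered privately.
    """
    reads, writes = set(), {}

    def load(k):
        if k in writes:          # value produced earlier in this same chunk
            return writes[k]
        reads.add(k)             # value came from the view: validate it later
        return snapshot[k]

    def store(k, v):
        writes[k] = v            # buffered, not yet visible to other chunks

    for i in chunk:
        body(i, load, store)
    return reads, writes


def speculative_loop(data, n_iters, body, chunk_size=4, workers=4):
    """Execute `body` for i in range(n_iters), speculating that chunks are
    independent. Returns how many chunks misspeculated and were re-executed."""
    chunks = [range(s, min(s + chunk_size, n_iters))
              for s in range(0, n_iters, chunk_size)]
    snapshot = list(data)        # state that every speculative chunk starts from

    # Speculative phase: all chunks run concurrently on the same snapshot.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda c: run_chunk(snapshot, c, body), chunks))

    # Commit phase: validate and commit chunks in original program order.
    committed = set()            # locations written by already-committed chunks
    misspeculations = 0
    for chunk, (reads, writes) in zip(chunks, results):
        if reads & committed:    # chunk read a value that an earlier chunk changed
            misspeculations += 1
            # Recovery: discard the speculative result and redo on live data.
            reads, writes = run_chunk(data, chunk, body)
        for k, v in writes.items():
            data[k] = v
        committed.update(writes)
    return misspeculations
```

With an independent body such as `store(i, load(i) + 1)`, every chunk commits without re-execution. With a loop-carried dependence such as a prefix sum, `store(i, load(i - 1) + load(i))`, later chunks fail validation and are re-run, so the final contents of `data` still match sequential execution. The cost of those re-runs is the kind of speculative overhead the abstract refers to.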

Updated: 2022-09-08