Efficiently Solving Partial Differential Equations in a Partially Reconfigurable Specialized Hardware
IEEE Transactions on Computers (IF 3.7) Pub Date: 2021-02-19, DOI: 10.1109/tc.2021.3060700
Bahar Asgari, Ramyad Hadidi, Tushar Krishna, Hyesoon Kim, Sudhakar Yalamanchili
Scientific computations, with a wide range of applications in domains such as developing vaccines, forecasting the weather, predicting natural disasters, simulating the aerodynamics of spacecraft, and exploring oil resources, constitute the main workloads of supercomputers. At the core of such computations is the modeling of physical phenomena, done with the aid of partial differential equations (PDEs). Solving PDEs on supercomputers, even those equipped with GPUs, consumes a large amount of power and is still not as fast as desired. The main reason for this slow processing is data dependency. The key challenge is that software techniques alone cannot resolve these dependencies; therefore, such applications cannot benefit from the parallelism provided by processors such as GPUs. Our key insight is that although we cannot eliminate the dependencies, we can reduce their negative impact through hardware/software co-optimization. To this end, we propose breaking the computation into two groups of operations: a majority of parallelizable operations and a minority of data-dependent operations. We execute the two groups in a prescribed order: first, we gather all parallelizable operations and execute them together; then, we switch to the small data-dependent part. As long as the data-dependent part is small, we can accelerate it using fast hardware mechanisms, and our proposed hardware mechanisms guarantee quick switching between the two groups. To enforce the execution order dictated by our software mechanism in hardware, we also propose a new low-overhead compression format, since sparsity is another attribute of PDEs that requires compression. Furthermore, the generic core architecture of our proposed hardware allows the execution of other applications, including sparse matrix-vector multiplication (SpMV) and graph algorithms.
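The split between parallelizable and data-dependent operations can be illustrated with level scheduling of a sparse lower-triangular solve, a classic data-dependent kernel in iterative PDE solvers. This is a hedged sketch of the general technique, not the paper's actual implementation: rows in the same "level" have no dependencies among themselves and can execute in parallel; only the switch between levels is sequential.

```python
# Sketch only: level-scheduling splits a dependent computation into
# independent (parallelizable) groups plus a small sequential ordering.
from collections import defaultdict

def level_schedule(rows):
    """rows[i] = list of (j, L_ij) with j < i (unit diagonal assumed).
    Returns groups of row indices; rows within a group are independent."""
    level = {}
    groups = defaultdict(list)
    for i in range(len(rows)):
        # A row's level is one past the deepest row it depends on.
        lvl = 1 + max((level[j] for j, _ in rows[i]), default=0)
        level[i] = lvl
        groups[lvl].append(i)
    return [groups[l] for l in sorted(groups)]

def solve_lower(rows, b):
    """Solve L x = b, sweeping one independent group at a time."""
    x = [0.0] * len(b)
    for group in level_schedule(rows):
        for i in group:  # members of a group could run in parallel
            x[i] = b[i] - sum(v * x[j] for j, v in rows[i])
    return x
```

Here `level_schedule` and `solve_lower` are hypothetical helper names chosen for illustration; the paper realizes the equivalent reordering and group switching in hardware.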
The key feature of the proposed hardware is partial reconfigurability, which on the one hand facilitates the execution of data-dependent computations and on the other hand allows executing a broad range of applications without changing the entire configuration. Our evaluations show that, compared to GPUs, we achieve an average speedup of 15.6× on scientific computations while consuming 14× less energy.
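For context on the SpMV workload mentioned above, here is a minimal sketch of SpMV over the standard CSR layout. Note this is the conventional format, not the paper's proposed low-overhead compression format, which is not specified in the abstract; it only illustrates the kind of sparse kernel the generic architecture targets.

```python
# Sketch: y = A @ x with A in CSR form (values, col_idx, row_ptr).
def spmv_csr(values, col_idx, row_ptr, x):
    y = []
    for r in range(len(row_ptr) - 1):
        # Nonzeros of row r occupy values[row_ptr[r]:row_ptr[r+1]].
        y.append(sum(values[k] * x[col_idx[k]]
                     for k in range(row_ptr[r], row_ptr[r + 1])))
    return y
```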

Updated: 2021-03-16