Large-scale Cellular Automata on FPGAs
ACM Transactions on Reconfigurable Technology and Systems (IF 2.3), Pub Date: 2020-12-14, DOI: 10.1145/3423185
Nikolaos Kyparissas, Apostolos Dollas

Cellular automata (CA) are discrete mathematical models discovered in the 1940s by John von Neumann and Stanislaw Ulam, and they have been used extensively in many scientific disciplines ever since. The present work evolved from a Field Programmable Gate Array (FPGA)-based design for simulating urban growth into a generic architecture, automatically generated by a framework, that efficiently computes complex cellular automata with large 29 × 29 neighborhoods in Cartesian or toroidal grids, with 16 or 256 states per cell. The new architecture and the framework are presented in detail, including results in terms of modeling capabilities and performance. Large neighborhoods greatly enhance CA modeling capabilities, for example by making anisotropic rules possible. Performance-wise, the proposed architecture runs on a medium-size FPGA up to 51 times faster than a CPU running highly optimized C code. The speedup over GPUs is harder to quantify, because published GPU implementations report CA results only for neighborhoods up to 11 × 11, in which case FPGA performance is roughly on par with the GPU; however, based on published GPU trends, for 29 × 29 neighborhoods the proposed architecture is expected to outperform a GPU at one-tenth the energy requirements. The architecture and sample designs are open source, available under a Creative Commons license.
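To make concrete what a 29 × 29 neighborhood on a toroidal or bounded grid involves, the following is a minimal software sketch in Python. It is not the paper's FPGA architecture or its framework. The function names, the totalistic (sum-based) rule, and the choice to treat cells outside a bounded grid as state 0 are all assumptions made for illustration.

```python
def neighborhood_sum(grid, row, col, radius, toroidal):
    """Sum the states in the (2*radius+1) x (2*radius+1) window centred on
    (row, col). The centre cell itself is included in the sum."""
    rows, cols = len(grid), len(grid[0])
    total = 0
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            r, c = row + dr, col + dc
            if toroidal:
                # Wrap around the edges; Python's % is non-negative here.
                r %= rows
                c %= cols
            elif not (0 <= r < rows and 0 <= c < cols):
                # Assumption: cells outside a bounded grid count as state 0.
                continue
            total += grid[r][c]
    return total


def step(grid, rule, radius=14, toroidal=True):
    """Compute one synchronous generation.
    radius=14 gives a 29 x 29 neighborhood."""
    rows, cols = len(grid), len(grid[0])
    return [
        [rule(grid[r][c], neighborhood_sum(grid, r, c, radius, toroidal))
         for c in range(cols)]
        for r in range(rows)
    ]


def example_rule(state, total, num_states=16):
    """Hypothetical rule: the next state is the window sum modulo the
    number of states (16 here; 256 would be the other option)."""
    return total % num_states
```

Each cell update in this sketch reads 29 × 29 = 841 cells when `radius=14`, which gives a sense of why large neighborhoods are expensive on a CPU. Calling `step(grid, example_rule, radius=14, toroidal=False)` would switch from the wrapping grid to the bounded one.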

Updated: 2020-12-14