Programmable FPGA-based Memory Controller
arXiv - CS - Hardware Architecture, Pub Date: 2021-08-21, DOI: arxiv-2108.09601
Sasindu Wijeratne, Sanket Pattnaik, Zhiyu Chen, Rajgopal Kannan, Viktor Prasanna

Despite generational improvements in DRAM technology, memory access latency remains the major bottleneck for application accelerators, primarily because memory interface IPs cannot fully account for variations in target applications, the algorithms used, and accelerator architectures. Since developing memory controllers for different applications is time-consuming, this paper introduces a modular and programmable memory controller that can be configured for different target applications on available hardware resources. The proposed memory controller efficiently supports cache-line accesses along with bulk memory transfers. The user can configure the controller according to the logic resources available on the FPGA, the memory access pattern, and the external memory specifications. The modular design supports various memory access optimization techniques, including request scheduling, internal caching, and direct memory access. These techniques help reduce overall latency while maintaining high sustained bandwidth. We implement the system on a state-of-the-art FPGA and evaluate its performance on workloads from two widely studied domains: graph analytics and deep learning. We show an improvement in overall memory access time of up to 58% on CNN and GCN workloads compared with commercial memory controller IPs.
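
The abstract names request scheduling as one of the supported optimizations but does not describe the policy. As a rough illustration of the general idea only, and not the authors' design, the Python sketch below groups pending requests by DRAM row so that accesses to the same row are issued back to back. The `Request` type, the function names, and the 8 KiB row size are all hypothetical choices made for this example.

```python
from collections import OrderedDict
from dataclasses import dataclass

# Hypothetical DRAM row size in bytes; real devices and controllers differ.
ROW_SIZE = 8192


@dataclass(frozen=True)
class Request:
    """A pending memory request (illustrative type, not from the paper)."""
    addr: int
    is_write: bool = False


def row_of(addr, row_size=ROW_SIZE):
    """Return the DRAM row index that a byte address falls into."""
    return addr // row_size


def schedule_by_row(requests, row_size=ROW_SIZE):
    """Reorder requests so that all requests to one row are adjacent.

    Rows are visited in order of first arrival, and arrival order is kept
    within each row, so requests to the same address are never reordered.
    """
    groups = OrderedDict()
    for req in requests:
        groups.setdefault(row_of(req.addr, row_size), []).append(req)
    return [req for group in groups.values() for req in group]


def count_row_switches(requests, row_size=ROW_SIZE):
    """Count how often consecutive requests target a different row."""
    switches = 0
    current = None
    for req in requests:
        row = row_of(req.addr, row_size)
        if row != current:
            switches += 1
            current = row
    return switches


pending = [Request(a) for a in (0, 8192, 64, 8256, 128)]
ordered = schedule_by_row(pending)
print(count_row_switches(pending), count_row_switches(ordered))
```

Fewer row switches generally means fewer row activations, which is one way a scheduler can lower average access latency. A hardware scheduler would also have to respect bank timing, fairness, and read/write turnaround, none of which this sketch models.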

Updated: 2021-08-24