LUXOR: An FPGA Logic Cell Architecture for Efficient Compressor Tree Implementations, arXiv - CS - Hardware Architecture

arXiv - CS - Hardware Architecture. Pub Date: 2020-03-06, DOI: arxiv-2003.03043
SeyedRamin Rasoulinezhad, Siddhartha, Hao Zhou, Lingli Wang, David Boland, Philip H.W. Leong

We propose two tiers of modifications to FPGA logic cell architecture that deliver a variety of performance and utilization benefits with only minor area overheads. In the first tier, we augment existing commercial logic cell datapaths with a 6-input XOR gate to improve the expressiveness of each element while maintaining backward compatibility. This new architecture is vendor-agnostic, and we refer to it as LUXOR. We also consider a second tier of vendor-specific modifications to both Xilinx and Intel FPGAs, which we refer to as X-LUXOR+ and I-LUXOR+, respectively. We demonstrate that compressor tree synthesis using generalized parallel counters (GPCs) is further improved by the proposed modifications. A comparative study using both the Intel adaptive logic module and the Xilinx slice at the 65 nm technology node shows that the silicon area overhead is less than 0.5% for LUXOR and 5-6% for LUXOR+, while the delay increments are 1-6% and 3-9%, respectively. We demonstrate that LUXOR can deliver an average reduction of 13-19% in logic utilization on micro-benchmarks from a variety of domains. Binarized neural network (BNN) benchmarks benefit the most, with an average reduction of 37-47% in logic utilization, owing to the highly efficient mapping of the XnorPopcount operation onto the proposed LUXOR+ logic cells.
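To make the link between GPCs, compressor trees, the 6-input XOR and XnorPopcount concrete, here is a minimal behavioural sketch in Python. It is not the authors' synthesis flow or hardware mapping: the function names (gpc, compress_stage, xnor_popcount, bnn_dot) are hypothetical, and the greedy column-by-column grouping into counters of at most six inputs is an illustrative assumption rather than the paper's GPC selection heuristic. It shows one well-known property: the least significant output of a (6;3) counter is the parity of its six inputs, i.e. a 6-input XOR.

```python
from collections import defaultdict
from functools import reduce
from operator import xor


def gpc(bits):
    """Counter over 3..6 bits of the same weight (a (6;3) GPC when len == 6).

    Returns the binary count of ones, LSB first. The LSB is the parity of
    the inputs, i.e. exactly a 6-input XOR for a full (6;3) counter.
    """
    n = len(bits)
    count = sum(bits)
    out = [reduce(xor, bits)]                      # parity = count & 1
    out += [(count >> k) & 1 for k in range(1, n.bit_length())]
    return out


def compress_stage(columns):
    """One compressor-tree stage over a dot diagram {weight: [bits]}."""
    out = defaultdict(list)
    for w, bits in columns.items():
        if len(bits) <= 2:                         # already short enough
            out[w].extend(bits)
            continue
        for i in range(0, len(bits), 6):           # greedy groups of six (assumption)
            group = bits[i:i + 6]
            if len(group) < 3:                     # leftovers pass through
                out[w].extend(group)
            else:                                  # output k lands at weight w + k
                for k, b in enumerate(gpc(group)):
                    out[w + k].append(b)
    return out


def xnor_popcount(a, b, n):
    """Popcount of XNOR(a, b) over n bits, reduced by a GPC compressor tree."""
    bits = [1 - (((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    columns = {0: bits}
    while any(len(c) > 2 for c in columns.values()):
        columns = compress_stage(columns)
    # At most two rows remain; hardware would use a carry-propagate adder here.
    return sum(sum(c) << w for w, c in columns.items())


def bnn_dot(a, b, n):
    """Dot product of two {-1,+1} vectors packed as bits (1 -> +1, 0 -> -1)."""
    return 2 * xnor_popcount(a, b, n) - n


print(xnor_popcount(0b1011, 0b1001, 4))  # 3
print(bnn_dot(0b1011, 0b1001, 4))        # 2
```

Every counter used here takes at least three bits and emits fewer outputs than inputs, so each stage strictly shrinks the dot diagram and the loop terminates. The bnn_dot helper uses the standard identity for binarized dot products: with n positions and p matching bits, the ±1 dot product is p - (n - p) = 2p - n.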

Updated: 2020-03-09