AutoML for Multilayer Perceptron and FPGA Co-design
arXiv - CS - Neural and Evolutionary Computing Pub Date : 2020-09-14 , DOI: arxiv-2009.06156
Philip Colangelo, Oren Segal, Alex Speicher, Martin Margala

State-of-the-art Neural Network Architectures (NNAs) are challenging to design and implement efficiently in hardware. In the past couple of years, this has led to an explosion in research and development of automatic Neural Architecture Search (NAS) tools. AutoML tools are now used to achieve state-of-the-art NNA designs and to optimize for hardware usage and design. Much of the recent research on the automatic design of NNAs has focused on convolutional networks and image recognition, ignoring the fact that a significant part of the workload in data centers consists of general-purpose deep neural networks. In this work, we develop and test a general multilayer perceptron (MLP) flow that takes arbitrary datasets as input and automatically produces optimized NNAs and hardware designs. We test the flow on six benchmarks. Our results show that we exceed currently published MLP accuracy results and are competitive with non-MLP-based results. We compare common general-purpose GPU architectures with our scalable FPGA design and show that we achieve higher efficiency and higher throughput (outputs per second) for the majority of datasets. Further insight into the design space for both accurate networks and high-performing hardware shows the power of co-design, correlating accuracy versus throughput, network size versus accuracy, and scaling to high-performance devices.
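The flow described above searches a space of MLP configurations and scores candidates jointly on accuracy and a hardware estimate. A minimal sketch of that kind of search loop is shown below; the search space, the `score` placeholder, and all names are illustrative assumptions, not the authors' actual tool, and a real flow would train each candidate and query an FPGA resource/throughput model instead of the toy objective used here.

```python
import random

# Illustrative search space for MLP candidates (hypothetical values,
# not taken from the paper).
SEARCH_SPACE = {
    "num_layers": [1, 2, 3, 4],
    "layer_width": [32, 64, 128, 256, 512],
    "activation": ["relu", "sigmoid", "tanh"],
}

def sample_architecture(rng):
    """Draw one candidate MLP description from the search space."""
    n = rng.choice(SEARCH_SPACE["num_layers"])
    return {
        "widths": [rng.choice(SEARCH_SPACE["layer_width"]) for _ in range(n)],
        "activation": rng.choice(SEARCH_SPACE["activation"]),
    }

def score(arch):
    """Placeholder objective standing in for trained accuracy combined
    with an FPGA resource/throughput estimate (the co-design part).
    A real flow trains the network and evaluates a hardware model here."""
    params = sum(arch["widths"])             # crude proxy for model size
    return 1.0 / (1.0 + abs(params - 400))   # toy: prefer mid-sized nets

def random_search(trials=100, seed=0):
    """Return the best-scoring candidate over a fixed trial budget."""
    rng = random.Random(seed)
    return max((sample_architecture(rng) for _ in range(trials)), key=score)

if __name__ == "__main__":
    print(random_search())
```

Random search is only the simplest baseline; NAS tools of this kind more commonly use evolutionary or reinforcement-learning search over the same candidate/score structure.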

Updated: 2020-09-15