Efficient Methods for Mapping Neural Machine Translator on FPGAs, IEEE Transactions on Parallel and Distributed Systems

Efficient Methods for Mapping Neural Machine Translator on FPGAs
IEEE Transactions on Parallel and Distributed Systems (IF 5.6), Pub Date: 2020-12-25, DOI: 10.1109/tpds.2020.3047371
Qin Li, Xiaofan Zhang, Jinjun Xiong, Wen-Mei Hwu, Deming Chen

Neural machine translation (NMT), which uses deep neural networks to convert text from one language to another, is one of the most critical applications in natural language processing (NLP). In recent years, NMT has developed continuously by integrating emerging techniques such as bidirectional gated recurrent units (GRUs), attention mechanisms, and beam-search algorithms to improve translation quality. However, as problem sizes grow, real-life NMT models have become much more complicated and harder to accelerate in hardware. In this article, we exploit the capabilities of FPGAs to deliver highly efficient implementations of real-life NMT applications. We map the inference of a large-scale NMT model, with a total computation of 172 GFLOP, to a highly optimized high-level synthesis (HLS) IP and integrate the IP into the Xilinx VCU118 FPGA platform. The model includes key features widely used in NMT: a bidirectional GRU layer, an attention mechanism, and beam search. We quantize the model to a mixed-precision representation in which the parameters and part of the computation use 16-bit half precision, while the rest remains in 32-bit floating point. Compared with the floating-point NMT implementation on the FPGA, we achieve a 13.1× speedup with an end-to-end performance of 22.0 GFLOPS and no accuracy degradation. To the best of our knowledge, this is the first work to successfully implement a real-life end-to-end NMT model on an FPGA board.
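The abstract names the building blocks (a bidirectional GRU encoder, FP16 parameters, and a partial FP16/FP32 split of the arithmetic) but not which operations run at which precision. The following minimal Python sketch is not the authors' HLS design; it assumes one plausible split: weights stored as FP16, element-wise multiplications rounded to FP16, and dot-product accumulation rounded to FP32. Rounding is emulated in software with the standard `struct` module (format characters `e` and `f`). All helper names (`to_fp16`, `matvec_mixed`, `GRUCell`, `bigru_encode`) are hypothetical, biases are omitted, and attention and beam search are left out.

```python
import math
import random
import struct
from typing import Callable, List

Vector = List[float]
Matrix = List[Vector]
MatVec = Callable[[Matrix, Vector], Vector]


def to_fp16(x: float) -> float:
    """Round to the nearest IEEE-754 binary16 value (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round to the nearest IEEE-754 binary32 value (struct format 'f')."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def matvec_mixed(w16: Matrix, v: Vector) -> Vector:
    """Assumed split: FP16 inputs and products, FP32 accumulation."""
    v16 = [to_fp16(x) for x in v]
    out = []
    for row in w16:
        acc = 0.0
        for w, x in zip(row, v16):
            acc = to_fp32(acc + to_fp16(w * x))
        out.append(acc)
    return out


def matvec_ref(w16: Matrix, v: Vector) -> Vector:
    """Same FP16-stored weights, but all arithmetic in double precision."""
    return [sum(w * x for w, x in zip(row, v)) for row in w16]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """Standard GRU cell (hypothetical sizes) whose weights are FP16 values."""

    def __init__(self, n_in: int, n_hid: int, rng: random.Random):
        def mat(rows: int, cols: int) -> Matrix:
            return [[to_fp16(rng.gauss(0.0, 0.1)) for _ in range(cols)]
                    for _ in range(rows)]
        # Update (z), reset (r) and candidate (n) gates; biases omitted.
        self.W = {g: mat(n_hid, n_in) for g in "zrn"}
        self.U = {g: mat(n_hid, n_hid) for g in "zrn"}

    def step(self, x: Vector, h: Vector, mv: MatVec) -> Vector:
        z = [sigmoid(a + b) for a, b in zip(mv(self.W["z"], x), mv(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(mv(self.W["r"], x), mv(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(mv(self.W["n"], x), mv(self.U["n"], rh))]
        # Nonlinearities run on Python floats; the new state is rounded to FP32.
        return [to_fp32((1.0 - zi) * ni + zi * hi) for zi, ni, hi in zip(z, n, h)]


def bigru_encode(xs: List[Vector], fwd: GRUCell, bwd: GRUCell, n_hid: int,
                 mv: MatVec = matvec_mixed) -> List[Vector]:
    """Forward and backward passes; each position gets [h_fwd ; h_bwd]."""
    h = [0.0] * n_hid
    fwd_states: List[Vector] = []
    for x in xs:
        h = fwd.step(x, h, mv)
        fwd_states.append(h)
    h = [0.0] * n_hid
    bwd_states: List[Vector] = [[] for _ in xs]
    for t in reversed(range(len(xs))):
        h = bwd.step(xs[t], h, mv)
        bwd_states[t] = h
    return [f + b for f, b in zip(fwd_states, bwd_states)]


if __name__ == "__main__":
    rng = random.Random(0)
    n_in, n_hid, steps = 8, 16, 5
    fwd, bwd = GRUCell(n_in, n_hid, rng), GRUCell(n_in, n_hid, rng)
    xs = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(steps)]
    mixed = bigru_encode(xs, fwd, bwd, n_hid)
    ref = bigru_encode(xs, fwd, bwd, n_hid, mv=matvec_ref)
    err = max(abs(a - b) for ma, mb in zip(mixed, ref) for a, b in zip(ma, mb))
    print(len(mixed), len(mixed[0]))  # 5 32
    print(f"max |mixed - reference| = {err:.2e}")
```

Comparing against `matvec_ref`, which uses the same FP16-stored weights with double-precision arithmetic, gives a rough sense of the numerical error this assumed split introduces in the encoder states. It does not reproduce the translation-accuracy evaluation reported in the paper.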

Updated: 2021-02-19