WooKong: A Ubiquitous Accelerator for Recommendation Algorithms with Custom Instruction Sets on FPGA
IEEE Transactions on Computers (IF 3.7), Pub Date: 2020-01-01, DOI: 10.1109/tc.2020.2988209
Chao Wang, Lei Gong, Xiang Ma, Xi Li, Xuehai Zhou

Recommendation algorithms, such as neighborhood-based Collaborative Filtering (CF), have been widely applied in various emerging machine learning applications. However, with the explosive growth of big data, CF recommendation algorithms are becoming increasingly time- and energy-consuming, which poses significant challenges. They must be optimized and accelerated by powerful engines to process data at large scale. To address these problems, in this article we propose WooKong, a ubiquitous accelerator architecture for collaborative-filtering recommendation on FPGA. It accommodates three types of CF recommendation algorithms (User-based CF, Item-based CF, and SlopeOne) with five similarity metrics: Jaccard, Cosine, CosineIR, Euclidean, and Pearson. To maintain flexibility across these CF algorithms and metrics, we adopt custom instruction sets to control the learning and prediction accelerators. We implement a hardware prototype on a real Xilinx Zynq FPGA development board. Experimental results show that the proposed learning and prediction accelerators achieve 8.0X and 1.7X speedups, respectively, compared with an Intel i7 processor. The accelerator delivers energy benefits of up to 137.4X compared with an NVIDIA Tesla K40C GPU, at an affordable hardware cost.
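
As a rough software illustration of four of the similarity metrics named in the abstract, the sketch below computes Jaccard, Cosine, Euclidean, and Pearson scores between two users' rating vectors. It is not the accelerator's implementation: the textbook definitions, the dict-based rating representation, the choice to restrict Cosine, Euclidean, and Pearson to co-rated items, and all function and variable names are assumptions made for illustration. CosineIR is omitted because the abstract does not define it.

```python
import math


def jaccard(a, b):
    """Jaccard similarity over the sets of items each user has rated."""
    items_a, items_b = set(a), set(b)
    union = items_a | items_b
    return len(items_a & items_b) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity, computed over co-rated items only (an assumption)."""
    common = set(a) & set(b)
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def euclidean_distance(a, b):
    """Euclidean distance over co-rated items (smaller means more similar)."""
    common = set(a) & set(b)
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in common))


def pearson(a, b):
    """Pearson correlation over co-rated items."""
    common = set(a) & set(b)
    n = len(common)
    if n == 0:
        return 0.0
    mean_a = sum(a[i] for i in common) / n
    mean_b = sum(b[i] for i in common) / n
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom else 0.0


# Hypothetical ratings: item id -> rating given by the user.
alice = {"i1": 5, "i2": 3, "i3": 4}
bob = {"i1": 4, "i2": 2, "i4": 5}

scores = {
    "jaccard": jaccard(alice, bob),
    "cosine": cosine(alice, bob),
    "euclidean": euclidean_distance(alice, bob),
    "pearson": pearson(alice, bob),
}
```

In User-based CF these scores would be computed between pairs of users, and in Item-based CF between pairs of items; the same functions apply if the dicts map user ids to ratings for a given item.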

Updated: 2020-01-01