Distributed Training of Support Vector Machine on a Multiple-FPGA System
IEEE Transactions on Computers (IF 3.6), Pub Date: 2020-05-12, DOI: 10.1109/tc.2020.2993552
Jyotikrishna Dass, Yashwardhan Narawane, Rabi Mahapatra, Vivek Sarin

Support Vector Machine (SVM) is a supervised machine learning model for classification tasks. Training an SVM on a large number of data samples is challenging due to the high computational cost and memory requirement. Hence, model training is typically performed on a high-performance server that runs a sequential training algorithm on centralized data. However, as workloads grow massive, it becomes impractical to store all the data centrally and expect such sequential training algorithms to scale on traditional processors. Moreover, with the growing demand for real-time machine learning in edge analytics, it is imperative to devise an efficient training framework with relatively cheap computation and limited memory. Therefore, we propose and implement a first-of-its-kind multiple-FPGA system as a distributed computing framework, comprising up to eight FPGA units on Amazon F1 instances with negligible communication overhead, to fully parallelize, accelerate, and scale SVM training on decentralized data. Each FPGA unit contains a pipelined SVM training IP logic core operating at 125 MHz with a power dissipation of 39 Watts, which accelerates the computations allocated to it in the overall training process. We evaluate and compare the performance of the proposed system on five real SVM benchmarks.
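The abstract does not spell out the exact training algorithm each FPGA core accelerates, so the sketch below only illustrates the general pattern it describes: the training data stays partitioned across workers (one shard per FPGA unit), each worker computes a local contribution on its own shard, and the partial results are combined into a single model update. This is a minimal software analogue assuming a hinge-loss linear SVM trained by subgradient descent; all function and parameter names (e.g., `partial_subgradient`, `num_workers`) are hypothetical and not taken from the paper.

```python
# Illustrative sketch only: models distributed linear-SVM training on
# decentralized data, with each software "worker" standing in for one FPGA
# unit that processes its allocated shard of samples.
import numpy as np

def partial_subgradient(w, X_shard, y_shard, C):
    """Subgradient of the hinge-loss term C * sum(max(0, 1 - y*w.x)) over one shard."""
    margins = y_shard * (X_shard @ w)
    active = margins < 1.0                      # samples violating the margin
    return -C * (X_shard[active] * y_shard[active, None]).sum(axis=0)

def distributed_svm_train(shards, C=1.0, lr=0.01, epochs=100):
    """shards: list of (X_i, y_i) pairs, one per worker; labels y in {-1, +1}."""
    dim = shards[0][0].shape[1]
    w = np.zeros(dim)
    for _ in range(epochs):
        # In the real system each unit would compute its contribution in
        # parallel; here the per-shard terms are evaluated in turn and summed.
        grad = w.copy()                         # gradient of the 0.5*||w||^2 term
        for X_i, y_i in shards:
            grad += partial_subgradient(w, X_i, y_i, C)
        w -= lr * grad
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_workers = 8                             # mirrors the up-to-eight-FPGA setup
    X = rng.normal(size=(800, 10))
    y = np.sign(X[:, 0] + 0.1 * rng.normal(size=800))
    shards = list(zip(np.array_split(X, num_workers),
                      np.array_split(y, num_workers)))
    w = distributed_svm_train(shards)
    print(f"training accuracy: {(np.sign(X @ w) == y).mean():.3f}")
```

The key property the sketch mirrors is that only the small partial results (here, d-dimensional gradient vectors) cross worker boundaries, not the raw data shards, which is what keeps communication overhead low in a decentralized setting.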

Updated: 2020-05-12