A 460 GOPS/W Improved-Mnemonic-Descent-Method-based Hardwired Accelerator for Face Alignment
IEEE Transactions on Multimedia (IF 8.4), Pub Date: 2020-01-01, DOI: 10.1109/tmm.2020.2993943
Huiyu Mo, Leibo Liu, Wenping Zhu, Eric Li, Shouyi Yin, Shaojun Wei

The mnemonic descent method (MDM) algorithm is the first end-to-end recurrent convolutional system for high-accuracy face alignment. However, its heavy computational complexity and high memory-access demands make it difficult to meet the requirements of real-time applications. To address this problem, the improved MDM (I-MDM) algorithm is proposed, which applies several hardware-oriented optimizations for efficient hardware implementation. First, a patch merging mechanism dynamically clusters and eliminates redundant landmarks, significantly reducing computational complexity with minimal accuracy loss. Second, a dedicated convolutional layer is inserted to halve the number of computations and memory accesses of the subsequent fully connected layer, which also yields a 4.42% decrease in the failure rate. Third, a lightweight preprocessing method named dual regressors (DR) is proposed to reinitialize face images, which greatly improves overall accuracy. Compared with a similar method, DR also reduces computations and memory storage by nearly 99.9%. Overall, compared with the MDM algorithm, I-MDM reduces the number of computations by 23.5% and decreases the failure rate by 17.9% on the 300W test set.

Based on the proposed I-MDM algorithm, an I-MDM-based hardwired accelerator is implemented in a TSMC 65 nm CMOS process. First, compared with similar solutions, the gradient calculation in HoG feature extraction is rearranged and loaded pixels are reused, which eliminates all division operations and 25% of off-chip memory accesses. Second, patch-independent central activations enable patch-level pipelined operation, yielding a 2× speed-up of the overall process. The accelerator achieves an energy efficiency of 460 GOPS/W at 330 MHz, which is 38× higher than that of the most recent face alignment accelerator in the same process.
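The abstract does not describe the clustering criterion used by patch merging. The sketch below only illustrates the general idea of letting nearby landmarks share one patch, under an assumed greedy rule: a landmark joins the first existing cluster whose patch centre lies within a distance `threshold`, and each cluster's shared patch is centred at the mean of its members. `merge_patches` and `threshold` are hypothetical names; this is not the I-MDM mechanism itself.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def merge_patches(landmarks: List[Point], threshold: float):
    """Greedily group landmarks whose patch centres lie within `threshold`.

    Hypothetical illustration, not the paper's algorithm. Returns
    (centres, assignment): one shared patch centre per cluster and, for
    every landmark, the index of the patch it reads features from.
    """
    centres: List[Point] = []
    members: List[List[int]] = []
    assignment: List[int] = []
    for i, (x, y) in enumerate(landmarks):
        # Join the first existing cluster that is close enough.
        for c, (cx, cy) in enumerate(centres):
            if (x - cx) ** 2 + (y - cy) ** 2 <= threshold ** 2:
                members[c].append(i)
                # Re-centre the shared patch on the mean of its members.
                n = len(members[c])
                centres[c] = (sum(landmarks[j][0] for j in members[c]) / n,
                              sum(landmarks[j][1] for j in members[c]) / n)
                assignment.append(c)
                break
        else:
            # No nearby cluster: this landmark keeps a patch of its own.
            centres.append((x, y))
            members.append([i])
            assignment.append(len(centres) - 1)
    return centres, assignment
```

For example, `merge_patches([(0, 0), (1, 0), (10, 10), (10.5, 10)], 2.0)` returns the centres `[(0.5, 0.0), (10.25, 10.0)]` and the assignment `[0, 0, 1, 1]`, so per-patch feature extraction would run twice instead of four times. The greedy rule is order-dependent and does not re-check clusters after their centres move; it is kept simple for illustration.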

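The abstract states that the HoG gradient calculation is rearranged to eliminate all division operations, but it does not say how. One common division-free approach, assumed here and not necessarily the authors' design, avoids computing arctan(gy/gx) and instead assigns the orientation bin by sign tests against precomputed bin-boundary directions, which need only multiplications. The sketch assumes 9 unsigned orientation bins of 20° each, as in classic HoG; `orientation_bin`, `gradient`, and `BOUNDARIES` are hypothetical names.

```python
import math

NUM_BINS = 9                       # assumed: unsigned orientations, 20 degrees per bin
BIN_WIDTH = math.pi / NUM_BINS

# Boundary directions phi_k = k * 20 deg (k = 1..8) stored as (cos, sin) constants.
# In hardware these would be fixed-point coefficients, so only multiplies remain.
BOUNDARIES = [(math.cos(k * BIN_WIDTH), math.sin(k * BIN_WIDTH))
              for k in range(1, NUM_BINS)]


def orientation_bin(gx: int, gy: int) -> int:
    """Return the unsigned orientation bin of gradient (gx, gy) without any division."""
    # (gx, gy) and (-gx, -gy) share an unsigned orientation, so fold the vector
    # into the half-plane whose angle lies in [0, pi); 180 deg maps to 0 deg.
    if gy < 0 or (gy == 0 and gx < 0):
        gx, gy = -gx, -gy
    # For angles in [0, pi): theta >= phi_k  <=>  cos(phi_k)*gy - sin(phi_k)*gx >= 0
    # (a 2-D cross product), so the bin index is the number of boundaries passed.
    return sum(1 for c, s in BOUNDARIES if c * gy - s * gx >= 0)


def gradient(img, x: int, y: int):
    """Central-difference gradient (gx, gy) at an interior pixel (x, y)."""
    return img[y][x + 1] - img[y][x - 1], img[y + 1][x] - img[y - 1][x]
```

For example, `orientation_bin(1, 1)` (45°) returns 2, and `orientation_bin(-1, 0)` (180°, folded to 0°) returns 0. The result matches binning by `atan2` for nonzero gradients, while the per-pixel work is reduced to a sign fold and a few constant multiplications and comparisons.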
Updated: 2020-01-01