Image Autoregressive Interpolation Model Using GPU-Parallel Optimization
IEEE Transactions on Industrial Informatics ( IF 11.7 ) Pub Date : 2017-07-07 , DOI: 10.1109/tii.2017.2724205
Jiaji Wu , Long Deng , Gwanggil Jeon

With the growth of the consumer electronics industry, it is vital to develop algorithms for ultrahigh-definition products that are more effective and have lower time complexity. Image interpolation based on an autoregressive model has achieved significant improvements over traditional algorithms in image reconstruction, including a better peak signal-to-noise ratio (PSNR) and improved subjective visual quality of the reconstructed image. However, the time-consuming computation involved has become a bottleneck in these autoregressive algorithms. Because of the high time cost, image autoregressive-based interpolation algorithms are rarely used in industry for actual production. In this study, in order to meet the requirements of real-time reconstruction, we use diverse compute unified device architecture (CUDA) optimization strategies to make full use of the graphics processing unit (GPU) (NVIDIA Tesla K80), including shared-memory, register, and multi-GPU optimizations. To make the algorithm more suitable for GPU-parallel optimization, we modify the training window to obtain a more concise matrix operation. Experimental results show that, while maintaining a high PSNR and subjective visual quality and taking into account the I/O transfer time, our algorithm achieves a high speedup of 147.3 times for a Lena image and 174.8 times for a 720p video, compared to the original single-threaded C CPU code compiled with -O2 optimization.
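To make the autoregressive interpolation idea concrete, below is a minimal NumPy sketch (not the authors' implementation, which is written in C/CUDA) of the core least-squares step: each pixel in a training window is modelled as a weighted sum of its four diagonal neighbours, the weights are estimated by least squares, and the same weights are then reused to predict the missing high-resolution pixel at the centre of each 2x2 low-resolution block. The function names and the 8x8 window size are illustrative assumptions, not from the paper.

```python
import numpy as np

def estimate_ar_weights(lr):
    """Estimate 4 AR weights from a low-resolution training window.

    Each interior pixel is modelled as a weighted sum of its four
    diagonal neighbours; the weights are the least-squares solution
    of the resulting overdetermined system. (Hypothetical sketch of
    NEDI-style AR interpolation, not the paper's exact formulation.)
    """
    rows, cols = lr.shape
    feats, targets = [], []
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Four diagonal neighbours: NW, NE, SW, SE.
            feats.append([lr[i-1, j-1], lr[i-1, j+1],
                          lr[i+1, j-1], lr[i+1, j+1]])
            targets.append(lr[i, j])
    X = np.asarray(feats, dtype=float)
    y = np.asarray(targets, dtype=float)
    # Least-squares fit of y ~= X @ a; lstsq is more robust than
    # forming the normal equations (X^T X) a = X^T y explicitly.
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a

def interpolate_missing(lr, a):
    """Predict the missing HR pixel at the centre of each 2x2 LR block,
    reusing the estimated weights under the usual geometric-duality
    assumption between the two scales."""
    rows, cols = lr.shape
    hr_missing = np.empty((rows - 1, cols - 1))
    for i in range(rows - 1):
        for j in range(cols - 1):
            hr_missing[i, j] = (a[0] * lr[i, j]     + a[1] * lr[i, j+1] +
                                a[2] * lr[i+1, j]   + a[3] * lr[i+1, j+1])
    return hr_missing
```

In the GPU version described by the abstract, the per-window matrix products behind this least-squares fit are what get mapped onto CUDA threads, with the training-window data staged in shared memory.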

Updated: 2017-07-07