Efficient graphical-processor-unit parallelization algorithm for computing Eigen values
Journal of Electronic Imaging ( IF 1.1 ) Pub Date : 2020-12-02 , DOI: 10.1117/1.jei.29.6.063008
Sofien Ben Sayadia 1 , Yaroub Elloumi 1 , Mohamed Akil 1 , Mohamed Hedi Bedoui 2

Abstract. Several leading-edge applications, such as pathology detection, biometric identification, and face recognition, rely mainly on blob and line detection. Eigenvalue computing is commonly employed for these detection tasks because of its accuracy and robustness. However, it requires heavy computation, intensive memory data access, and overlapping data, all of which lead to long execution times. To overcome these limitations, we propose in this paper a new parallel strategy for implementing eigenvalue computing on a graphics processing unit (GPU). Our contributions are (1) to optimize instruction scheduling to reduce the computation time, (2) to efficiently partition processing into blocks to increase the occupancy of streaming multiprocessors, (3) to provide efficient input data splitting on shared memory to benefit from its lower access time, and (4) to propose new data management of shared memory to avoid memory access conflicts and reduce memory bank accesses. Experimental results show that our proposed GPU parallel strategy for eigenvalue computing achieves a speedup of 27 compared with a multithreaded implementation, of 16 compared with a predefined function in the OpenCV library, and of 8 compared with a predefined function in the cuBLAS library, all of which were run on a quad-core multi-central-processing-unit platform. Next, our parallel strategy is evaluated through an eigenvalue-based method for retinal thick vessel segmentation, which is essential for detecting ocular pathologies. Eigenvalue computing is executed in 0.017 s on images from the Structured Analysis of the Retina (STARE) database. Accordingly, we achieved real-time thick retinal vessel segmentation with an average execution time of about 0.039 s.
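
As a rough illustration of the per-pixel work that such a strategy parallelizes, the sketch below computes the two eigenvalues of a 2x2 symmetric matrix in closed form and applies that computation independently at every pixel. The abstract does not state which matrix is used, so the 2x2 symmetric form (for example an image Hessian with entries `axx`, `axy`, `ayy`) is an assumption, and the function names `eig_sym_2x2` and `eigenvalue_maps` are hypothetical. This is a plain sequential Python reference, not the authors' GPU implementation.

```python
import math


def eig_sym_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], largest first.

    Assumption: the per-pixel matrix is 2x2 and symmetric (e.g. a Hessian).
    """
    mean = 0.5 * (a + c)
    # Half the gap between the two eigenvalues: sqrt(((a - c) / 2)^2 + b^2).
    radius = math.hypot(0.5 * (a - c), b)
    return mean + radius, mean - radius


def eigenvalue_maps(axx, axy, ayy):
    """Apply eig_sym_2x2 at every pixel of three equally sized 2D lists.

    Returns two 2D lists: the larger and the smaller eigenvalue per pixel.
    """
    larger, smaller = [], []
    for row_xx, row_xy, row_yy in zip(axx, axy, ayy):
        pairs = [eig_sym_2x2(a, b, c)
                 for a, b, c in zip(row_xx, row_xy, row_yy)]
        larger.append([p[0] for p in pairs])
        smaller.append([p[1] for p in pairs])
    return larger, smaller


if __name__ == "__main__":
    # One-row image with two pixels: matrices [[2, 0], [0, 1]] and [[2, 1], [1, 2]].
    hi, lo = eigenvalue_maps([[2.0, 2.0]], [[0.0, 1.0]], [[1.0, 2.0]])
    print(hi, lo)
```

Each pixel's result depends only on that pixel's three matrix entries, which is why this step maps naturally onto one GPU thread per pixel. If the matrix entries come from image derivatives, neighbouring pixels read overlapping input windows, which would be consistent with the data overlapping and shared-memory concerns the abstract raises.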

Updated: 2020-12-02