Comparative Performance Analysis of Intel Xeon Phi, GPU, and CPU: A Case Study from Microscopy Image Analysis. IEEE Transactions on Parallel and Distributed Systems



IEEE Transactions on Parallel and Distributed Systems (IF 5.6), Pub Date: 2014-05-01, DOI: 10.1109/ipdps.2014.111
George Teodoro ₁ , Tahsin Kurc ₂ , Jun Kong ₃ , Lee Cooper ₃ , Joel Saltz ₄


We study and characterize the performance of operations in an important class of applications on GPUs and Many Integrated Core (MIC) architectures. Our work is motivated by applications that analyze low-dimensional spatial datasets captured by high-resolution sensors, such as image datasets obtained from whole-slide tissue specimens using microscopy scanners. Common operations in these applications involve the detection and extraction of objects (object segmentation), the computation of features of each extracted object (feature computation), and the characterization of objects based on these features (object classification). In this work, we identify the data access and computation patterns of operations in the object segmentation and feature computation categories. We systematically implement and evaluate the performance of these operations on modern CPUs, GPUs, and MIC systems for a microscopy image analysis application. Our results show that, for operations that perform regular data access, performance on a MIC is comparable to, and sometimes better than, performance on a GPU. On the other hand, GPUs are significantly more efficient than MICs for operations that access data irregularly. This gap results from the low performance of MICs on random data access. We have also examined the coordinated use of MICs and CPUs. Our experiments show that using a performance-aware strategy for scheduling application operations improves performance by about 1.29× over a first-come-first-served strategy. This allows applications to obtain high performance efficiency on CPU-MIC systems: the example application attained an efficiency of 84% on 192 nodes (3072 CPU cores and 192 MICs).
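The difference between first-come-first-served and performance-aware scheduling can be illustrated with a toy simulation. The sketch below is not the scheduler from the paper, whose details the abstract does not give. It assumes one CPU and one MIC, and a hypothetical per-task estimate of the MIC's speedup over the CPU (above 1 for regular data access, below 1 for irregular access). The function name `simulate`, the policy labels, and all task values are invented for illustration, so the resulting ratio is not comparable to the 1.29× reported above.

```python
import heapq

def simulate(tasks, policy):
    """Return the makespan of running `tasks` on one CPU and one MIC.

    tasks:  list of (cpu_time, mic_speedup) pairs (hypothetical estimates);
            the time on the MIC is cpu_time / mic_speedup.
    policy: "fcfs"  -> an idle device takes the oldest pending task;
            "aware" -> an idle MIC takes the pending task with the highest
                       estimated speedup, an idle CPU the lowest.
    """
    pending = list(tasks)
    # Min-heap of (time at which the device becomes idle, device name).
    idle = [(0.0, "cpu"), (0.0, "mic")]
    heapq.heapify(idle)
    makespan = 0.0
    while pending:
        now, dev = heapq.heappop(idle)
        if policy == "fcfs":
            idx = 0
        elif dev == "mic":
            idx = max(range(len(pending)), key=lambda i: pending[i][1])
        else:
            idx = min(range(len(pending)), key=lambda i: pending[i][1])
        cpu_time, speedup = pending.pop(idx)
        duration = cpu_time / speedup if dev == "mic" else cpu_time
        finish = now + duration
        makespan = max(makespan, finish)
        heapq.heappush(idle, (finish, dev))
    return makespan

# Illustrative workload: "irregular" tasks run 2x slower on the MIC,
# "regular" tasks run 4x faster. These numbers are assumptions.
demo_tasks = [(4, 0.5), (4, 4), (4, 4), (4, 0.5), (4, 4), (4, 4)]
print("FCFS makespan: ", simulate(demo_tasks, "fcfs"))
print("Aware makespan:", simulate(demo_tasks, "aware"))
```

In this toy workload the first-come-first-served policy hands an irregular task to the MIC, where it runs slowly, while the performance-aware policy keeps irregular tasks on the CPU. Note that a greedy policy of this kind can still lose on small or unbalanced workloads, because an idle device will take whatever task remains.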


Updated: 2019-11-01
