Way of Measuring Data Transfer Delays among Graphics Processing Units at Different Nodes of a Computer Cluster, Moscow University Computational Mathematics and Cybernetics



Moscow University Computational Mathematics and Cybernetics, Pub Date: 2020-05-21, DOI: 10.3103/s0278641920010021
A. A. Begaev, A. N. Salnikov

Abstract

The basics of load testing for a computer cluster with a large number of GPUs (graphics processing units) distributed over the cluster's nodes are presented and implemented in program code. The tests collect the time delays in the transfer of data of different sizes among all GPUs in the system. Two test modes, "all to all" and "one to one," are developed. In the first mode, all GPUs transfer data to all GPUs simultaneously. In the second mode, only one transfer between two GPUs is in progress at any given moment. Test results obtained on the K60 computer cluster at the Keldysh Institute of Applied Mathematics, Russian Academy of Sciences, show that the supercomputer's interconnect is inhomogeneous with respect to data transfer among GPUs, not only for transfers through the network but also for GPUs within a single node of the cluster.
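
The difference between the two test modes can be illustrated with a small scheduling sketch. This is not the authors' code: the function names, the thread-based concurrency, the exclusion of self-transfers, and the `fake_transfer` stand-in are all assumptions made for illustration. A real test would replace `fake_transfer` with an actual GPU-to-GPU copy.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import permutations
from time import perf_counter


def one_to_one_schedule(n_gpus):
    """One (src, dst) pair per step: only one transfer runs at a time."""
    return [[pair] for pair in permutations(range(n_gpus), 2)]


def all_to_all_schedule(n_gpus):
    """A single step in which every GPU sends to every other GPU.

    Assumption: self-transfers (src == dst) are excluded.
    """
    return [list(permutations(range(n_gpus), 2))]


def fake_transfer(src, dst, size_bytes):
    """Hypothetical stand-in; a real test would copy size_bytes from src to dst."""
    pass


def _timed(transfer, src, dst, size_bytes):
    # Time one transfer and report which pair it belongs to.
    start = perf_counter()
    transfer(src, dst, size_bytes)
    return src, dst, perf_counter() - start


def measure(schedule, n_gpus, transfer, size_bytes):
    """Return delays[src][dst] in seconds; unmeasured entries stay None."""
    delays = [[None] * n_gpus for _ in range(n_gpus)]
    for step in schedule:
        if not step:
            continue
        # All transfers of one step are launched concurrently.
        with ThreadPoolExecutor(max_workers=len(step)) as pool:
            futures = [pool.submit(_timed, transfer, s, d, size_bytes)
                       for s, d in step]
            for future in futures:
                s, d, elapsed = future.result()
                delays[s][d] = elapsed
    return delays
```

Calling `measure(one_to_one_schedule(4), 4, fake_transfer, 1 << 20)` and `measure(all_to_all_schedule(4), 4, fake_transfer, 1 << 20)` would yield two delay matrices for the same message size. Comparing them across sizes is the kind of data the abstract describes, although the exact measurement procedure in the paper may differ.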


Updated: 2020-05-21
