Latency‐aware adaptive micro‐batching techniques for streamed data compression on graphics processing units
Concurrency and Computation: Practice and Experience (IF 2), Pub Date: 2020-05-04, DOI: 10.1002/cpe.5786
Charles M. Stein, Dinei A. Rockenbach, Dalvan Griebler, Massimo Torquati, Gabriele Mencagli, Marco Danelutto, Luiz G. Fernandes

Stream processing is a parallel paradigm used in many application domains. With the advance of graphics processing units (GPUs), their usage in stream processing applications has increased as well. Efficient utilization of GPU accelerators in streaming scenarios requires batching input elements into micro-batches, whose computation is offloaded to the GPU to leverage data parallelism within the same batch of data. Since data elements are received continuously at the input rate, the bigger the micro-batch, the higher the latency to fully buffer it and to start processing on the device. Unfortunately, stream processing applications often have strict latency requirements, which make it necessary to find the best micro-batch size and to adapt it dynamically based on workload conditions as well as on the characteristics of the underlying device and network. In this work, we implement latency-aware adaptive micro-batching techniques and algorithms for streaming compression applications targeting GPUs. The evaluation is conducted with the Lempel-Ziv-Storer-Szymanski (LZSS) compression application under different input workloads. As a general result, we observed that algorithms with elastic adaptation factors respond better to stable workloads, while algorithms with narrower targets respond better to highly unbalanced workloads.
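The abstract does not spell out the paper's adaptive algorithms. As a rough illustration of the idea only, below is a minimal C++ sketch of a target-latency feedback loop that grows or shrinks the micro-batch size in proportion to the ratio between a fixed latency target and the last measured latency. The class name, its parameters, and the latency values are hypothetical, not the authors' actual algorithms.

#include <algorithm>
#include <cstddef>
#include <cstdio>

// Hypothetical sketch of a latency-aware micro-batch size controller.
// The stream runtime buffers batch_size() elements, offloads the batch
// to the GPU, measures the total buffering + processing latency, and
// reports it back through observe().
class AdaptiveBatcher {
public:
    AdaptiveBatcher(double target_latency_ms,
                    std::size_t min_size, std::size_t max_size)
        : target_ms_(target_latency_ms),
          min_(min_size), max_(max_size), size_(min_size) {}

    std::size_t batch_size() const { return size_; }

    // Proportional rule: scale the batch size by target/measured latency,
    // so batches that finished too slowly shrink and fast ones grow,
    // clamped to the [min_, max_] range supported by the device.
    void observe(double measured_ms) {
        if (measured_ms <= 0.0) return;  // ignore bogus measurements
        double scaled = static_cast<double>(size_) * (target_ms_ / measured_ms);
        size_ = std::clamp(static_cast<std::size_t>(scaled), min_, max_);
    }

private:
    double target_ms_;
    std::size_t min_, max_, size_;
};

int main() {
    AdaptiveBatcher batcher(5.0 /* target ms */, 64, 65536);
    // Simulated per-batch latencies standing in for real GPU timings.
    const double measured[] = {2.0, 3.5, 8.0, 6.0, 4.9};
    for (double ms : measured) {
        batcher.observe(ms);
        std::printf("measured %.1f ms -> next batch size %zu\n",
                    ms, batcher.batch_size());
    }
    return 0;
}

An elastic variant along the lines the abstract hints at would additionally widen or narrow the adjustment step depending on how stable the recent latency measurements are.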
