PIT: Processing-In-Transmission With Fine-Grained Data Manipulation Networks
IEEE Transactions on Computers (IF 3.6). Pub Date: 2020-12-30, DOI: 10.1109/tc.2020.3048233
Pengchen Zong, Tian Xia, Haoran Zhao, Jianming Tong, Zehua Li, Wenzhe Zhao, Nanning Zheng, Pengju Ren

In the domain of data-parallel computation, most works focus on dataflow optimization inside the PE array and on favorable memory hierarchies to pursue maximum parallelism and efficiency, while the importance of the data contents themselves has long been overlooked. As we observe, for structured data, insights into the contents (i.e., their values and locations within a structured form) can greatly benefit computation performance, since fine-grained data manipulation becomes possible. In this paper, we claim that by providing a flexible and adaptive data path, an efficient architecture with the capability of fine-grained data manipulation can be built. Specifically, we design SOM, a portable and highly adaptive data transmission network capable of operand sorting, non-blocking self-route ordering, and multicasting. Based on SOM, we propose the processing-in-transmission architecture (PITA), which extends the traditional SIMD architecture to perform fundamental data processing during transmission by embedding multiple levels of SOM networks on the data path. We evaluate the performance of PITA on two irregular computation problems. We first map the matrix inversion task onto PITA and show that considerable performance gains can be achieved: a $3\times$ to $20\times$ speedup over Intel MKL and $20\times$ to $40\times$ over cuBLAS. We then evaluate PITA on sparse CNNs. The results indicate that PITA greatly improves computation efficiency and reduces memory bandwidth pressure: we achieve a $2\times$ to $9\times$ speedup over several state-of-the-art sparse CNN accelerators while maintaining nearly 100 percent PE efficiency under high sparsity. We believe the concept of PIT is a promising computing paradigm that can extend the capability of traditional parallel architectures.
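The abstract does not disclose SOM's internal design, so as a hedged illustration of what "operand sorting during transmission" can mean, the sketch below models the data path as a classic bitonic sorting network: its fixed compare-exchange stages map naturally onto wire levels between a memory port and a PE array, so data arrives at the PEs already ordered. All function names here are hypothetical and not taken from the paper.

```python
# Sketch only: models in-transmission operand sorting with a bitonic sorting
# network (a standard hardware-friendly sorter), NOT the paper's actual SOM.

def bitonic_stages(n):
    """Yield the compare-exchange pairs of a bitonic sorter, one sub-stage
    (i.e., one wire level) at a time. Requires n to be a power of two."""
    k = n.bit_length() - 1
    for stage in range(1, k + 1):
        for sub in range(stage, 0, -1):
            gap = 1 << (sub - 1)
            pairs = []
            for i in range(n):
                j = i ^ gap
                if i < j:
                    # Direction alternates by block so merged runs are bitonic.
                    ascending = ((i >> stage) & 1) == 0
                    pairs.append((i, j, ascending))
            yield pairs

def transmit_sorted(lanes):
    """Pass a vector of operands through the network stage by stage; each
    stage is one level of compare-exchange units on the data path."""
    lanes = list(lanes)
    for stage in bitonic_stages(len(lanes)):
        for i, j, asc in stage:
            if (lanes[i] > lanes[j]) == asc:
                lanes[i], lanes[j] = lanes[j], lanes[i]
    return lanes
```

Because every stage is a fixed pattern of pairwise comparators, the sorter adds only wire-level latency and no buffering stalls, which is one plausible reading of "non-blocking" transmission-side processing.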

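As a hedged sketch of why near-100 percent PE efficiency is attainable under high sparsity (this is an illustration of the general zero-skipping idea, not the paper's actual dataflow): a self-routing network can compact nonzero operands in transit, so every PE lane receives useful work instead of multiplying by zero. The helper below is hypothetical.

```python
# Sketch only: models in-transit compaction of sparse operands so that
# PE lanes stay busy. Hypothetical helper, not the paper's SOM routing.

def compact_nonzero(values, lanes):
    """Route only the nonzero (index, value) pairs of `values` into `lanes`
    output slots, padding leftover slots with None. Returns the routed slots
    and the lane occupancy, a rough proxy for PE efficiency."""
    packed = [(i, v) for i, v in enumerate(values) if v != 0]
    out = packed[:lanes] + [None] * max(0, lanes - len(packed))
    return out, len(packed) / lanes
```

With a dense data path, an input that is 60 percent zeros leaves 60 percent of the PEs idle; after compaction, occupancy depends only on how many nonzeros are available per cycle, which is why efficiency can stay near 100 percent at high sparsity.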
Updated: 2020-12-30