Memory-Efficient Deep Learning Inference in Trusted Execution Environments
arXiv - CS - Performance. Pub Date: 2021-04-30. DOI: arXiv-2104.15109
Jean-Baptiste Truong, William Gallagher, Tian Guo, Robert J. Walls

This study identifies, and proposes techniques to alleviate, two key bottlenecks to executing deep neural networks in trusted execution environments (TEEs): page thrashing during the execution of convolutional layers and the decryption of large weight matrices in fully-connected layers. For the former, we propose a novel partitioning scheme, y-plane partitioning, designed to (i) provide consistent execution time when the layer output is large compared to the TEE secure memory; and (ii) significantly reduce the memory footprint of convolutional layers. For the latter, we leverage quantization and compression. In our evaluation, the proposed optimizations incurred latency overheads ranging from 1.09X to 2X baseline across a wide range of TEE sizes; in contrast, an unmodified implementation incurred latencies of up to 26X when running inside the TEE.
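The convolutional-layer idea, producing the output one y-plane (output row) at a time so that only the few input rows needed for that row are resident at once, can be sketched as below. This is a minimal illustration of the partitioning principle, not the paper's TEE implementation; the function name and single-channel layout are assumptions for clarity.

```python
import numpy as np

def conv2d_by_y_plane(x, w):
    """'Valid' 2D convolution computed one output row (y-plane) at a time.

    Illustrative sketch: for each output row y, only kh input rows are
    touched, so the working set stays bounded regardless of output size --
    the property that avoids page thrashing in a small TEE secure memory.
    """
    kh, kw = w.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for y in range(out_h):
        # Only input rows y .. y+kh-1 are needed for output row y.
        window_rows = x[y:y + kh, :]
        for ox in range(out_w):
            out[y, ox] = np.sum(window_rows[:, ox:ox + kw] * w)
    return out
```

With a real framework the per-plane computation would be a batched tensor operation; the point here is only that partitioning along y bounds the resident pages per step.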

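For the fully-connected bottleneck, the effect of quantization plus compression is to shrink the weight bytes that must be decrypted inside the TEE. A rough sketch of that size reduction (the matrix size, symmetric 8-bit quantization, and zlib compression are illustrative assumptions, not the paper's exact scheme):

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical fully-connected weight matrix (size is illustrative).
w = rng.standard_normal((512, 512)).astype(np.float32)

# Symmetric 8-bit quantization: int8 values plus one float scale factor.
scale = float(np.abs(w).max()) / 127.0
q = np.round(w / scale).astype(np.int8)

raw = w.tobytes()                 # bytes to decrypt without optimization
packed = zlib.compress(q.tobytes())  # bytes to decrypt with it

# Dequantize to check the approximation error introduced.
w_hat = q.astype(np.float32) * scale
err = float(np.abs(w - w_hat).max())
```

The int8 representation alone cuts the payload 4X versus float32, and compression reduces it further, at the cost of a bounded quantization error (at most half the scale step per weight).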
Updated: 2021-05-03