DeepRT: predictable deep learning inference for cyber-physical systems
Real-Time Systems (IF 1.3) Pub Date: 2018-07-18, DOI: 10.1007/s11241-018-9314-y
Woochul Kang , Jaeyong Chung

In mobile and embedded devices, deep learning is changing the way computers see, hear, and understand the world. When deep learning is deployed on such systems, they are expected to perform inference tasks in a timely and energy-efficient manner. Much research has focused on taming deep learning for resource-constrained devices, either by compressing deep learning models or by devising hardware accelerators. However, these approaches have focused on providing ‘best-effort’ performance for such devices. In this paper, we present the design and implementation of DeepRT, a novel deep learning inference runtime. Unlike previous approaches, DeepRT focuses on supporting predictable temporal and spatial inference performance when deep learning models are used in unpredictable, resource-constrained environments. In particular, DeepRT applies formal control theory to support Quality-of-Service (QoS) management that dynamically minimizes the tardiness of inference tasks at runtime while achieving high energy efficiency. Further, DeepRT selects an appropriate level of compression for deep learning models at runtime according to memory availability and users’ QoS requirements, striking a trade-off between memory savings and loss of inference accuracy. We evaluate DeepRT on a wide range of deep learning models under various conditions. The experimental results show that DeepRT supports the timeliness of inference tasks in a robust and energy-efficient manner.
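The abstract describes two runtime mechanisms: a control-theoretic QoS manager that keeps inference tardiness low while saving energy, and a policy that picks a model compression level from the available memory. The sketch below is a minimal illustration of these two ideas, not the authors' implementation; the PI-style controller, the performance-knob abstraction, the gains, and the compression table are all hypothetical.

```python
# Illustrative sketch only (not DeepRT's actual code): a velocity-form PI
# controller that nudges a normalized performance level toward zero tardiness,
# and a helper that picks the least-compressed model variant that fits in memory.
# All names, gains, and numbers below are assumptions for illustration.

class TardinessController:
    """PI-style feedback controller driving measured tardiness toward a target."""

    def __init__(self, kp=0.5, ki=0.1, target_tardiness=0.0):
        self.kp = kp                    # proportional gain (assumed)
        self.ki = ki                    # integral gain (assumed)
        self.target = target_tardiness  # desired tardiness in seconds
        self.prev_error = 0.0
        self.knob = 0.5                 # normalized performance level in [0, 1]

    def update(self, measured_tardiness):
        """Return a new performance level given the last period's tardiness."""
        error = measured_tardiness - self.target
        # Velocity-form PI update: positive error (tasks late) raises the level,
        # negative error (slack) lowers it to save energy.
        delta = self.kp * (error - self.prev_error) + self.ki * error
        self.prev_error = error
        self.knob = min(1.0, max(0.0, self.knob + delta))
        return self.knob


def pick_compression_level(available_mem_mb, variants):
    """Choose the least-compressed model variant that fits in available memory.

    `variants` is a hypothetical list of (memory_mb, accuracy) pairs sorted
    from largest/most accurate to smallest/least accurate.
    """
    for mem, acc in variants:
        if mem <= available_mem_mb:
            return mem, acc
    return variants[-1]  # fall back to the smallest variant


if __name__ == "__main__":
    ctrl = TardinessController()
    print(ctrl.update(measured_tardiness=0.02))   # late -> performance level rises
    print(ctrl.update(measured_tardiness=-0.01))  # slack -> level backs off

    variants = [(120, 0.76), (60, 0.74), (30, 0.70)]  # hypothetical (MB, accuracy)
    print(pick_compression_level(available_mem_mb=64, variants=variants))
```

The velocity-form update is used here only because it keeps the knob adjustment incremental and bounded; the paper's actual controller design, actuators, and compression scheme may differ.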

Updated: 2018-07-18