Exploiting Parallelism Opportunities with Deep Learning Frameworks
ACM Transactions on Architecture and Code Optimization (IF 1.5), Pub Date: 2020-12-30, DOI: 10.1145/3431388
Yu Emma Wang, Carole-Jean Wu, Xiaodong Wang, Kim Hazelwood, David Brooks

State-of-the-art machine learning frameworks support a wide variety of design features to enable a flexible machine learning programming interface and to ease the programmability burden on machine learning developers. Identifying and using a performance-optimal setting in such feature-rich frameworks, however, involves a non-trivial amount of performance profiling effort and often relies on domain-specific knowledge. This article takes a deep dive into analyzing the performance impact of key design features in a machine learning framework and quantifies the role of parallelism. The observations and insights distill into a simple set of guidelines that one can use to achieve much higher training and inference speedups. Across a diverse set of real-world deep learning models, the evaluation results show that the proposed performance tuning guidelines outperform the Intel and TensorFlow recommended settings by 1.30× and 1.38×, respectively.
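The parallelism settings the abstract refers to are exposed in TensorFlow through its intra-op and inter-op thread pools and, for MKL-DNN CPU builds, through OpenMP environment variables. The sketch below only illustrates where such knobs live; the specific thread counts and affinity values are assumptions for illustration, not the guidelines proposed in the paper.

```python
import os

# OpenMP/MKL knobs (relevant for MKL-DNN CPU builds). The values below are
# illustrative assumptions, not the paper's recommended settings.
os.environ["OMP_NUM_THREADS"] = "16"                        # threads per OpenMP parallel region
os.environ["KMP_BLOCKTIME"] = "1"                           # ms a thread spins before sleeping
os.environ["KMP_AFFINITY"] = "granularity=fine,compact,1,0" # pin threads to cores

import tensorflow as tf

# TensorFlow's two thread pools:
#   intra-op: threads used to parallelize a single operator (e.g., one matmul)
#   inter-op: threads used to run independent operators concurrently
tf.config.threading.set_intra_op_parallelism_threads(16)
tf.config.threading.set_inter_op_parallelism_threads(2)

print(tf.config.threading.get_intra_op_parallelism_threads(),
      tf.config.threading.get_inter_op_parallelism_threads())
```

Tuning these settings jointly (thread-pool sizes, thread affinity, and per-operator parallelism) is exactly the search space whose profiling cost the paper aims to replace with simple guidelines.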

Updated: 2020-12-30