Machine Learning Computers with Fractal von Neumann Architecture
IEEE Transactions on Computers (IF 3.7), Pub Date: 2020-07-01, DOI: 10.1109/tc.2020.2982159
Yongwei Zhao, Zhe Fan, Zidong Du, Tian Zhi, Ling Li, Qi Guo, Shaoli Liu, Zhiwei Xu, Tianshi Chen, Yunji Chen

Machine learning techniques are pervasive tools for emerging commercial applications, and many dedicated machine learning computers at different scales have been deployed in embedded devices, servers, and data centers. Currently, most machine learning computer architectures still focus on optimizing performance and energy efficiency rather than programming productivity. However, with the fast development of silicon technology, programming productivity, including programming itself and software stack development, has become the vital factor hindering the application of machine learning computers, rather than performance and power efficiency. In this article, we propose Cambricon-F, a series of homogeneous, sequential, multi-layer, layer-similar machine learning computers with the same ISA. A Cambricon-F machine has a fractal von Neumann architecture to iteratively manage its components: it has a von Neumann architecture, and its processing components (sub-nodes) are themselves Cambricon-F machines with a von Neumann architecture and the same ISA. Since Cambricon-F instances at different scales can share the same software stack on their common ISA, Cambricon-F can significantly improve programming productivity. Moreover, we address four major challenges in Cambricon-F architecture design, which allows Cambricon-F to achieve high efficiency. We implement two Cambricon-F instances at different scales, i.e., Cambricon-F100 and Cambricon-F1. Compared to GPU-based machines (DGX-1 and 1080Ti), Cambricon-F instances achieve 2.82x and 5.14x better performance and 8.37x and 11.39x better efficiency on average, with 74.5 and 93.8 percent smaller area costs, respectively. We further propose Cambricon-FR, which enhances the Cambricon-F machine learning computers to flexibly and efficiently support all the fractal operations with a reconfigurable fractal instruction set architecture. Compared to the Cambricon-F instances, Cambricon-FR machines achieve 1.96x and 2.49x better performance on average. Most importantly, Cambricon-FR computers reduce code length by a factor of 5.83, thus significantly improving programming productivity.
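The fractal idea in the abstract, where every processing component is itself a machine of the same kind with the same ISA, can be illustrated with a small recursive sketch. This is not the Cambricon-F ISA or its actual decomposition scheme: the opcodes `vadd` and `vmul`, the class `FractalNode`, and the helper `build` are hypothetical names chosen for illustration, and the even chunking of operands is an assumption.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical toy "ISA": each opcode is an elementwise operation.
# These opcodes are illustrative only, not Cambricon-F instructions.
ISA = {
    "vadd": lambda x, y: x + y,
    "vmul": lambda x, y: x * y,
}


@dataclass
class FractalNode:
    """A node whose sub-nodes are nodes of the same type (assumed model)."""
    children: List["FractalNode"] = field(default_factory=list)

    def execute(self, opcode: str, a: list, b: list) -> list:
        # Leaf node, or operand too small to split: compute directly.
        if not self.children or len(a) <= 1:
            fn = ISA[opcode]
            return [fn(x, y) for x, y in zip(a, b)]
        # Otherwise split the operands into contiguous chunks and issue
        # the SAME instruction to each child on its chunk.
        n = len(self.children)
        chunk = -(-len(a) // n)  # ceiling division
        out = []
        for i, child in enumerate(self.children):
            lo, hi = i * chunk, (i + 1) * chunk
            out.extend(child.execute(opcode, a[lo:hi], b[lo:hi]))
        return out


def build(depth: int, fanout: int) -> FractalNode:
    """Build a machine with `depth` levels and `fanout` sub-nodes per level."""
    if depth == 0:
        return FractalNode()
    return FractalNode([build(depth - 1, fanout) for _ in range(fanout)])
```

Under these assumptions, the same call, for example `machine.execute("vadd", a, b)`, runs unchanged on `build(1, 2)` and on `build(3, 4)`. Only the machine's scale differs, which is the property the abstract credits for sharing one software stack across instances.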

Updated: 2020-07-01