Parallel convolutional processing using an integrated photonic tensor core
Nature (IF 50.5), Pub Date: 2021-01-06, DOI: 10.1038/s41586-020-03070-1
J Feldmann 1 , N Youngblood 2, 3 , M Karpov 4 , H Gehring 1 , X Li 2 , M Stappers 1 , M Le Gallo 5 , X Fu 4 , A Lukashchuk 4 , A S Raja 4 , J Liu 4 , C D Wright 6 , A Sebastian 5 , T J Kippenberg 4 , W H P Pernice 1, 7 , H Bhaskaran 2
With the proliferation of ultrahigh-speed mobile networks and internet-connected devices, along with the rise of artificial intelligence (AI)1, the world is generating exponentially increasing amounts of data that need to be processed in a fast and efficient way. Highly parallelized, fast and scalable hardware is therefore becoming progressively more important2. Here we demonstrate a computationally specific integrated photonic hardware accelerator (tensor core) that is capable of operating at speeds of trillions of multiply-accumulate operations per second (10^12 MAC operations per second, or tera-MACs per second). The tensor core can be considered as the optical analogue of an application-specific integrated circuit (ASIC). It achieves parallelized photonic in-memory computing using phase-change-material memory arrays and photonic chip-based optical frequency combs (soliton microcombs3). The computation is reduced to measuring the optical transmission of reconfigurable and non-resonant passive components and can operate at a bandwidth exceeding 14 gigahertz, limited only by the speed of the modulators and photodetectors. Given recent advances in hybrid integration of soliton microcombs at microwave line rates3,4,5, ultralow-loss silicon nitride waveguides6,7, and high-speed on-chip detectors and modulators, our approach provides a path towards full complementary metal–oxide–semiconductor (CMOS) wafer-scale integration of the photonic tensor core. Although we focus on convolutional processing, more generally our results indicate the potential of integrated photonics for parallel, fast, and efficient computational hardware in data-heavy AI applications such as autonomous driving, live video processing, and next-generation cloud computing services.
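For readers who want a concrete picture of the operation being accelerated, the sketch below (ours, not the authors') expresses a 2D convolution as a single matrix-vector multiply, i.e. a batch of multiply-accumulate (MAC) operations. In the photonic tensor core these dot products are computed in parallel by encoding inputs on different comb wavelengths and reading weights from phase-change-material cells; here plain NumPy stands in for the optics, and names such as im2col and convolve_as_macs are purely illustrative.

```python
# Illustrative sketch (not the paper's implementation): a 2D convolution
# rewritten as one matrix-vector product, the MAC pattern that the
# photonic tensor core parallelizes across comb wavelengths.
import numpy as np

def im2col(image, k):
    """Unroll every k x k patch of a 2D image into a row of a matrix."""
    h, w = image.shape
    rows = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            rows.append(image[i:i + k, j:j + k].ravel())
    return np.array(rows)

def convolve_as_macs(image, kernel):
    """Convolution via a matrix-vector multiply: each output pixel is a
    dot product (k*k MACs) between one patch and the flattened kernel."""
    k = kernel.shape[0]
    patches = im2col(image, k)          # shape (num_patches, k*k)
    out = patches @ kernel.ravel()      # all MACs performed as one multiply
    h_out = image.shape[0] - k + 1
    return out.reshape(h_out, -1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((28, 28))
    edge_kernel = np.array([[1., 0., -1.],
                            [1., 0., -1.],
                            [1., 0., -1.]])
    print(convolve_as_macs(img, edge_kernel).shape)  # (26, 26)
```

In this formulation the hardware question is simply how many such dot products can be evaluated per second; the reported tera-MACs-per-second rate comes from evaluating many of them simultaneously in the optical domain.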



Updated: 2021-01-06