Nano Communication Networks

Volume 20, June 2019, Pages 31-47

Low-overhead thermally resilient optical network-on-chip architecture

https://doi.org/10.1016/j.nancom.2019.03.001

Abstract

Integrated silicon photonic networks have attracted considerable attention in recent decades owing to their potential for low-power, high-bandwidth communication. However, these promising networks are highly susceptible to thermal fluctuations, which may paralyze their wavelength-based operation. Precisely characterizing thermally induced faults in optical networks-on-chip (ONoCs), and identifying practical methods to tackle them, is therefore a turning point toward the implementation of this technology. In this paper, thermal variation is investigated by analyzing on-chip power distribution, derived from the power profiles of SPEC 2006 benchmark applications. Based on these assessments, we propose a low-power thermally resilient optical network-on-chip (The-RONoC) architecture that significantly mitigates routing faults in ONoCs. Using a corrective unit in this architecture, 50% of the thermally induced switching faults are recovered at the cost of less than 2% area overhead. In addition, up to 42% performance improvement is achieved in comparison to the basic architecture. Finally, we explore the scalability of The-RONoC based on formal SNR analysis, as well as power consumption and the probability of optical transmission speed-up.

Introduction

Networks-on-chip (NoCs), as state-of-the-art high-performance interconnection systems, pave the way toward a high degree of parallelism and fast communication throughout the chip [1]. However, with the ever-increasing demand for low-power and high-bandwidth interconnection, electrical NoCs (ENoCs) hit performance and power walls as the number of interconnected cores increases. To tackle these challenges, silicon photonic interconnects have been proposed as a promising paradigm with low-power and high-bandwidth communication potential [2], [3]. The feasibility of optical NoCs (ONoCs) relies on the latest achievements in CMOS fabrication processes, in particular Silicon-on-Insulator (SOI) technology [4].

Although ONoCs are a breakthrough technology for high-performance systems, their wavelength-based operation is severely susceptible to temperature fluctuation [5]. Temperature drift affects all optical components, although active components such as microring resonators (MRs) have a much higher failure rate than passive components such as waveguides. The refractive index of optical devices changes with temperature. In particular, the resonant wavelength of an MR, a building block of many conventional ONoCs [6], varies with temperature [7]. The resonant wavelength of a microring resonator shifts by 0.11 nm/°C [4]. As a result of this thermo-optical effect, the shifted resonant wavelength leads to a partial or even complete pass-band mismatch, and optical data passing through the heated MR is misrouted. In other words, for a microring resonating at a typical wavelength of 1550 nm, a temperature drift of 11.82 °C causes a resonant wavelength shift of 1.3 nm, which is the typical wavelength separation between two optical channels multiplexed on a waveguide, known as channel spacing. A complete mismatch between the resonant wavelength of a microring and the optical data passing through it directs almost all of the optical power along a wrong path, so that almost no optical power is switched into the desired path. Moreover, the optical power on the wrong path increases crosstalk with other optical packets modulated on the same wavelength channel. This latter impact is taken into account when calculating SNR values for the various fault scenarios. In this manner, a physical-level failure results in network-level faults. The fault is not permanent and can be prevented by keeping the temperature within a reasonable range, which is challenging in large and complex ONoCs.
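The arithmetic behind these figures can be checked with a short sketch. It assumes the shift is linear in temperature at the quoted 0.11 nm/°C, which is a simplification made here for illustration. The function and constant names are hypothetical and are not taken from the paper.

```python
# Minimal sketch of the thermo-optic arithmetic quoted in the text.
# Assumption: the resonant wavelength shifts linearly with temperature.
# All names below are illustrative, not from the paper.

SHIFT_NM_PER_C = 0.11      # resonant wavelength shift per degree Celsius (from the text)
CHANNEL_SPACING_NM = 1.3   # typical spacing between adjacent channels (from the text)


def resonance_shift_nm(delta_t_c: float) -> float:
    """Resonant wavelength shift (nm) for a temperature drift of delta_t_c (degrees C)."""
    return SHIFT_NM_PER_C * delta_t_c


def drift_for_full_mismatch_c(spacing_nm: float = CHANNEL_SPACING_NM) -> float:
    """Temperature drift (degrees C) that shifts the resonance by one full channel spacing."""
    return spacing_nm / SHIFT_NM_PER_C


if __name__ == "__main__":
    drift = drift_for_full_mismatch_c()
    print(f"Drift for a one-channel shift: {drift:.2f} degrees C")
    print(f"Shift at that drift: {resonance_shift_nm(drift):.2f} nm")
```

Under this linear assumption, dividing the 1.3 nm channel spacing by 0.11 nm/°C reproduces the 11.82 °C drift quoted above.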

This paper aims to assess the potential thermally induced faults in ONoCs at a higher level of abstraction, i.e. the network level, and to express the exclusive impact of temperature fluctuation through a more practical metric, i.e. fault rate. By tracing the impact of thermal fluctuation on network behavior through this metric, we propose a thermally resilient ONoC architecture, The-RONoC, which suppresses the impact of thermally induced faults by employing additional switching units. These additional units, attached to each row of the mesh-based wavelength-routed ONoC architecture, retransmit the misrouted optical data toward the destination column. Moreover, we show that for optical packets transmitted through an N×N The-RONoC topology, the network provides communication speed-up in comparison to its earlier version, called HANoC.

In terms of performance, we show that The-RONoC architecture outperforms previously proposed thermal-aware architectures in two ways. First, it does not require measuring the temperature of the surface layer and/or the optical layer of the chip, which saves the implementation cost and power overhead of on-chip thermal sensors. Second, unlike various thermal-aware techniques, the transmitter node of misrouted optical data in The-RONoC architecture neither waits for a timeout to expire or an ack/nack packet to arrive, nor retransmits the data over an alternative path or an alternative wavelength. As a result, aside from its simpler and faster retransmission mechanism, The-RONoC architecture saves the power required for additional opto-electrical modulation for retransmission, and avoids the need for a wider-bandwidth laser source. The main contributions of this paper are summarized as follows:

  • We utilize a precise heat conduction model for multilayer ONoC [8], [9] validated by HotSpot thermal simulation tool [10] in this paper. Through this precise model, we evaluate thermally-induced faults in two wavelength-routed architectures, i.e. mesh-based High-performance All-optical NoC (HANoC) architecture, and a novel thermally resilient variant of HANoC, The-RONoC.

  • We propose a power-efficient The-RONoC architecture that retransmits misrouted optical data toward the destination column with low power penalties. The complexity of the optical router design is not increased and the design of the retransmission unit, as well as its layout, are simple and straightforward. As we demonstrate, the reliability improvement presented by this architecture is provided at the expense of less than 2% area overhead.

  • We propose fault rate as a system-level criterion that focuses exclusively on the impact of temperature fluctuation, from a standpoint that is practical for designers.

  • We formally evaluate area overhead, system throughput, power consumption and transmission delay for The-RONoC architecture. The results reveal that the proposed architecture is a low-overhead and low-power ONoC.

  • In order to determine the scalability of The-RONoC architecture, we explore the worst-case path from the SNR perspective. To this end, we present a novel hierarchical method that finds the minimum SNR in the network.

  • To emphasize the impact of applications on network fault rate, we employ two categories of applications from SPEC 2006 benchmark applications [11], extract the corresponding power consumption of the processing cores, and adopt them as the heat source in the electrical layer of the chip. We address dynamic fluctuation of temperature throughout the chip, as a result of run-time power variation of processing cores.

The rest of the paper is organized as follows. Section 2 reviews related literature addressing the impact of thermal fluctuations in ONoCs. In Section 3, we analyze the thermo-optical properties of on-chip optical devices. In Section 4, we propose our thermally resilient architecture and discuss its pros and cons. In Section 5, we analyze the system performance through analytical parameters. In Section 6, we explain the simulation environment and the simulation flow adopted throughout the paper, and present various fault scenarios and simulation results. Finally, the paper is concluded in Section 7.

Section snippets

Related work

Temperature fluctuation is a barrier to pushing ONoCs beyond an elegant research concept. In this regard, various studies address the impact of thermally induced faults by mitigating the thermal susceptibility of optical communication architectures, generally at two levels: the device level and the system level. At the device level, various studies have been devoted to athermal devices, such as polymer cladding for optical devices [12], [13]. Although the aforementioned studies play

Thermal characteristics of multilayer chips

Thermal resistance, as a key characteristic of the heat conducting medium, can be deduced from Fourier’s law of heat conduction [26]. Analyzing this law reveals that thermal resistance, $R_{cond}$, depends on the system’s geometrical and physical characteristics as follows:

$$R_{cond} = \frac{L}{k \cdot x\,(2L\tan\theta + x)}$$

where $k$ is the thermal conductivity of the conducting medium, $x$ is the thickness of the heat conducting medium, and $L$ is the dimension of the heat generating source, which is the processing core in this study.
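As a rough illustration, the expression can be evaluated numerically. The sketch below assumes it reads R_cond = L / (k · x · (2L tan θ + x)), as transcribed from this snippet, with θ given in radians. The function name and the numeric values in the usage part are hypothetical placeholders, not parameters from the paper.

```python
import math

# Minimal sketch, assuming the snippet's expression reads
#   R_cond = L / (k * x * (2 * L * tan(theta) + x))
# The function name and the example values are illustrative assumptions.


def conduction_resistance(L: float, k: float, x: float, theta: float) -> float:
    """Thermal resistance of the conducting medium, in K/W.

    L     -- dimension of the heat generating source (m)
    k     -- thermal conductivity of the conducting medium (W/(m*K))
    x     -- thickness of the heat conducting medium (m)
    theta -- angle appearing in the expression (radians)
    """
    if L <= 0 or k <= 0 or x <= 0:
        raise ValueError("L, k and x must be positive")
    return L / (k * x * (2.0 * L * math.tan(theta) + x))


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the function.
    r = conduction_resistance(L=1e-3, k=150.0, x=100e-6, theta=math.radians(45))
    print(f"R_cond = {r:.3f} K/W")
```

Since the section is only available as a snippet here, the placement of terms should be checked against the full text before the function is used for real estimates.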

The-RONoC architecture

In this section, we address thermal resiliency in ONoC, and propose our novel Thermal Resilient Optical NoC architecture named The-RONoC as a high-performance and fast solution in this regard.

Network analysis

In this section, the performance and scalability of The-RONoC architecture are elaborated.

Simulation results

In this section, we evaluate the reliability of the fault-resilient architecture under temperature variations caused by heat generated by the electrical cores.

Conclusion

In this paper, we evaluated the impact of thermally induced faults in terms of fault rate in optical NoCs. Utilizing our developed electrical model, we analyzed the reliability of ONoC architectures by addressing the impact of heated cores on wavelength-based operation of optical routers. To tackle thermally induced faults, we introduced a novel mesh-based all-optical thermally resilient architecture, called The-RONoC, which is enriched by corrective units to redirect erroneously routed optical

Melika Tinati received her Ph.D. degree in Computer Engineering from the Department of Computer Engineering at Sharif University of Technology, Tehran, Iran, in 2017. Her current research is focused on the design and analysis of on-chip silicon photonic networks and fault-tolerant optical networks.

References (46)

  • M. Tinati, S. Koohi, S. Hessabi, Impact of on-chip power distribution on Temperature-Induced Faults in Optical NoCs,...
  • W. Huang, M.R. Stan, K. Skadron, K. Sankaranarayanan, S. Ghosh, S. Velusamy, Compact thermal modeling for...
  • J.L. Henning, SPEC CPU 2006 benchmark descriptions, ACM SIGARCH Comput. Archit. News (2006)
  • J. Teng et al., Athermal Silicon-on-insulator ring resonators by overlaying a polymer cladding on narrowed waveguides, Opt. Express (2009)
  • S.S. Djordjevic, CMOS-compatible, athermal silicon ring modulators clad with titanium dioxide, Opt. Express (2013)
  • M. Wang et al., Wavelength reconfigurable photonic switching using thermally tuned micro-ring resonators fabricated on silicon substrate, Proc. of SPIE: Nanoeng. Fabricat., Propert., Opt., Dev. IV (2007)
  • H. Li, A. Fourmigue, S. Le Beux, X. Letartre, I. O’Connor, G. Nicolescu, Thermal aware design method for VCSEL-based...
  • N. Kirman, M. Kirman, R.K. Dokania, J.F. Martinez, A.B. Apsel, M.A. Watkins, D.H. Albonesi, Leveraging optical...
  • Z. Li et al., Aurora: A cross-layer solution for thermally resilient photonic network-on-chip, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. (2015)
  • J.L. Abellán et al., Adaptive tuning of photonic devices in a photonic NoC through dynamic workload allocation, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. (2017)
  • D.M. Calhoun, Q. Li, C. Browning, N.C. Abrams, Y. Liu, R. Ding, L.P. Barry, T. Baehr-Jones, M. Hochberg, K. Bergman,...
  • M.C. Meyer, A.B. Ahmed, Y. Okuyama, A.B. Abdallah, FTTDOR: Microring fault-resilient optical router for reliable...
  • C. Nitta, M. Farrens, V. Akella, Addressing system-level trimming issue in on-chip nanophotonic networks, in: Proc. of...
    Somayyeh Koohi received her B.Sc. double degree from Sharif University of Technology, Tehran, Iran in Electrical Engineering and Computer Engineering in 2005. She then received her M.Sc. and Ph.D. degrees from Sharif University of Technology in Computer Engineering in 2007 and 2012, respectively. In 2013, she joined Sharif University of Technology as an assistant professor. Her research interests include design and analysis of on-chip optical interconnects, low power design, design of network-on-chips for future high performance multi-processor systems, and optical network-on-chip as a novel solution for future systems-on-chip.

    Shaahin Hessabi received his B.Sc. and M.Sc. degrees in Electrical Engineering from Sharif University of Technology, Tehran, Iran in 1986 and 1990, respectively. He received his Ph.D. in Electrical and Computer Engineering from University of Waterloo, Waterloo, Ontario, Canada in 1995. He joined Sharif University of Technology in 1996, and is currently an associate professor in the Department of Computer Engineering. His current research interests include testing and design for testability, VLSI design, SoC and NoC, and reconfigurable systems.
