Fault-Tolerant Mesh-Based NoC with Router-Level Redundancy

Journal of Signal Processing Systems

Abstract

Aggressively scaled CMOS technology increasingly threatens the dependability of network-on-chip (NoC) architectures. In a mesh-based NoC, a faulty router or broken link may isolate a fully functional processing element (PE). In addition, a set of faulty routers may form isolated regions, which can degrade the design. In this paper, we propose a router-level redundancy (RLR) fault-tolerant scheme, distinct from the traditional microarchitecture-level redundancy (MLR) approach, to relieve the problems of isolated PEs and isolated regions. Adding one spare router to each router set in the mesh creates RLR and diversifies the connection paths between adjacent routers. To exploit this extra resource, two reconfiguration algorithms are presented that detour around observed faulty routers and links. The proposed RLR fault-tolerant scheme can tolerate at most one faulty router within a router set. After reconfiguration, the original mesh topology is maintained, so the proposed architecture needs no support from network-layer routing algorithms. The scheme has been evaluated on three fault-tolerance metrics: reliability, mean time to failure (MTTF), and yield. The experimental results show that the performance of RLR increases as the size of the NoC grows, while the relative connection cost decreases. This characteristic makes our architecture suitable for large-scale NoC designs.
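The claim that a router set with one spare tolerates at most one faulty router can be illustrated with a simple reliability model. The sketch below is not the evaluation model used in the paper. It assumes independent router failures, a spare that is as reliable as a regular router and can stand in for any router of its set, exponentially distributed lifetimes, and no link faults. The set size `n`, the per-router reliability `r`, and all function names are hypothetical.

```python
def set_reliability(r: float, n: int) -> float:
    """Reliability of one router set: n regular routers plus 1 spare.

    The set survives if at most one of its n + 1 routers has failed
    (assumption: independent failures, each router has reliability r).
    """
    total = n + 1
    all_good = r ** total
    exactly_one_failed = total * r ** n * (1.0 - r)
    return all_good + exactly_one_failed


def mesh_reliability(r: float, n: int, num_sets: int, redundant: bool = True) -> float:
    """Reliability of a mesh built from num_sets independent router sets.

    Without redundancy, every one of the n routers in a set must work.
    """
    per_set = set_reliability(r, n) if redundant else r ** n
    return per_set ** num_sets


def set_mttf(failure_rate: float, n: int) -> float:
    """MTTF of one set under exponential lifetimes, r(t) = exp(-failure_rate * t).

    Substituting r(t) gives R(t) = (n + 1) r^n - n r^(n + 1); integrating
    over t from 0 to infinity yields the closed form below.
    """
    return (n + 1) / (n * failure_rate) - n / ((n + 1) * failure_rate)


if __name__ == "__main__":
    # Hypothetical numbers: 4 routers per set, each 90% reliable.
    print(set_reliability(0.9, 4))                       # set with a spare
    print(mesh_reliability(0.9, 4, 1, redundant=False))  # set without a spare
    print(set_mttf(1.0, 4))                              # in units of 1 / failure_rate
```

Under these assumptions, a four-router set with per-router reliability 0.9 goes from roughly 0.66 without a spare to roughly 0.92 with one. The model says nothing about the reconfiguration algorithms or the connection cost, which the paper evaluates separately.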



Author information

Corresponding author

Correspondence to Yung-Chang Chang.

About this article

Cite this article

Chang, YC., Gong, CS.A. & Chiu, CT. Fault-Tolerant Mesh-Based NoC with Router-Level Redundancy. J Sign Process Syst 92, 345–355 (2020). https://doi.org/10.1007/s11265-019-01476-3

  • DOI: https://doi.org/10.1007/s11265-019-01476-3
