research-article
Open Access

HeM3D: Heterogeneous Manycore Architecture Based on Monolithic 3D Vertical Integration

Published: 17 February 2021

Abstract

Heterogeneous manycore architectures are the key to efficiently executing compute- and data-intensive applications. A through-silicon-via (TSV)-based 3D manycore system is a promising solution in this direction, as it enables the integration of disparate computing cores on a single system. Recent industry trends show the viability of 3D integration in real products (e.g., the Intel Lakefield SoC architecture, the AMD Radeon R9 Fury X graphics card, and the Xilinx Virtex-7 2000T/H580T). However, the achievable performance of conventional TSV-based 3D systems is ultimately bottlenecked by the horizontal wires (wires in each planar die). Moreover, current TSV-based 3D architectures suffer from thermal limitations. Hence, TSV-based architectures do not realize the full potential of 3D integration. Monolithic 3D (M3D) integration, a breakthrough technology to achieve “More Moore and More Than Moore,” opens up the possibility of designing cores and associated network routers using multiple layers by utilizing monolithic inter-tier vias (MIVs), thereby reducing the effective wire length. Compared to TSV-based 3D integrated circuits (ICs), M3D offers the “true” benefits of the vertical dimension for system integration: the size of an MIV used in M3D is over 100× smaller than a TSV. This dramatic reduction in via size and the resulting increase in density open up numerous opportunities for design optimizations in 3D manycore systems: designers can use up to millions of MIVs for ultra-fine-grained 3D optimization, where individual cores and routers can be spread across multiple tiers for extreme power and performance optimization. In this work, we demonstrate how M3D-enabled vertical core and uncore elements offer significant performance and thermal improvements in manycore heterogeneous architectures compared to their TSV-based counterparts.
To overcome the difficult optimization challenges due to the large design space and complex interactions among the heterogeneous components (CPU, GPU, last-level cache, etc.) in an M3D-based manycore chip, we leverage novel design-space exploration algorithms to trade off different objectives. The proposed M3D-enabled heterogeneous architecture, called HeM3D, outperforms its state-of-the-art TSV-equivalent counterpart by up to 18.3% in execution time while being up to 19°C cooler.
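
To make the idea of trading off different objectives concrete, the following is a minimal sketch of Pareto-dominance filtering, a basic building block of multi-objective design-space exploration. It is not the algorithm used for HeM3D; the candidate design points, the two objectives (execution time and peak temperature, both minimized), and the names `dominates` and `pareto_front` are assumptions chosen purely for illustration.

```python
from typing import List, Tuple

# Hypothetical design point: (execution time in ms, peak temperature in °C).
# Both objectives are to be minimized. The numbers below are made up.
Design = Tuple[float, float]


def dominates(a: Design, b: Design) -> bool:
    """Return True if `a` is no worse than `b` in every objective
    and strictly better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(designs: List[Design]) -> List[Design]:
    """Keep only the designs that no other design dominates."""
    return [d for d in designs
            if not any(dominates(other, d) for other in designs)]


# Illustrative candidates only; not results from the article.
candidates = [(10.0, 85.0), (9.0, 92.0), (11.0, 80.0),
              (10.5, 90.0), (12.0, 95.0)]
front = pareto_front(candidates)
```

Here `front` keeps the three candidates that represent genuine trade-offs (faster but hotter, or cooler but slower) and discards the two that are worse in both objectives than some other candidate. A full exploration algorithm would additionally generate and evaluate new candidates rather than filter a fixed list.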

    • Published in

      ACM Transactions on Design Automation of Electronic Systems  Volume 26, Issue 2
      March 2021
      220 pages
      ISSN:1084-4309
      EISSN:1557-7309
      DOI:10.1145/3430836

      Copyright © 2021 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 17 February 2021
      • Accepted: 1 September 2020
      • Revised: 1 August 2020
      • Received: 1 June 2020

      Qualifiers

      • research-article
      • Research
      • Refereed
