HeM3D: Heterogeneous Manycore Architecture Based on Monolithic 3D Vertical Integration

Authors:
Aqeeb Iqbal Arka, Washington State University, Pullman, WA
Biresh Kumar Joardar, Washington State University, Pullman, WA
Ryan Gary Kim, Colorado State University, Fort Collins, CO
Dae Hyun Kim, Washington State University, Pullman, WA
Janardhan Rao Doppa, Washington State University, Pullman, WA
Partha Pratim Pande, Washington State University, Pullman, WA

ACM Transactions on Design Automation of Electronic Systems, Volume 26, Issue 2, Article No.: 16, pp. 1–21. https://doi.org/10.1145/3424239

Published: 17 February 2021

Abstract

Heterogeneous manycore architectures are key to efficiently executing compute- and data-intensive applications. A through-silicon-via (TSV)-based 3D manycore system is a promising solution in this direction, as it enables the integration of disparate computing cores in a single system. Recent industry trends show the viability of 3D integration in real products (e.g., the Intel Lakefield SoC architecture, the AMD Radeon R9 Fury X graphics card, and the Xilinx Virtex-7 2000T/H580T). However, the achievable performance of conventional TSV-based 3D systems is ultimately bottlenecked by the horizontal wires (the wires within each planar die). Moreover, current TSV-based 3D architectures suffer from thermal limitations. Hence, TSV-based architectures do not realize the full potential of 3D integration. Monolithic 3D (M3D) integration, a breakthrough technology for achieving “More Moore and More Than Moore,” opens up the possibility of designing cores and their associated network routers across multiple layers using monolithic inter-tier vias (MIVs), thereby reducing the effective wire length. Compared to TSV-based 3D integrated circuits (ICs), M3D offers the “true” benefits of the vertical dimension for system integration: an MIV used in M3D is over 100× smaller than a TSV. This dramatic reduction in via size and the resulting increase in via density open up numerous opportunities for design optimization in 3D manycore systems: designers can use up to millions of MIVs for ultra-fine-grained 3D optimization, in which individual cores and routers are spread across multiple tiers for extreme power and performance optimization. In this work, we demonstrate how M3D-enabled vertical core and uncore elements offer significant performance and thermal improvements in heterogeneous manycore architectures compared to their TSV-based counterparts.
To overcome the difficult optimization challenges posed by the large design space and the complex interactions among the heterogeneous components (CPU, GPU, last-level cache, etc.) in an M3D-based manycore chip, we leverage novel design-space exploration algorithms to trade off different objectives. The proposed M3D-enabled heterogeneous architecture, called HeM3D, outperforms its state-of-the-art TSV-equivalent counterpart by up to 18.3% in execution time while being up to 19°C cooler.
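
As a rough illustration of what "trading off different objectives" can mean in design-space exploration, the sketch below filters a set of candidate designs down to its Pareto front over two objectives, execution time and peak temperature. It is a minimal sketch and not the algorithm used in the paper: the `Design` type, the `dominates` and `pareto_front` helpers, and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class Design(NamedTuple):
    """Hypothetical candidate design; both objectives are minimized."""
    name: str
    exec_time: float  # normalized execution time (lower is better)
    peak_temp: float  # peak temperature in degrees C (lower is better)


def dominates(a: Design, b: Design) -> bool:
    """Return True if a is no worse than b in both objectives and strictly better in one."""
    no_worse = a.exec_time <= b.exec_time and a.peak_temp <= b.peak_temp
    strictly_better = a.exec_time < b.exec_time or a.peak_temp < b.peak_temp
    return no_worse and strictly_better


def pareto_front(designs: List[Design]) -> List[Design]:
    """Keep only the designs that no other design dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


# Illustrative values only; these are not results from the paper.
candidates = [
    Design("A", 1.00, 85.0),
    Design("B", 0.90, 92.0),
    Design("C", 0.95, 80.0),
    Design("D", 1.05, 90.0),
]

front = pareto_front(candidates)
print([d.name for d in front])
```

In this toy set, design B is the fastest and design C is the coolest, so neither dominates the other and both remain on the front. A and D are each beaten by C on both objectives. A multi-objective search would typically return such a front and leave the final choice to the designer.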

Index Terms

1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Heterogeneous (hybrid) systems

Published in
ACM Transactions on Design Automation of Electronic Systems Volume 26, Issue 2
March 2021
220 pages
ISSN: 1084-4309
EISSN: 1557-7309
DOI: 10.1145/3430836
Editor: X. Sharon Hu, University of Notre Dame, USA
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States

Journal Family
ACM Journals for the Design of Smart and Connected Systems
Publication History
- Published: 17 February 2021
- Accepted: 1 September 2020
- Revised: 1 August 2020
- Received: 1 June 2020
Author Tags
Heterogeneous manycore
M3D
NoC
execution time
multi-tier
performance
temperature
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- Total Citations: 6
- Total Downloads: 612
- Downloads (Last 12 months): 132
- Downloads (Last 6 weeks): 11