Abstract
Heterogeneous manycore architectures are the key to efficiently executing compute- and data-intensive applications. A through-silicon-via (TSV)-based 3D manycore system is a promising solution in this direction, as it enables the integration of disparate computing cores on a single system. Recent industry trends show the viability of 3D integration in real products (e.g., the Intel Lakefield SoC architecture, the AMD Radeon R9 Fury X graphics card, and the Xilinx Virtex-7 2000T/H580T). However, the achievable performance of conventional TSV-based 3D systems is ultimately bottlenecked by the horizontal wires (the wires in each planar die). Moreover, current TSV-based 3D architectures suffer from thermal limitations. Hence, TSV-based architectures do not realize the full potential of 3D integration. Monolithic 3D (M3D) integration, a breakthrough technology to achieve “More Moore and More Than Moore,” opens up the possibility of designing cores and associated network routers across multiple layers using monolithic inter-tier vias (MIVs), thereby reducing the effective wire length. Compared to TSV-based 3D integrated circuits (ICs), M3D offers the “true” benefits of the vertical dimension for system integration: an MIV used in M3D is over 100× smaller than a TSV. This dramatic reduction in via size and the resulting increase in density open up numerous opportunities for design optimization in 3D manycore systems: designers can use up to millions of MIVs for ultra-fine-grained 3D optimization, where individual cores and routers can be spread across multiple tiers for extreme power and performance optimization. In this work, we demonstrate how M3D-enabled vertical core and uncore elements offer significant performance and thermal improvements in manycore heterogeneous architectures compared to their TSV-based counterparts.
To overcome the difficult optimization challenges posed by the large design space and the complex interactions among the heterogeneous components (CPU, GPU, last-level cache, etc.) in an M3D-based manycore chip, we leverage novel design-space exploration algorithms to trade off different objectives. The proposed M3D-enabled heterogeneous architecture, called HeM3D, outperforms its state-of-the-art TSV-equivalent counterpart by up to 18.3% in execution time while being up to 19°C cooler.
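As a rough illustration of what trading off different objectives means here, the sketch below filters a set of candidate designs down to its Pareto front for two minimized objectives, execution time and peak temperature. It is not the exploration algorithm used for HeM3D, which the abstract does not describe; the names `Design`, `dominates`, and `pareto_front` and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical design point: (name, normalized execution time, peak temperature in °C).
# Both objectives are minimized. The values below are made up, not results from the paper.
Design = Tuple[str, float, float]


def dominates(a: Design, b: Design) -> bool:
    """Return True if design a is no worse than b in both objectives
    and strictly better in at least one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(designs: List[Design]) -> List[Design]:
    """Keep only the designs that no other design dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


candidates: List[Design] = [
    ("A", 1.00, 95.0),
    ("B", 0.85, 90.0),
    ("C", 0.82, 99.0),
    ("D", 0.90, 92.0),
]

front = pareto_front(candidates)
print([name for name, _, _ in front])
```

In this toy set, B beats A and D on both objectives, while B and C remain as alternatives because neither is better on both. A practical design-space exploration would search a far larger space of placements instead of enumerating a handful of points.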
Index Terms
- HeM3D: Heterogeneous Manycore Architecture Based on Monolithic 3D Vertical Integration