Improving Resource Efficiency at Scale with Heracles

Published: 05 May 2016

Abstract

User-facing, latency-sensitive services, such as websearch, underutilize their computing resources during daily periods of low traffic. Reusing those resources for other tasks is rarely done in production services since the contention for shared resources can cause latency spikes that violate the service-level objectives of latency-sensitive tasks. The resulting under-utilization hurts both the affordability and energy efficiency of large-scale datacenters. With the slowdown in technology scaling caused by the sunsetting of Moore’s law, it becomes important to address this opportunity.

We present Heracles, a feedback-based controller that enables the safe colocation of best-effort tasks alongside a latency-critical service. Heracles dynamically manages multiple hardware and software isolation mechanisms, such as CPU, memory, and network isolation, to ensure that the latency-sensitive job meets latency targets while maximizing the resources given to best-effort tasks. We evaluate Heracles using production latency-critical and batch workloads from Google and demonstrate average server utilizations of 90% without latency violations across all the load and colocation scenarios that we evaluated.
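As a rough illustration of what a feedback-based colocation controller can look like, the sketch below grows the best-effort (BE) core allocation while the latency-critical (LC) job has spare latency slack and revokes it on a violation. This is not the authors' implementation: the names `Allocation`, `latency_slack`, and `step`, the 10% growth threshold, and the restriction to CPU cores alone are all assumptions made for illustration. Heracles itself coordinates several isolation mechanisms, which this sketch does not model.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    """Hypothetical split of a server's cores between the LC job and BE tasks."""
    total_cores: int
    be_cores: int = 0

    @property
    def lc_cores(self) -> int:
        return self.total_cores - self.be_cores


def latency_slack(measured_ms: float, target_ms: float) -> float:
    """Fraction of the latency target still unused; negative means a violation."""
    return (target_ms - measured_ms) / target_ms


def step(alloc: Allocation, measured_ms: float, target_ms: float,
         grow_threshold: float = 0.10, min_lc_cores: int = 1) -> Allocation:
    """One control iteration (assumed policy, not taken from the paper)."""
    slack = latency_slack(measured_ms, target_ms)
    if slack < 0:
        # Latency target violated: hand every core back to the LC job.
        return Allocation(alloc.total_cores, 0)
    if slack > grow_threshold and alloc.lc_cores > min_lc_cores:
        # Comfortable slack: give one more core to BE tasks.
        return Allocation(alloc.total_cores, alloc.be_cores + 1)
    # Slack is positive but thin: hold the current split.
    return alloc


if __name__ == "__main__":
    alloc = Allocation(total_cores=16)
    for latency_ms in [4.0, 5.0, 9.5, 12.0]:
        alloc = step(alloc, latency_ms, target_ms=10.0)
        print(alloc.be_cores)  # prints 1, 2, 2, 0 on successive lines
```

Growing slowly and backing off all at once is a deliberately conservative choice here: it favors the latency target over BE throughput, which matches the priority the abstract describes.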

    Reviews

    Bayard Kohlhepp

    Most of the paper's authors are connected to Google, and their work centers on improving the performance of Google workloads. They've developed runtime controller software, Heracles, that uses real-time feedback and static modeling rules to adjust resource allocation within servers in order to meet service-level objectives (SLOs). The paper's closing section demonstrates that Heracles improved performance in test systems.

    It's great that they've made Google faster, but what use is this Google performance tool to the rest of us? Unless and until they make Heracles freely downloadable (and we have server applications that can make use of it), the tool itself is of no general interest. The value of this paper, though, lies not in the end product, but in the journey that led to the end product.

    The first nine or ten pages describe the authors' analysis of resource contention, specifically the interplay between latency-critical (LC) tasks and noncritical, best-effort (BE) tasks. All applications, from Internet of Things (IoT) to the cloud, on smartphones and in data center servers, face the problem of guaranteeing quick response from critical services despite the unpredictable activity of background tasks. At present, we solve the problem by overallocating resources. We throw money at the problem, paying for peak usage scenarios, while day in and day out we tolerate idle central processing units (CPUs) and underutilized storage.

    The analysis that led to Heracles, summarized in this paper, brings us a step closer to building efficient systems. The authors have created a template we can all use to analyze resource contention. They also identify specific tools and techniques used to address contention issues, quantify performance improvements achieved by using those tools, and survey numerous research contributors for further investigation. The rest of us will probably never use Heracles, but we can all use this advice to improve our own little corner of the universe.
Online Computing Reviews Service

    • Published in

      ACM Transactions on Computer Systems, Volume 34, Issue 2
      May 2016
      96 pages
      ISSN: 0734-2071
      EISSN: 1557-7333
      DOI: 10.1145/2912575

      Copyright © 2016 Owner/Author

      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 5 May 2016
      • Accepted: 1 January 2016
      • Received: 1 October 2015

      Qualifiers

      • research-article
      • Research
      • Refereed
