research-article

Open Access

Improving Resource Efficiency at Scale with Heracles

Authors:
David Lo
Google Inc., Stanford University, Mountain View, CA
ORCID: 0000-0002-2585-5473

Liqun Cheng
Google Inc., Mountain View, CA

Rama Govindaraju
Google Inc., Mountain View, CA

Parthasarathy Ranganathan
Google Inc., Mountain View, CA

Christos Kozyrakis
Stanford University, Stanford, CA

ACM Transactions on Computer Systems, Volume 34, Issue 2, Article No. 6, pp. 1–33. https://doi.org/10.1145/2882783

Published: 05 May 2016

Abstract

User-facing, latency-sensitive services, such as websearch, underutilize their computing resources during daily periods of low traffic. Reusing those resources for other tasks is rarely done in production services since the contention for shared resources can cause latency spikes that violate the service-level objectives of latency-sensitive tasks. The resulting under-utilization hurts both the affordability and energy efficiency of large-scale datacenters. With the slowdown in technology scaling caused by the sunsetting of Moore’s law, it becomes important to address this opportunity.

We present Heracles, a feedback-based controller that enables the safe colocation of best-effort tasks alongside a latency-critical service. Heracles dynamically manages multiple hardware and software isolation mechanisms, such as CPU, memory, and network isolation, to ensure that the latency-sensitive job meets latency targets while maximizing the resources given to best-effort tasks. We evaluate Heracles using production latency-critical and batch workloads from Google and demonstrate average server utilizations of 90% without latency violations across all the load and colocation scenarios that we evaluated.
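
The feedback idea in the abstract can be illustrated with a minimal sketch. It is not the controller described in the paper: it manages only one resource (CPU cores), and the names `Allocation` and `control_step`, the `grow_slack` threshold, and the one-core step size are all hypothetical choices made for illustration. The sketch only shows the general shape of a loop that compares measured tail latency against a target and grows or shrinks the share given to best-effort (BE) tasks accordingly.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    """Hypothetical split of a server's cores between LC and BE work."""
    total_cores: int
    be_cores: int = 0  # cores currently lent to best-effort (BE) tasks

    @property
    def lc_cores(self) -> int:
        # Cores left for the latency-critical (LC) service.
        return self.total_cores - self.be_cores


def control_step(alloc: Allocation, tail_latency_ms: float, slo_ms: float,
                 grow_slack: float = 0.10, min_lc_cores: int = 1) -> Allocation:
    """Run one feedback iteration and return the new allocation.

    The thresholds and step size are illustrative assumptions, not values
    taken from the paper.
    """
    # Latency slack: fraction of the target that is still unused.
    slack = (slo_ms - tail_latency_ms) / slo_ms
    if slack < 0:
        # Target violated: reclaim every core for the LC service.
        be = 0
    elif slack < grow_slack:
        # Close to the target: back off by one core.
        be = max(alloc.be_cores - 1, 0)
    else:
        # Comfortable slack: lend one more core to BE tasks,
        # but always keep at least min_lc_cores for the LC service.
        be = min(alloc.be_cores + 1, alloc.total_cores - min_lc_cores)
    return Allocation(alloc.total_cores, be)
```

Calling `control_step` once per measurement interval with the latest tail-latency sample would gradually raise BE usage while slack is ample and withdraw it as latency approaches the target. A controller of the kind the abstract describes would coordinate several such decisions across CPU, memory, and network isolation mechanisms rather than cores alone.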

Index Terms

1. Computer systems organization
  1. Architectures
    1. Distributed architectures
      1. Cloud computing
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
        Scheduling

Reviews

Reviewer: Bayard Kohlhepp

Most of the paper's authors are connected to Google, and their work centers on improving the performance of Google workloads. They've developed runtime controller software, Heracles, that uses real-time feedback and static modeling rules to adjust resource allocation within servers in order to meet service-level objectives (SLOs). The paper's closing section demonstrates that Heracles improved performance in test systems. It's great that they've made Google faster, but what use is this Google performance tool to the rest of us? Unless and until they make Heracles freely downloadable (and we have server applications that can make use of it), the tool itself is of no general interest.

The value of this paper, though, lies not in the end product, but in the journey that led to the end product. The first nine or ten pages describe the authors' analysis of resource contention, specifically the interplay between latency-critical (LC) tasks and noncritical, best-effort (BE) tasks. All applications, from Internet of Things (IoT) to the cloud, on smartphones and in data center servers, face the problem of guaranteeing quick response from critical services despite the unpredictable activity of background tasks. At present, we solve the problem by overallocating resources. We throw money at the problem, paying for peak usage scenarios, while day in and day out we tolerate idle central processing units (CPUs) and underutilized storage.

The analysis that led to Heracles, summarized in this paper, brings us a step closer to building efficient systems. The authors have created a template we can all use to analyze resource contention. They also identify specific tools and techniques used to address contention issues, quantify performance improvements achieved by using those tools, and survey numerous research contributors for further investigation. The rest of us will probably never use Heracles, but we can all use this advice to improve our own little corner of the universe.
Online Computing Reviews Service

Published in
ACM Transactions on Computer Systems Volume 34, Issue 2
May 2016
96 pages
ISSN: 0734-2071
EISSN: 1557-7333
DOI: 10.1145/2912575
Editor:
Todd C. Mowry
Carnegie Mellon University, Pittsburgh, PA
Copyright © 2016 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 5 May 2016
- Accepted: 1 January 2016
- Received: 1 October 2015
Author Tags
Datacenter
QoS
interference
latency-critical applications
performance isolation
resource efficiency
scheduling
warehouse-scale computer
Qualifiers
- research-article
- Research
- Refereed