Elsevier

Parallel Computing

Volume 99, November 2020, 102686

Dynamic power management for value-oriented schedulers in power-constrained HPC system

https://doi.org/10.1016/j.parco.2020.102686

Abstract

High performance computing (HPC) systems face the challenge of improving their productivity under a system-wide power constraint in the exascale era. To measure the productivity of an HPC job, researchers have proposed assigning it a monotonically decreasing, time-dependent value function, called its job-value. Value-based scheduling algorithms use these job-value functions to maximize system productivity, defined as the sum of the job-values of the completed jobs. In this study, we first show that, when applied to value-based algorithms, the relative ranking of competing state-of-the-art static power allocation strategies changes with the level of the power constraint. We then investigate the limitations of these static strategies by relating the job completion rate to resource utilization, and show that a non-negligible amount of resources remains unused and available to the scheduler. Even though the system is oversubscribed, these unused resources are insufficient to schedule newly arrived high-value jobs. Based on this observation, we propose a novel dynamic power management strategy for value-based algorithms. Our dynamic allocation policy maximizes system productivity, resource utilization, and job completion rate by using application power-performance models to reallocate power from running jobs to newly arrived jobs. We simulate a large-scale system driven by job arrival traces from a real HPC system. We demonstrate that, when power becomes a highly constrained resource, the dynamic variant of each value-based algorithm earns up to 16% higher productivity and completes 13% more jobs than its static variants.

Introduction

Improving the productivity of a high performance computing (HPC) system has been a non-trivial challenge since the inception of petascale computing. As modern HPC systems head toward the exascale era, designing solutions to the productivity challenge is further complicated by an additional constraint on system-wide power consumption [1]. Traditionally, the performance of an HPC system is defined in terms of floating-point operations per second (FLOPS). However, numerous studies have shown that FLOPS is an inadequate metric of HPC productivity [2], [3], [4], [5], [6], [7]. As an alternative, these studies define HPC productivity in terms of the importance, or value, of the outputs each application generates on the system at the time of its completion. To measure the importance of an application's output, a time-dependent value (utility) function is assigned to each job [2], [3], [4]. Several researchers have adopted this definition of the time-dependent value function. They have proposed value-based resource management heuristics that maximize HPC productivity on an oversubscribed system [6], [7], [8], [9], [10], [11], [12], [13]. However, none of these approaches treats system-wide power as a constrained resource.
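The following Python sketch makes value-based productivity concrete. It gives each job a monotonically non-increasing, time-dependent value and sums the values of completed jobs only. The piecewise-linear shape and its parameters (`max_value`, `soft_deadline`, `hard_deadline`) are assumptions chosen for illustration. They are not the job-value function defined in Section 2.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Job:
    """A job with a hypothetical time-dependent value function."""
    name: str
    arrival: float                      # submission time (s)
    max_value: float                    # value if the job finishes early enough
    soft_deadline: float                # elapsed time after which value decays (assumed)
    hard_deadline: float                # elapsed time after which value is zero (assumed)
    completion: Optional[float] = None  # completion time (s); None if not completed


def job_value(job: Job, completion_time: float) -> float:
    """Non-increasing value of completing `job` at `completion_time`.

    Illustrative shape: constant up to the soft deadline, linear decay
    to zero at the hard deadline, zero afterwards.
    """
    elapsed = completion_time - job.arrival
    if elapsed <= job.soft_deadline:
        return job.max_value
    if elapsed >= job.hard_deadline:
        return 0.0
    span = job.hard_deadline - job.soft_deadline
    return job.max_value * (job.hard_deadline - elapsed) / span


def system_productivity(jobs: Iterable[Job]) -> float:
    """Sum of job-values over completed jobs; unfinished jobs earn nothing."""
    return sum(job_value(j, j.completion) for j in jobs if j.completion is not None)


jobs = [
    Job("a", arrival=0, max_value=10, soft_deadline=100, hard_deadline=300, completion=80),
    Job("b", arrival=0, max_value=10, soft_deadline=100, hard_deadline=300, completion=200),
    Job("c", arrival=0, max_value=10, soft_deadline=100, hard_deadline=300, completion=None),
]
print(system_productivity(jobs))  # 15.0 (job a earns 10, job b earns 5, job c earns 0)
```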

The U.S. Department of Energy has mandated that a future exascale system operate under a strict power budget of 20 MW to 30 MW. The budget is meant to support efficient electricity generation and distribution and to keep the operational cost of an exascale computing system manageable [1]. This creates the need for new value-based resource management algorithms that address the productivity challenge in a power-constrained system. In our earlier work, we explored various static power allocation strategies for two value-based algorithms: value-per-time (VPT) [14], [15] and value-per-energy (VPE) [12]. Static power allocation refers to fixing a job's power budget at the start of its execution on the HPC system. To the best of our knowledge, these are the only studies that address the productivity challenge in a power-constrained environment. However, static power allocation strategies suffer from resource under-utilization: when a job completes, the freed resources are left unused if no new job is waiting. Furthermore, static policies use resources inefficiently. Once a low-value job is scheduled, a newly arrived high-value job may starve if the system does not have sufficient idle nodes or power to execute it. To overcome these disadvantages of static allocation strategies, in this work we propose a novel dynamic power reallocation strategy for value-based algorithms. Here, dynamic refers to changing a job's power budget during its execution on the HPC system. The contributions of this study are:

  • We propose a novel strategy to dynamically modify the power budget of the running jobs to maximize the productivity of the value-based algorithms.

  • We propose modifications to the traditional static VPE and VPT algorithms that incorporate our dynamic strategy, and demonstrate improved HPC productivity compared to their static variants.

  • We propose a low-overhead offline-modeling approach to create power-execution time models for the HPC applications. These models are necessary to implement our power-aware value-based algorithms.

  • We expose the advantages and disadvantages of the static power-aware VPE and VPT algorithms under different system-wide power constraints.

  • We demonstrate the superiority of our dynamic strategy in improving productivity, resource utilization, and job completion rate against the state-of-the-art static algorithms based on real HPC workload traces.
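A small Python sketch can illustrate the difference between the static and dynamic policies described above. Under the static policy, a new job starts only if unallocated power covers its request. Under the dynamic policy, power is reclaimed by lowering the caps of running jobs toward p_min. The greedy order (largest power holders first), the class and function names, and the numbers are all hypothetical. The paper's dynamic strategy instead uses application power-performance models when reallocating power.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

P_MIN = 60.0   # minimum per-unit power cap (W), as in the experimental setup
P_MAX = 115.0  # maximum per-unit power cap (W)


@dataclass
class RunningJob:
    """A running job holding `units` compute units at a per-unit cap (hypothetical)."""
    name: str
    units: int
    cap: float  # current per-unit power cap (W)

    @property
    def power(self) -> float:
        return self.units * self.cap


def unallocated(budget: float, running: List[RunningJob]) -> float:
    """System-wide budget not yet assigned to any running job."""
    return budget - sum(j.power for j in running)


def admit_static(budget: float, running: List[RunningJob],
                 units: int, cap: float) -> bool:
    """Static policy: start the new job only if unallocated power covers it.
    Caps of running jobs are never modified."""
    return unallocated(budget, running) >= units * cap


def admit_dynamic(budget: float, running: List[RunningJob],
                  units: int, cap: float) -> Optional[Dict[str, float]]:
    """Hypothetical dynamic policy: lower running jobs' caps toward P_MIN,
    largest power holders first, until the new job's demand fits.
    Returns the new per-job caps, or None if the job cannot fit."""
    if not P_MIN <= cap <= P_MAX:
        raise ValueError("requested cap outside [P_MIN, P_MAX]")
    deficit = units * cap - unallocated(budget, running)
    new_caps = {j.name: j.cap for j in running}
    for j in sorted(running, key=lambda r: r.power, reverse=True):
        if deficit <= 0:
            break
        take = min(j.units * (j.cap - P_MIN), deficit)  # power reclaimed from j
        new_caps[j.name] = j.cap - take / j.units
        deficit -= take
    return new_caps if deficit <= 0 else None


running = [RunningJob("A", units=4, cap=115.0), RunningJob("B", units=4, cap=100.0)]
print(admit_static(1000.0, running, units=2, cap=90.0))   # False: only 140 W unallocated
print(admit_dynamic(1000.0, running, units=2, cap=90.0))  # {'A': 105.0, 'B': 100.0}
```

In this example, the static policy leaves 140 W idle and turns the new job away. The dynamic policy trims job A from 115 W to 105 W per unit so that the new job can start.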

A preliminary version of this work appeared in the “Parallel and Distributed Computing, Applications, and Technologies (PDCAT 2019)” conference [16]. In that version, we introduced the concept of a dynamic power allocation strategy for the VPT algorithm. We showed that it improves system productivity and resource utilization compared to the static power-aware VPT algorithms [15]. That evaluation was limited to synthetic workloads. In this paper, we extend the preliminary version with the following contributions:

  • We introduce the first dynamic implementation of the VPE algorithm based on our dynamic strategy.

  • We expand our analysis to compare dynamic-VPE with dynamic-VPT and the state-of-the-art static algorithms. We show that, as power becomes a limited resource, both the VPT and the VPE algorithms benefit from our dynamic strategy.

  • We mathematically formalize the decisions of all the static and dynamic algorithms used in this study.

  • We enhance our simulation environment to include real HPC workloads and validate the findings that our preliminary work obtained with synthetic workloads.

  • We perform an in-depth analysis of our algorithms and their impact on different performance metrics such as productivity, resource utilization, scheduling overhead, and job completion rate.

  • We evaluate the impact of different power-allocation strategies on the individual job execution time and energy consumption under different system-wide power constraints.

The rest of this paper is organized as follows. We provide an overview of our job-value function and review the related literature in Section 2. We present the details of our target HPC environment and mathematically describe the objective function in Section 3. Next, we present the steps for creating power-execution time models for the target environment in Section 4. We describe and formulate the static power-aware value-based algorithms and introduce our novel dynamic power allocation strategy for the VPT and VPE algorithms in Section 5. We present our experimental setup and performance evaluation in Section 6. Finally, we conclude our study and discuss planned future work in Section 7.


Background and related work

A large number of users are migrating to HPC systems as the computational requirements of applications grow with the explosion of data. In a traditional HPC environment, a user submits an application as a job with a fixed priority and then waits for the job to complete. In this traditional approach, it is common for a user-submitted job to wait in the resource allocation queue for a duration that is significantly longer than its actual execution time

HPC environment model

In this work, we model an HPC system composed of homogeneous nodes. Each node has one or more compute units (multi-core CPUs), memory, secondary storage, and a network interface card. Each compute unit has instrumentation to monitor and control its power consumption. The power consumption of the remaining components is excluded from the system-wide power budget. Based on the system-wide power budget determined by the system administrator, the resource manager relies on the
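Under this model, the power constraint that the resource manager must respect could be written as shown below. The notation is ours and is an assumption. R is the set of running jobs, U_j is the set of compute units allocated to job j, c_u is the power cap of unit u, and P_sys is the system-wide power budget. Only compute-unit power is counted, as stated above, and idle units are assumed to draw no budgeted power. The bounds p_min and p_max are the per-unit limits used in the experimental setup.

```latex
\sum_{j \in \mathcal{R}} \, \sum_{u \in U_j} c_u \;\le\; P_{\mathrm{sys}},
\qquad
p_{\min} \le c_u \le p_{\max} \quad \forall\, u \in U_j,\ j \in \mathcal{R}
```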

Power-execution time model

Predicting the execution time of an application under given resource constraints is a critical step in value-based algorithms to make informed scheduling decisions. In the literature, such predictions are made either using historical data [6], [12], [18] or application performance models [14], [23], [33], [34]. In this work, we choose to create application-specific performance models for two reasons. First, the workload analysis by Antypas et al. on the Hopper production system at the National
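A power-execution time model can be viewed as a function that maps a per-unit power cap to a predicted execution time. The Python sketch below builds such a predictor by piecewise-linear interpolation over a few offline measurements. The sample points are hypothetical. Interpolation is a stand-in for illustration, not the modeling approach described in Section 4.

```python
from bisect import bisect_left
from typing import Callable, List, Tuple


def make_time_model(samples: List[Tuple[float, float]]) -> Callable[[float], float]:
    """Build a piecewise-linear execution-time predictor from (cap_W, time_s) samples.

    `samples` are hypothetical offline measurements of one application at
    several per-unit power caps; caps outside the sampled range are clamped.
    """
    pts = sorted(samples)
    caps = [c for c, _ in pts]
    times = [t for _, t in pts]

    def predict(cap: float) -> float:
        if cap <= caps[0]:
            return times[0]
        if cap >= caps[-1]:
            return times[-1]
        i = bisect_left(caps, cap)  # first sampled cap >= cap
        if caps[i] == cap:
            return times[i]
        c0, c1 = caps[i - 1], caps[i]
        t0, t1 = times[i - 1], times[i]
        return t0 + (t1 - t0) * (cap - c0) / (c1 - c0)

    return predict


# Hypothetical measurements: execution time falls as the power cap rises.
predict = make_time_model([(60, 1500), (80, 1200), (100, 1050), (115, 1000)])
print(predict(70))  # 1350.0
print(predict(90))  # 1125.0
```

A scheduler can call such a predictor for every candidate cap, which is why cheap evaluation matters more than precise curve fitting in this sketch.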

Overview

In our previous work [14], [15], [39], we used the value-per-time (VPT) [6] algorithm to explore various power allocation strategies. We refer to this algorithm as the baseline value-per-time (or baseline-VPT). In the baseline-VPT, the scheduling decisions for the waiting jobs are made at the occurrence of a mapping event. In value-based algorithms, a mapping event corresponds to the instance when scheduling decisions are made on the waiting or newly arrived jobs. A mapping event triggers the
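The following Python sketch shows how a mapping event might rank waiting jobs by value-per-time or value-per-energy scores. It then starts the jobs greedily while nodes and power remain. Here VPT is the job-value at predicted completion divided by predicted execution time. VPE is that same value divided by predicted energy (power times time). These simplified scores, the greedy loop, and all names are assumptions. The formulations in [6], [12], and Section 5 may differ, for example in how resources or power caps are chosen.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class WaitingJob:
    name: str
    nodes: int
    exec_time: float                    # predicted execution time (s) at its power cap
    power: float                        # predicted power draw (W) at its power cap
    value_at: Callable[[float], float]  # job-value as a function of completion time


def vpt(job: WaitingJob, now: float) -> float:
    """Value earned per second of execution if the job starts now."""
    return job.value_at(now + job.exec_time) / job.exec_time


def vpe(job: WaitingJob, now: float) -> float:
    """Value earned per joule (power x time) if the job starts now."""
    return job.value_at(now + job.exec_time) / (job.power * job.exec_time)


def mapping_event(waiting: List[WaitingJob], now: float, free_nodes: int,
                  free_power: float,
                  score: Callable[[WaitingJob, float], float] = vpt) -> List[str]:
    """Greedily start waiting jobs in descending score order while they fit."""
    started = []
    for job in sorted(waiting, key=lambda j: score(j, now), reverse=True):
        if score(job, now) <= 0:
            break  # this and all remaining jobs would earn no value
        if job.nodes <= free_nodes and job.power <= free_power:
            free_nodes -= job.nodes
            free_power -= job.power
            started.append(job.name)
    return started


# Constant value functions keep the example small.
waiting = [
    WaitingJob("x", nodes=2, exec_time=100.0, power=300.0, value_at=lambda t: 50.0),
    WaitingJob("y", nodes=2, exec_time=50.0, power=300.0, value_at=lambda t: 40.0),
    WaitingJob("z", nodes=2, exec_time=10.0, power=300.0, value_at=lambda t: 0.0),
    WaitingJob("w", nodes=1, exec_time=20.0, power=100.0, value_at=lambda t: 8.0),
]
print(mapping_event(waiting, now=0.0, free_nodes=4, free_power=600.0))             # ['y', 'x']
print(mapping_event(waiting, now=0.0, free_nodes=4, free_power=600.0, score=vpe))  # ['w', 'y']
```

The two scores pick different job sets from the same queue. VPE favors the low-power job w, while VPT favors the jobs that earn the most value per second.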

Overview

We use an in-house simulation environment to simulate an HPC system composed of 2048 nodes. We base our evaluations on a realistic workload trace of jobs generated on the BlueGene/L system [41], from which we simulate job arrivals. Each node in the system contains two compute units (Intel Xeon E5-2695 v2). We limit the minimum (p_min) and maximum (p_max) power consumption of each compute unit to 60 W and 115 W, respectively, per the specification of the selected CPU. We create 30
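The per-unit limits translate into a system-wide CPU power envelope. The short Python calculation below makes the arithmetic explicit. The 60% budget is a hypothetical example, not one of the power constraints evaluated in the paper.

```python
NODES = 2048
UNITS_PER_NODE = 2       # two Intel Xeon E5-2695 v2 per node
P_MIN, P_MAX = 60, 115   # per-unit power limits (W)

units = NODES * UNITS_PER_NODE
floor_w = units * P_MIN     # every unit at the minimum cap
ceiling_w = units * P_MAX   # every unit at the maximum cap
print(units, floor_w, ceiling_w)  # 4096 245760 471040

# Hypothetical budget: 60% of the ceiling (not one of the paper's budget levels).
budget_w = ceiling_w * 6 // 10
print(budget_w)                          # 282624
print(min(units, budget_w // P_MAX))     # 2457 units could run at the maximum cap
print(min(units, budget_w // P_MIN))     # 4096: every unit fits at the minimum cap
```

Any system-wide budget below 245,760 W cannot power all 4096 compute units, even at p_min.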

Conclusion and future work

In this study, we introduce a novel dynamic power allocation strategy that redistributes the system-wide power among jobs. It improves productivity, resource utilization, and job completion rate in a power-constrained HPC system. Using real HPC workloads, we demonstrate that the dynamic variants of the algorithms are consistently superior to their static variants under different power constraints.

In our simulations, we represent

CRediT authorship contribution statement

Nirmal Kumbhare: Conceptualization, Methodology, Software, Validation, Investigation, Visualization, Writing - original draft. Ali Akoglu: Conceptualization, Investigation, Validation, Writing - original draft, Supervision. Aniruddha Marathe: Formal analysis, Writing - review & editing. Salim Hariri: Writing - review & editing. Ghaleb Abdulla: Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (41)

  • C. Tunc et al., Value of service based resource management for large-scale computing systems, Clust. Comput. (2017)
  • X. Xu, W. Dou, X. Zhang, J. Chen, EnReal: an energy-aware resource allocation method for scientific workflow executions...
  • T. Patki et al., Practical resource management in power-constrained, high performance computing, 24th Int. Symposium on High-Performance Parallel and Distributed Computing (HPDC) (2015)
  • R. Lucas, J. Ang, K. Bergman, S. Borkar, W. Carlson, L. Carrington, G. Chiu, R. Colwell, W. Dally, J. Dongarra, Top ten...
  • M. Snir et al., A framework for measuring supercomputer productivity, Int. J. High Perform. Comput. Appl. (2004)
  • S. Faulk et al., Measuring high performance computing productivity, Int. J. High Perform. Comput. Appl. (2004)
  • J. Kepner, High performance computing productivity model synthesis, Int. J. High Perform. Comput. Appl. (2004)
  • T. Sterling, Productivity metrics and models for high performance computing, Int. J. High Perform. Comput. Appl. (2004)
  • B. Khemka et al., Utility functions and resource management in an oversubscribed heterogeneous computing environment, IEEE Trans. Comput. (2015)
  • B. Ravindran et al., On recent advances in time/utility function real-time scheduling and resource management, 8th IEEE Int. Symposium on Object-Oriented Real-Time Distributed Computing (ISORC) (2005)
  • E.D. Jensen et al., A time-driven scheduling model for real-time operating systems, 6th IEEE Real-Time Systems Symposium (RTSS) (1985)
  • K. Chen et al., A scheduling algorithm for tasks described by time value function, Real-Time Syst. (1996)
  • M. Kargahi et al., Performance optimization based on analytical modeling in a real-time system with constrained time/utility functions, IEEE Trans. Comput. (2011)
  • C.B. Lee et al., Precise and realistic utility functions for user-centric performance analysis of schedulers, 16th Int. Symposium on High Performance Distributed Computing (HPDC) (2007)
  • D. Machovec et al., Utility-based resource management in an oversubscribed energy-constrained heterogeneous environment executing parallel applications, Parallel Comput. (2019)
  • N. Kumbhare et al., A value-oriented job scheduling approach for power-constrained and oversubscribed HPC systems, IEEE Trans. Parallel Distrib. Syst. (2020)
  • N. Kumbhare et al., Value based scheduling for oversubscribed power-constrained homogeneous HPC systems, International Conference on Cloud and Autonomic Computing (ICCAC) (2017)
  • N. Kumbhare et al., Adaptive power reallocation for value-oriented schedulers in power-constrained HPC, Parallel and Distributed Computing, Applications and Technologies (PDCAT) (2019)
  • N. Wolter et al., What’s working in HPC: Investigating HPC user behavior and productivity (2006)
  • D. Machovec et al., Value-based resource management in high-performance computing systems, 7th Workshop on Scientific Cloud Computing (2016)

This work is partly supported by National Science Foundation (NSF) research projects NSF CNS-1624668. A part of this work is also performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 (LLNL-JRNL-780060).