Automatic identification and hardware implementation of a resource-constrained power model for embedded systems

https://doi.org/10.1016/j.suscom.2020.100467

Highlights

  • This is the first work proposing a methodology that provides a trade-off between the accuracy of the power estimates and the area occupied by the power monitors.

  • The flow is completely automated, from the power model identification to the power monitoring infrastructure implementation.

  • Our methodology identifies and implements a power monitor whose accuracy matches that of the best state-of-the-art solutions while saving 37.3% of the area on average.

  • We evaluated this work on both HLS-generated hardware accelerators and a hand-designed RISC-V processor, showing that our methodology is effective for any type of target hardware, even when the Verilog code is not human-readable (e.g., HLS output).

Abstract

In modern embedded systems, hardware-level online power monitors are crucial to support the run-time power optimizations required to meet the ever-increasing demand for energy efficiency. To be effective and to deal with time-to-market pressure, such requirements must be considered even during the design of the power monitoring infrastructure. This paper presents a power model identification and implementation strategy with two main advantages over the state of the art. First, our solution trades the accuracy of the power model against the amount of resources allocated to the power monitoring infrastructure. Second, the use of an automatic power model instrumentation strategy ensures a timely implementation of the power monitor regardless of the complexity of the target computing platform. Our methodology has been validated against 8 accelerators generated through a High-Level-Synthesis flow and on a more complex RISC-V embedded computing platform. Depending on the imposed user-defined constraints, and with respect to unconstrained state-of-the-art power monitoring solutions, our methodology shows a resource saving between 37.3% and 81%, while the maximum average accuracy loss, observed at the aggressive 20 us temporal resolution, stays within 5%. When the temporal resolution is moved closer to the values proposed in the state of the art, i.e., in the range of hundreds of microseconds, the average accuracy loss of our power monitors is lower than 1% with almost the same overheads. In addition, our solution demonstrated the possibility of delivering a resource-constrained power monitor with a 20 us temporal resolution, i.e., far higher than the one used by current state-of-the-art solutions.

Introduction

Considering the large variety of constraints imposed by current applications, the use of specialized hardware accelerators [15], [14] and optimized multi-cores represents the de-facto solution to deliver efficient computing platforms that cope with the ever-increasing time-to-market pressure.

However, the complexity of such computing platforms and the variability of the requirements due to the executed applications motivate the use of run-time power-aware optimization techniques to trade computational power against energy consumption. In this scenario, online power monitors are the de-facto solution to estimate the power state of the system at run-time and, thus, to effectively support any optimization technique. Online power monitors deliver a periodic power estimate by leveraging the relationship between the power consumption and the internal switching activity of the target computing platform. The realization of a power monitor encompasses two steps. First, the power model identification stage derives the mathematical relationship between the power consumption and the switching activity of the computing platform. Second, a power monitor is designed by implementing the mathematical power model in the target platform. The possibility of exploiting this relationship at different abstraction levels motivates the evaluation of both software- and hardware-implemented power monitors, each of which offers a different trade-off between the accuracy of the power estimates and the implementation complexity. Software power monitors are applications that provide online power monitoring capabilities when the RTL description of the platform is not accessible, at the cost of a non-negligible performance overhead as well as low accuracy and limited temporal resolution for the power estimates. Hardware power monitors rely on dedicated hardware to deliver highly accurate power estimates at high temporal resolution and without performance overhead, at the cost of changing the RTL description of the computing platform.
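The identification step can be illustrated with a minimal sketch. It assumes a linear model in which the power of each time window is an intercept plus a weighted sum of per-signal toggle counts, fitted by ordinary least squares. The linear form, the function names and the sample numbers are illustrative assumptions, not the model or data used in the paper.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: signals are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_power_model(toggles: Sequence[Sequence[float]],
                    power: Sequence[float]) -> List[float]:
    """Least-squares fit of power ~ c[0] + sum(c[i + 1] * toggles[k][i])."""
    rows = [[1.0] + [float(t) for t in sample] for sample in toggles]
    n = len(rows[0])
    # Normal equations: (X^T X) c = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * y for r, y in zip(rows, power)) for i in range(n)]
    return solve_linear_system(xtx, xty)


def estimate_power(coeffs: Sequence[float], sample: Sequence[float]) -> float:
    """Evaluate the fitted model on the toggle counts of one time window."""
    return coeffs[0] + sum(c * t for c, t in zip(coeffs[1:], sample))


# Hypothetical training data: toggle counts of two signals per window,
# with power generated as 2.0 + 0.1 * t0 + 0.3 * t1 (arbitrary units).
toggles = [[10, 0], [0, 10], [10, 10], [5, 20], [20, 5]]
power = [3.0, 5.0, 6.0, 8.5, 5.5]
coeffs = fit_power_model(toggles, power)
```

In this toy setting the fit recovers the generating coefficients, and `estimate_power(coeffs, window)` plays the role that the hardware monitor would play at run-time: a weighted sum of counters plus a constant term.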

It is important to note that regardless of their software- or hardware-level implementations, the state-of-the-art power monitors are meant to solely minimize the prediction error of the power estimates without accounting for the imposed overheads in terms of area, power, timing and performance.

Contributions – Based on the fact that hardware solutions represent the de-facto choice to provide online power monitoring capabilities to modern embedded systems [6], [5], [11], this paper presents a novel methodology to design efficient power monitors for generic computing platforms. Such monitors minimize the prediction error within a configurable resource budget, thus providing two major contributions to the state-of-the-art:

  • Resource-constrained power monitor – We designed a set of highly efficient and fully characterized hardware building blocks to implement the power monitor. With these blocks, our methodology allows the user to accurately configure the resource budget devoted to the power model. Experimental results on 8 High-Level-Synthesis (HLS) generated designs and a RISC-V System-on-Chip demonstrated that our methodology always delivers a power model within the specified resource budget with an average accuracy error lower than 5%, thus aligned with state-of-the-art solutions.

  • Automatic power monitor implementation – To optimize the time-to-market, our methodology automatically augments the Register-Transfer-Level (RTL) description of a generic target computing platform with the hardware description of the power monitor. In particular, we can easily instrument HLS netlists, for which a hand-made implementation must contend with the fact that such descriptions are not meant to be human-readable. In addition, the proposed methodology allows the temporal resolution of the power monitor to be configured between 20 us and 500 us without affecting the timing of the monitored designs considered in our analysis.
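The idea of minimizing the prediction error within a configurable resource budget can be sketched as a budgeted selection problem. The sketch below uses a simple greedy heuristic that ranks candidate signals by benefit per unit of area; this heuristic, the signal names, the scores and the area costs are all hypothetical and are not the selection algorithm or data of the paper.

```python
from typing import Dict, List, Tuple


def select_signals(candidates: Dict[str, Tuple[float, int]],
                   area_budget: int) -> List[str]:
    """Greedily pick signals to monitor without exceeding the area budget.

    candidates maps a signal name to (score, area_cost), where score is a
    hypothetical measure of how much the signal improves the power estimate
    and area_cost is the hypothetical cost of its counter in area units.
    """
    for name, (_, cost) in candidates.items():
        if cost <= 0:
            raise ValueError(f"area cost of {name!r} must be positive")
    # Rank by score per unit of area, best ratio first.
    ranked = sorted(candidates.items(),
                    key=lambda kv: kv[1][0] / kv[1][1],
                    reverse=True)
    chosen: List[str] = []
    used = 0
    for name, (_, cost) in ranked:
        if used + cost <= area_budget:
            chosen.append(name)
            used += cost
    return chosen


# Hypothetical candidates: name -> (score, area cost).
candidates = {
    "alu_out": (0.9, 30),
    "mem_req": (0.6, 10),
    "pc": (0.3, 20),
    "irq": (0.05, 5),
}
print(select_signals(candidates, area_budget=40))  # ['mem_req', 'alu_out']
```

Tightening or relaxing `area_budget` changes which signals are kept, which is the kind of accuracy-versus-area trade-off the first contribution exposes to the user. A greedy ranking does not guarantee an optimal selection; it is used here only because it is short.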

The rest of the paper is organized in four parts. Section 2 presents the background and the state-of-the-art related to the online power monitoring systems. The proposed resource-constrained power monitor design flow is discussed in Section 3, while results considering the accuracy of the power estimates as well as area, power and timing overheads are presented in Section 4. Conclusions are drawn in Section 5.

Section snippets

Related works

Considering a generic computing platform, the design of the online power monitor requires (i) to identify the mathematical formulation of the power model, and (ii) to implement such model into the computing platform. A summary of the highlights and open issues of the state of the art for both aspects is reported below.

Power model identification – The state-of-the-art contains several solutions to monitor the power consumption of the platform at run-time, either at software- or hardware-level.

Methodology

Fig. 1 depicts the proposed toolchain to automatically generate a hardware-level resource-constrained power monitor for generic computing platforms. Starting from the hardware description of the computing platform (RTL-source), its corresponding set of design constraints (designConstr) and the user-defined constraints (usrDefConstr), the flow outputs a hardware description file containing the target design augmented with the power monitor (netlist power monitor). We note that the design

Results

This section reports the assessment of the proposed methodology, focusing on the accuracy and the resource utilization of the hardware-level power monitoring infrastructure. The experimental setup is discussed in Section 4.1, while the accuracy of the power model estimates and the resource utilization results are detailed in Section 4.2. Section 4.3 analyzes the trend of the design metrics as the temporal resolution of the power estimates varies.
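To give a feel for what the temporal resolution means in hardware terms, the following sketch converts a resolution into the number of clock cycles per estimation window and into the width of a counter able to hold one toggle per cycle. The 100 MHz clock and the one-toggle-per-cycle bound are assumptions made for illustration; the paper's actual clock frequency and counter sizing are not stated in this excerpt.

```python
def cycles_per_window(resolution_us: float, clock_mhz: float) -> int:
    """Number of clock cycles in one estimation window.

    A clock of f MHz completes f cycles per microsecond.
    """
    return round(resolution_us * clock_mhz)


def counter_bits(max_count: int) -> int:
    """Minimum number of bits needed to store values from 0 to max_count."""
    return max(1, max_count.bit_length())


CLOCK_MHZ = 100  # hypothetical clock frequency

for resolution_us in (20, 500):
    cycles = cycles_per_window(resolution_us, CLOCK_MHZ)
    # Assumption: a monitored signal toggles at most once per clock cycle.
    print(resolution_us, cycles, counter_bits(cycles))
```

Under these assumptions a 20 us window spans 2000 cycles and needs an 11-bit counter, while a 500 us window spans 50000 cycles and needs 16 bits, which illustrates one way the temporal resolution can affect the resources of the monitor.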

Conclusions

This paper presented a methodology to automatically instrument a resource-constrained power monitor into generic hardware designs. Results have been validated considering both HLS-generated hardware accelerators and a RISC-V based SoC across a wide set of temporal resolutions ranging from 20 us to 500 us.

Depending on the imposed user-defined constraints and with respect to the unconstrained power monitoring state-of-the-art solutions, our methodology shows a resource saving between 37.3%

Conflicts of interest

None declared.

Declaration of Competing Interest

The authors report no declarations of interest.

Acknowledgments

This work was partially supported by the H2020 FET-HPC project “RECIPE” [1], G.A. no. 801137.

References (15)

  • G. Agosta et al.

    The RECIPE approach to challenges in deeply heterogeneous high performance systems

    Microprocessors and Microsystems

    (2020)
  • W.L. Bircher et al.

    Complete system power estimation using processor performance events

    IEEE TC

    (2012)
  • J. Gustafsson et al.

    The Mälardalen WCET benchmarks: past, present and future

  • Y. Jianlei et al.

    Early stage real-time SoC power estimation using RTL instrumentation

    The 20th ASPDAC

    (2015)
  • M. Najem et al.

    A design-time method for building cost-effective run-time power monitoring

    IEEE TCAD

    (2017)
  • D.J. Pagliari et al.

    All-digital embedded meters for on-line power estimation

    DATE

    (2018)
  • C. Pilato et al.

    Bambu: a modular framework for the high level synthesis of memory-intensive applications

    23rd Int. Conf. on Field Programmable Logic and Applications

    (2013)
There are more references available in the full text version of this article.
