Battery charge scheduling in long-life autonomous mobile robots via multi-objective decision making under uncertainty☆
Introduction
Autonomous robots are being deployed for increasingly long durations in labs, homes, and offices, for tasks from vacuuming to security [1], [2], [3]. These robots perform tasks throughout the day, many of which must be performed within time-windows. Since the robots are typically battery powered, they have limited operation time and must recharge regularly. For many robots, this can be achieved autonomously by having the robot dock at a charging station. In spite of the importance of battery management for long-lived mobile robots, limited attention has been given to the problem of deciding when to charge. Current methods typically send the robot to charge when its battery is low, or designate a period of the day within which it always charges. In either case, the generated behaviour can be inflexible and inefficient, because it does not take into account task-related knowledge, such as the value associated with the tasks currently available to the robot, and predictions of future tasks. In this paper, we propose an approach for scheduling battery charging that takes into account this task-related knowledge.
This work is inspired by our long-term deployments of mobile service robots in office environments [1]. During operation, the robot would execute tasks requested both by human users and by curiosity-driven components of the system, which added observation tasks to the robot’s schedule in order to improve its internal models. Every task was associated with a time-window within which execution should happen, and a reward value representing the benefit of completing it. For example, being at reception at 9 am to greet an important visitor was considered more important than being at the printer at 2 pm to observe human activities. Furthermore, while some of these tasks were scheduled beforehand, many were demanded on-the-fly by users. Under these circumstances, the future schedule of the robot is not deterministic, giving rise to a need for models that encode the uncertainty associated with future tasks. We thus formulate the battery charge scheduling problem as a finite-horizon Markov decision process (MDP) that encodes the dynamics of the battery and available tasks, where the rewards available for executing tasks evolve according to a time-dependent transition function. These models are learnt from data obtained from robot experience, and are continuously maintained and updated. This provides greater accuracy as the robot learns the dynamics of a specific environment.
In order to increase the lifetime of modern batteries, manufacturers advise reducing the time spent with low levels of charge. Furthermore, if the battery discharges totally, then human intervention is required to manually move the robot to the charging station. Thus, we specify a multi-objective problem, where the goal is to find policies that trade off the amount of time the robot spends under a user-defined battery level threshold, against the expected cumulative reward from task execution. The MDP model is formulated and solved using the probabilistic model checker PRISM [4], utilising its implementation of Pareto front approximation for multi-objective problems [5]. In order to achieve online execution, and allow the system to react to the arrival of new tasks, we propose a receding horizon control (RHC) execution of the Pareto-optimal policies.
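To make the trade-off concrete, the following is a minimal sketch of Pareto filtering over candidate policies, and is not the Pareto front approximation implemented in PRISM. It assumes each policy has already been summarised by a hypothetical pair (expected timesteps below the battery threshold, expected cumulative reward); the function names and the numbers are illustrative only.

```python
from typing import List, Tuple

# Hypothetical summary of a policy:
# (expected timesteps below the battery threshold, expected cumulative reward).
# The first objective is minimised, the second is maximised.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if it is no worse on both objectives and better on one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(set(front))


# Illustrative values, not results from the paper.
candidates = [(0.0, 10.0), (2.0, 30.0), (3.0, 25.0), (5.0, 40.0), (5.0, 35.0)]
front = pareto_front(candidates)
```

Each point kept in `front` represents a different compromise: accepting more time under the threshold is only worthwhile if it buys additional expected reward.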
The battery scheduler proposed in this work decides whether the robot should charge or perform tasks, assuming that the robot is able to successfully complete all tasks (and thus gather all rewards) available for a specific time-window. Scheduling and execution of tasks is handled by a separate system [6], [7]. This separation of concerns allows for an approach that can reason about long windows of time when deciding whether to charge. We evaluate the scheduling system using real-life deployment schedules with a scheduling time-window of up to 24 h, and show that the system outperforms different rule-based systems.
In summary, the main contribution of this paper is the use of techniques from multi-objective decision making under uncertainty to develop a battery charge scheduler that adapts the schedule to new tasks as they arrive; charges the robot during less busy periods; and anticipates future busy periods, ensuring enough charge is built up to cope with them.
This paper is an extended version of [8]. Here, we provide a more thorough coverage of related work, improve the structure of the presentation of the framework, elaborate on our policy execution approach, and provide a broader empirical evaluation, covering more datasets and evaluating the scalability of the method.
Section snippets
Related work
Research in battery management usually focuses on conservation of energy or effective use of available power. Of particular interest in this context is dynamic power management (DPM), where one dynamically adjusts the battery consumption requirements depending on the needs of the tasks being executed. An early example of this in the context of robotics is [9], which proposes a DPM approach for a mobile robot using deterministic linear models of power consumption. Outside of robotics, DPM has
Markov models
We model the stochastic dynamics of the battery charge scheduling problem as a finite-horizon Markov decision process (MDP). We start by introducing discrete-time Markov chains (DTMCs), which we use to model the battery dynamics.
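Definition 1 (DTMC). A DTMC is a tuple $\mathcal{D} = \langle S, P \rangle$, where: $S$ is a finite set of states; and $P : S \times S \to [0, 1]$ is a probabilistic transition function, where $\sum_{s' \in S} P(s, s') = 1$ for all $s \in S$.
$P(s, s')$ represents the probability of moving to state $s'$ given that we were in state $s$ in the previous timestep.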
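As an illustration of this definition, a DTMC can be stored as a table of rows that each sum to one. The sketch below uses a hypothetical three-level battery chain; the state names and probabilities are assumptions chosen for readability, not values from the paper.

```python
import random

# Hypothetical three-level battery chain (illustrative values only).
STATES = ["low", "medium", "high"]
P = {
    "low":    {"low": 1.0, "medium": 0.0, "high": 0.0},
    "medium": {"low": 0.3, "medium": 0.7, "high": 0.0},
    "high":   {"low": 0.0, "medium": 0.4, "high": 0.6},
}


def is_valid_dtmc(states, P, tol=1e-9):
    """Check that every row of P is a probability distribution over states."""
    for s in states:
        row = P[s]
        if any(p < 0 for p in row.values()):
            return False
        if abs(sum(row.get(t, 0.0) for t in states) - 1.0) > tol:
            return False
    return True


def step(P, s, rng):
    """Sample the successor of state s according to the row P[s]."""
    successors = list(P[s].keys())
    weights = [P[s][t] for t in successors]
    return rng.choices(successors, weights=weights, k=1)[0]


def propagate(P, dist):
    """Push a distribution over states forward by one timestep."""
    out = {s: 0.0 for s in P}
    for s, mass in dist.items():
        for t, p in P[s].items():
            out[t] += mass * p
    return out


rng = random.Random(0)
next_state = step(P, "low", rng)          # "low" is absorbing in this toy chain
after_one = propagate(P, {"high": 1.0})   # distribution after one timestep
```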
Underlying stochastic processes
We model the stochastic dynamics of the two processes which influence whether to charge or execute tasks at a specific point in time: the battery level, and the predicted reward for future task execution. The values of these features will be part of the state representation of the MDP representing the battery charge scheduling problem. We propose discretised models for these processes, thereby providing a balance between accuracy and the resulting model size.
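One simple way to obtain such a discretised model from logged data is sketched below. It maps each observed reward to its nearest cluster centre and then estimates transition probabilities by counting. The cluster centres, the observations, and the counting estimator are assumptions made for illustration; the paper's own learning procedure may differ.

```python
from collections import Counter, defaultdict


def nearest_cluster(reward, centres):
    """Index of the cluster centre closest to an observed reward."""
    return min(range(len(centres)), key=lambda i: abs(centres[i] - reward))


def estimate_transitions(sequence):
    """Count-based estimate of transition probabilities from one state sequence."""
    counts = defaultdict(Counter)
    for s, nxt in zip(sequence, sequence[1:]):
        counts[s][nxt] += 1
    return {
        s: {n: c / sum(row.values()) for n, c in row.items()}
        for s, row in counts.items()
    }


centres = [0.0, 10.0, 50.0]              # hypothetical reward cluster centres
observed = [1.0, 12.0, 8.0, 47.0, 0.5]   # hypothetical reward per timestep
clusters = [nearest_cluster(r, centres) for r in observed]
P_hat = estimate_transitions(clusters)
```

Fewer clusters give a smaller state space at the cost of a coarser picture of the available reward, which is the accuracy versus model size balance mentioned above.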
We assume a fixed-length timestep
MDP model of battery charge scheduling
We model the battery charge scheduling problem as a finite-horizon MDP.
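The following toy example sketches how a finite-horizon MDP of this kind can be solved by backward induction. It is a deliberate simplification: it uses only two actions (`work` and `charge`), a single objective (expected reward), and made-up battery dynamics and rewards, whereas the model in this paper is multi-objective and is solved with PRISM.

```python
from functools import lru_cache

HORIZON = 4                              # timesteps in the planning window
MAX_BATTERY = 3                          # assumed battery levels 0..3
EXPECTED_REWARD = [1.0, 0.0, 5.0, 2.0]   # hypothetical expected reward per timestep
DRAIN = {1: 0.8, 2: 0.2}                 # assumed: working drains 1 or 2 levels


def transitions(b, action):
    """Return (probability, next_battery) pairs for an action at level b."""
    if action == "charge":
        return [(1.0, min(MAX_BATTERY, b + 1))]
    return [(p, max(0, b - d)) for d, p in DRAIN.items()]


def actions(b):
    # Assumption: a robot with an empty battery can only charge.
    return ["charge"] if b == 0 else ["work", "charge"]


@lru_cache(maxsize=None)
def value(b, t):
    """Optimal expected reward-to-go from battery level b at timestep t."""
    if t == HORIZON:
        return 0.0
    return max(q_value(b, t, a) for a in actions(b))


def q_value(b, t, action):
    """Expected reward-to-go of taking `action` now and acting optimally after."""
    immediate = EXPECTED_REWARD[t] if action == "work" else 0.0
    future = sum(p * value(nb, t + 1) for p, nb in transitions(b, action))
    return immediate + future


def policy(b, t):
    """Greedy action with respect to the optimal values."""
    return max(actions(b), key=lambda a: q_value(b, t, a))
```

In this toy instance, `policy(1, 1)` is `"charge"`: the reward at timestep 1 is zero and a high-reward timestep follows, so the robot builds up charge during the quiet period. In a receding horizon execution, only the first action of such a policy would be applied before re-solving with updated information.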
Definition 8 (Charge Scheduling MDP). A charge scheduling MDP is a tuple whose components include: a state space $S$, where a state $s = (b, r, c, t) \in S$ is such that $b$ represents the current battery level, $r$ represents the reward cluster closest to the reward observed, $c$ is true if and only if the robot is charging, and $t$ is the current timestep; and an action set $A$, i.e., at each timestep the robot has three available actions: the gather reward action
Experimental study
We now present an empirical evaluation of our approach. Through the following experiments, we demonstrate the scalability and robustness of the battery scheduler. The implementation of the battery charge scheduler, along with the datasets used for the experiments, is available at https://github.com/milanmt/Battery-Scheduler.
Conclusion
We presented an approach for scheduling battery charging for a mobile robot. Our approach exploits a model which predicts the distribution of rewards available from task execution in the future. It uses this model to build policies that ensure highly valued tasks can be executed whilst keeping the battery level above a safety threshold. Alongside the already discussed avenues for improvement, future work will focus on developing an approach to integrate the part of the schedule already known at
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (29)
- et al., The STRANDS project: Long-term autonomy in everyday environments, IEEE Robot. Autom. Mag. (2017)
- M.M. Veloso, J. Biswas, B. Coltin, S. Rosenthal, CoBots: Robust symbiotic autonomous mobile service robots, in:...
- et al., Artificial intelligence for long-term robot autonomy: A survey, IEEE Robot. Autom. Lett. (2018)
- et al., PRISM 4.0: Verification of probabilistic real-time systems
- et al., Pareto curves for probabilistic model checking
- et al., Probabilistic planning with formal performance guarantees for mobile service robots, Int. J. Robot. Res. (2019)
- et al., An integrated control framework for long-term autonomy in mobile service robots
- et al., Battery charge scheduling in long-life autonomous mobile robots
- et al., A case study of mobile robot’s energy consumption and conservation techniques
- et al., Battery-aware power management based on Markovian decision processes, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. (2006)
- Optimal and robust controller synthesis: Using energy timed automata with uncertainty
- Efficient runtime quantitative verification using caching, lookahead, and nearly-optimal reconfiguration
- Resource-performance tradeoff analysis for mobile robots, IEEE Robot. Autom. Lett.
- Towards integrating formal verification of autonomous robots with battery prognostics and health management
Milan Tomy received her M.Sc. in Robotics in 2017, from the University of Birmingham. Her research interests lie in planning and decision making under uncertainty. She is motivated by the application of this research in simplifying rather than eliminating human effort. She has been working as a machine learning engineer aiming to improve the operations of manufacturing industries.
Bruno Lacerda received his Ph.D. in Electrical and Computing Engineering from the Instituto Superior Técnico, University of Lisbon, Portugal, in 2013. Between 2013 and 2017, he was a Research Fellow at the School of Computer Science, University of Birmingham, UK. Currently, he is a Senior Researcher at the Oxford Robotics Institute, University of Oxford, UK. His research interests lie in the use of formal approaches to specify and synthesise high-level robot controllers. To achieve this goal, he is particularly interested in using temporal logics, Petri nets, supervisory control theory and planning under uncertainty.
Nick Hawes received his Ph.D. in Artificial Intelligence from the University of Birmingham in 2004. He is currently an Associate Professor in the Oxford Robotics Institute, part of the Department of Engineering Science at the University of Oxford, and a Fellow of Pembroke College. He leads the GOALS research group, which performs research on problems in mission planning and decision making for autonomous systems, particularly goal-oriented, long-lived robots acting in uncertain environments. He is an Associate Editor for the Journal of AI Research, and a Group Leader for AI and Robotics at the UK’s Turing Institute.
Jeremy L. Wyatt received his B.A. in Theology from the University of Bristol, U.K., a Masters in Knowledge-Based Systems from the University of Sussex, U.K., and his Ph.D. in Artificial Intelligence from the University of Edinburgh, U.K., in 1997. He is an Honorary Professor of Robotics and Artificial Intelligence at the University of Birmingham, U.K. He has published more than 110 refereed articles, led two international research projects on robot planning and learning (CogX) and robot manipulation (PaCMan), and edited three books. His research interests include machine learning, planning, architectures, robot vision, mobile robotics, and robot manipulation. Professor Wyatt was a recipient of two best paper prizes. He was a Leverhulme Fellow from 2006 to 2008.
☆ This work was supported by UK Research and Innovation and EPSRC, UK through the Robotics and Artificial Intelligence for Nuclear (RAIN) research hub [EP/R026084/1].