Learning control for transmission and navigation with a mobile robot under unknown communication rates

https://doi.org/10.1016/j.conengprac.2020.104460

Abstract

In tasks such as surveying or monitoring remote regions, an autonomous robot must move while transmitting data over a wireless network with unknown, position-dependent transmission rates. For such a robot, this paper considers the problem of transmitting a data buffer in minimum time, while possibly also navigating towards a goal position. Two approaches are proposed, each consisting of a machine-learning component that estimates the rate function from samples; and of an optimal-control component that moves the robot given the current rate function estimate. Simple obstacle avoidance is performed for the case without a goal position. In extensive simulations, these methods achieve competitive performance compared to known-rate and unknown-rate baselines. A real indoor experiment is provided in which a Parrot AR.Drone 2 successfully learns to transmit the buffer.

Introduction

This paper considers problems in which a mobile robot must transmit a data file (or equivalently, empty a data buffer) over a wireless network, with transmission rates that depend on the robot position and may be affected by noise. The objective is to move in such a way that the buffer is emptied in minimum time. Two versions will be considered: one in which there is no desired end position for the robot, which is called the transmission problem and abbreviated as PT; and a second version in which the trajectory must end at a given goal position, called navigation-and-transmission problem (PN). Such problems appear e.g. when a UAV autonomously collects survey data (video, photographic, etc.) which it must then deliver over an ad-hoc network, possibly while navigating to the next mission waypoint.

The key challenge is that the rate function is usually unknown to the robot, e.g. because it depends on unknown propagation environment effects like path loss, shadowing, and fast fading. For both PT and PN, algorithms are proposed that (i) learn approximations of the rate function from values sampled so far along the trajectory and (ii) at each step, apply optimal control with the current approximation to choose robot actions. In PT, (i) is done with supervised, local linear regression (Moore & Atkeson, 1993) and (ii) with a local version of dynamic programming (Bertsekas, 2012), for arbitrarily-shaped rate functions that are assumed deterministic for the design. The PT method includes a simple obstacle-avoidance procedure. In PN, component (i) uses active learning (Settles, 2009) and (ii) is a time-optimal control design from Lohéac, S. Varma, and Morarescu (2019). In the latter case, the rate function is learned via the signal-to-noise ratio, which is affected by random fluctuations and is taken to have a radially symmetric expression with unknown parameters.
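Component (i) for PT can be sketched as follows. This is a minimal illustration of local linear regression on (position, rate) samples, not the paper's implementation; the function names and the neighborhood size `k` are hypothetical choices for this sketch:

```python
import math

def solve3(A, b):
    """Solve a 3x3 linear system A x = b by Gaussian elimination with pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x

def local_linear_rate(samples, p, k=6):
    """Estimate the rate at position p from the k nearest (position, rate)
    samples, by fitting R ~ a0 + a1*x + a2*y with ordinary least squares."""
    near = sorted(samples, key=lambda s: math.dist(s[0], p))[:k]
    # Normal equations (Phi^T Phi) a = Phi^T r, with features phi = (1, x, y).
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for (x, y), r in near:
        phi = (1.0, x, y)
        for i in range(3):
            b[i] += phi[i] * r
            for j in range(3):
                A[i][j] += phi[i] * phi[j]
    a0, a1, a2 = solve3(A, b)
    return a0 + a1 * p[0] + a2 * p[1]
```

On locally linear rate data, this estimator is exact; in practice it is queried only near the sampled trajectory, which is why the paper pairs it with a local version of dynamic programming.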

A common idea in both the PT and PN approaches is to focus the learning method on the problem's unknown element, the rate function, while exploiting the robot's known motion dynamics in order to achieve the required fast, single-trajectory learning. Both approaches are validated in extensive simulations for a robot with nonlinear, unicycle-like motion dynamics and with rate functions given by single or multiple antennas with a path-loss form. Moreover, a real indoor experiment is provided that illustrates PT with a single router and a Parrot AR.Drone 2.

The problem of optimizing sensing locations to reduce the uncertainty of an unknown map is well-known in machine learning, see e.g. Krause, Singh, and Guestrin (2008) for some fundamental results. This problem is often solved dynamically, to obtain so-called informative path planning, see e.g. Meera, Popovic, Millane, and Siegwart (2019) and Viseras et al. (2016); Popovic et al. (2018) provide a good overview of this field. Closer to the present work, Fink and Kumar (2010) and Penumarthi et al. (2017) learn a radio map with multiple mobile robots. Compared to these works, the methods used in this paper to learn the map are much simpler — basic regression methods, whereas Gaussian processes (Rasmussen & Williams, 2006) are often used in the references above. However, the novelty here is that the robot has control and communication objectives, whereas in most of the works above the objective of the robot is just to learn the map (keeping in mind that learning must typically be done from a small number of samples, which is related but not identical to the time-optimality objectives in the present paper).

On the other hand, a related thread of work in engineering, which does consider joint control and communication objectives, is motion planning under connectivity, communication rate, or quality-of-service constraints (Chatzipanagiotis et al., 2012, Fink et al., 2013, Ghaffarkhah and Mostofi, 2011, Rooker and Birk, 2007). Closer to the present work, when the model of the wireless communication rate is known, some recent works (Gangula et al., 2017, Licea et al., 2016, Lohéac et al., 2019, Ooi and Schindelhauer, 2009) have optimized the trajectory of the robot while ensuring that a buffer is transmitted along the way. The key novelty here with respect to these works is that the rate function is not known in advance by the robot. Instead, it must be learned from samples observed while the robot travels in the environment. In particular, the approach to solve PN incorporates the known-rate method of Lohéac et al. (2019) by applying it iteratively, at each step, for the current learned estimate of the rate function.

Some recent papers (Fink et al., 2012, Licea et al., 2017, Rizzo et al., 2019, Rizzo et al., 2013) explore control of robots with rate functions that are uncertain but still have a known model. For instance, Rizzo et al., 2019, Rizzo et al., 2013 carefully develop rate models for tunnels, and robots then adapt to the parameters of these models. In Yan and Mostofi (2013), the rate function is initially unknown, but the trajectory of the robot is fixed and only the velocity is optimized.

Compared to the preliminary conference paper (Buşoniu, Varma, Morărescu, & Lasaulce, 2019), fully novel contributions here are solving PN with unknown rates and the real-system results. Buşoniu et al. (2019) focused only on a simpler version of PT, with first-order robot dynamics, synthetic rate functions, and without obstacles. For the PT version studied here, the robot dynamics are extended to be of arbitrary order, the method is illustrated with realistic rate functions, and – importantly – basic obstacle avoidance functionality is introduced.

Next, Section 2 gives the problem definition, after which the paper treats PT (Section 3 for the algorithm, Section 4 for the results) and PN (Section 5 for the algorithm, Section 6 for the results) separately. The real-system illustration is provided in Section 7, and Section 8 concludes the paper.

Problem definition

Consider a mobile robot with position $p \in P \subseteq \mathbb{R}^2$, additional motion-related states $y \in Y \subseteq \mathbb{R}^{n_y}$, $n_y \ge 0$, and inputs $u \in U \subseteq \mathbb{R}^{n_u}$, $n_u \ge 1$. The extra states $y$ may contain e.g. velocities, headings, or other variables needed to model the robot's motion. Dimension $n_y$ may be zero, in which case the only state signals are the positions and the robot has first-order dynamics. In this case, variable $y$ can be omitted from the formalism below (and, by convention, $\mathbb{R}^0$ is a singleton). A discrete-time setting is considered
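The discrete-time setting can be illustrated by a minimal first-order sketch ($n_y = 0$). The exact buffer-drain law is an assumption here for illustration: it models the buffer emptying at the position-dependent rate $R(p_k)$ over one sampling period:

```python
def step(p, b, u, rate, Ts=1.0):
    """One discrete-time step for a first-order robot:
    motion p_{k+1} = p_k + Ts * u_k, and buffer drain
    b_{k+1} = max(b_k - Ts * R(p_k), 0)  (assumed drain law)."""
    p_next = (p[0] + Ts * u[0], p[1] + Ts * u[1])
    b_next = max(b - Ts * rate(p), 0.0)
    return p_next, b_next
```

The buffer state $b_k$ is what couples communication to motion: the robot's trajectory changes how fast $b_k$ reaches zero.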

Solution for the transmission problem

First, PT is reformulated as a deterministic optimal control problem. Define the union of all obstacles as $O$, and the following stage reward function:
$$\rho(x_k, u_k, x_{k+1}) = \begin{cases} -o & \text{if } x_{k+1} \in O \\ -1 & \text{if } x_{k+1} \notin O \text{ and } b_k > 0 \\ 0 & \text{if } x_{k+1} \notin O \text{ and } b_k = 0 \end{cases}$$
where $o$ is a positive obstacle collision penalty, which should be taken large so that obstacle avoidance is given priority over minimizing the time to transmit. Define also the long-term value function:
$$V^{\pi}(x_0) = \sum_{k=0}^{\infty} \rho(x_k, u_k, x_{k+1})$$
where $x_{k+1} = f(x_k, u_k, R(p_k))$ and $u_k = \pi(x_k)$ obeys the state
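The paper uses a local version of dynamic programming; the underlying value-iteration principle can be sketched on a toy finite abstraction, where the state is the number of buffer units remaining and a single hypothetical "transmit" action drains one unit per step. The reward signs ($-o$, $-1$, $0$) follow the stage reward above:

```python
def stage_reward(collision, b, o=100.0):
    """rho from the text: -o on collision, -1 while the buffer
    is nonempty, 0 once it is empty."""
    if collision:
        return -o
    return -1.0 if b > 0 else 0.0

def value_iteration(states, actions, f, rho, n_iter=100):
    """Deterministic, undiscounted value iteration:
    V(x) <- max_u [ rho(x, u, f(x, u)) + V(f(x, u)) ]."""
    V = {x: 0.0 for x in states}
    for _ in range(n_iter):
        V = {x: max(rho(x, u, f(x, u)) + V[f(x, u)] for u in actions)
             for x in states}
    return V

# Toy abstraction: state = buffer units left; "transmit" drains one unit.
states = [0, 1, 2, 3]
f = lambda x, u: max(x - 1, 0)
rho = lambda x, u, x_next: stage_reward(False, x)
V = value_iteration(states, ["transmit"], f, rho)
```

Because each nonempty-buffer step costs $-1$, the converged value $V(x)$ equals minus the number of steps needed to empty the buffer, which is exactly the minimum-time objective.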

Empirical study in the transmission problem

Consider a simulated robot with motion dynamics (1) given by the nonlinear, unicycle-like updates:
$$p_{k+1,1} = p_{k,1} + T_s u_{k,1} \cos(u_{k,2}), \qquad p_{k+1,2} = p_{k,2} + T_s u_{k,1} \sin(u_{k,2})$$
i.e., the first input is the velocity and the second the heading of the robot. A set of discretized actions is taken that consists of moving at velocity $u_1 = 1$ m/s along one of the headings $u_2 \in \{0, \pi/4, \ldots, 7\pi/4\}$ rad, together with a zero-velocity action. The sampling period is $T_s = 4$ s. Note that since these dynamics are first-order, there is no extra
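The update and the nine-element discretized action set above can be written directly as:

```python
import math

TS = 4.0      # sampling period [s], as in the text
SPEED = 1.0   # velocity of the non-zero actions [m/s]
HEADINGS = [i * math.pi / 4 for i in range(8)]  # 0, pi/4, ..., 7*pi/4

# Discretized action set: 8 headings at 1 m/s, plus a zero-velocity action.
ACTIONS = [(SPEED, h) for h in HEADINGS] + [(0.0, 0.0)]

def unicycle_step(p, u, Ts=TS):
    """Unicycle-like update from the text, with u = (velocity, heading)."""
    v, theta = u
    return (p[0] + Ts * v * math.cos(theta),
            p[1] + Ts * v * math.sin(theta))
```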

Solution for the navigation and transmission problem

As for PT, the paper first provides a model-based procedure, in Section 5.1. This procedure is not a contribution of this paper, but is adapted from Lohéac et al. (2019). Then, Section 5.2 gives the learning procedure for unknown SNR functions, which is a novel contribution; see again Table 1.

Empirical study in the navigation and transmission problem

In PN, only one antenna is allowed. It is placed at coordinates $(100, 30)$, with the same shape as in the PT experiments above, and with $R_0 = 0.753$. The same robot motion dynamics, position domain, and discretized actions are used as before. However, here the rate function is natively random: in all experiments, $z$ in (6) is random with a Rice distribution, per (5). To account for this randomness, the results always report the mean number of steps to reach the goal, together with the 95% confidence
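Equations (5)-(6) are not reproduced in this snippet; as a generic sketch, Rice-distributed fading amplitudes can be drawn with the standard $|X + jY|$ construction, where $X \sim \mathcal{N}(\nu, \sigma)$ and $Y \sim \mathcal{N}(0, \sigma)$ (the parameters $\nu$ and $\sigma$ here are hypothetical, not the paper's values):

```python
import math, random

def rice_sample(nu, sigma, rng=random):
    """Draw a Rice-distributed amplitude |X + jY|,
    with X ~ N(nu, sigma) and Y ~ N(0, sigma)."""
    x = rng.gauss(nu, sigma)
    y = rng.gauss(0.0, sigma)
    return math.hypot(x, y)
```

As $\sigma \to 0$ the distribution collapses to the deterministic line-of-sight amplitude $\nu$, which is why averaging over many runs (with confidence intervals, as the paper reports) is needed when $\sigma > 0$.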

Real-life illustration of the transmission problem

An illustration of the transmission problem, PT, is provided for a real quadcopter drone in an indoor environment. Specifically, a Parrot AR.Drone (Bristeau, Callou, Vissière, Petit, et al., 2011), version 2, will be used, along with a 4-camera OptiTrack Flex 13 motion capture system (Furtado, Liu, Lai, Lacheray, & Desouza-Coelho, 2019). The high-level motion dynamics (1) used in the learning algorithm have a simple-integrator form (without any extra signal $y$): $p_{k+1} = p_k + T_s u_k$, with a sampling period

Conclusions

Two learning-based algorithms were proposed for a mobile robot to transmit data over a wireless network with an unknown rate map: one when the trajectory is free, in which case rectangular obstacles can be handled; and another when the robot must end up at a goal position. Extensive simulations showed that these algorithms achieve good performance, in some cases very close to model-based solutions that require knowledge of the rate function. An illustration with a real UAV was given.

A relatively

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by a grant of the Romanian Ministry of Research and Innovation, CNCS - UEFISCDI, project number PN-III-P1-1.1-TE-2016-0670, within PNCDI III, and by the Post-Doctoral Programme “Entrepreneurial competences and excellence research in doctoral and postdoctoral programs - ANTREDOC”, project co-funded from the European Social Fund , contract no. 56437/24.07.2019.

References (36)

  • Fink, J., et al. Robust control of mobility and communications in autonomous robot teams. IEEE Access (2013).
  • Furtado, J. S., et al. Comparative analysis of OptiTrack motion capture systems.
  • Gangula, R., et al. Trajectory optimization for mobile access point.
  • Ghaffarkhah, A., et al. Communication-aware motion planning in mobile networks. IEEE Transactions on Automatic Control (2011).
  • Krause, A., et al. Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. Journal of Machine Learning Research (JMLR) (2008).
  • Licea, D. B., Varma, V. S., Lasaulce, S., Daafouz, J., & Ghogho, M. (2016). Trajectory planning for energy-efficient...
  • Licea, D. B., et al. Robust trajectory planning for robotic communications under fading channels.
  • Lin, L.-J. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning (1992).