Learning control for transmission and navigation with a mobile robot under unknown communication rates
Introduction
This paper considers problems in which a mobile robot must transmit a data file (or equivalently, empty a data buffer) over a wireless network, with transmission rates that depend on the robot position and may be affected by noise. The objective is to move in such a way that the buffer is emptied in minimum time. Two versions are considered: one in which there is no desired end position for the robot, called the transmission problem and abbreviated PT; and a second in which the trajectory must end at a given goal position, called the navigation-and-transmission problem (PN). Such problems appear e.g. when a UAV autonomously collects survey data (video, photographic, etc.) which it must then deliver over an ad-hoc network, possibly while navigating to the next mission waypoint.
The key challenge is that the rate function is usually unknown to the robot, e.g. because it depends on unknown propagation environment effects like path loss, shadowing, and fast fading. For both PT and PN, algorithms are proposed that (i) learn approximations of the rate function from values sampled so far along the trajectory and (ii) at each step, apply optimal control with the current approximation to choose robot actions. In PT, (i) is done with supervised, local linear regression (Moore & Atkeson, 1993) and (ii) with a local version of dynamic programming (Bertsekas, 2012), for arbitrarily-shaped rate functions that are assumed deterministic for the design. The PT method includes a simple obstacle avoidance procedure. In PN, component (i) uses active learning (Settles, 2009) and (ii) is a time-optimal control design from Lohéac, Varma, and Morărescu (2019). In the latter case, the rate function is learned via the signal-to-noise ratio, which is affected by random fluctuations and is taken to have a radially-symmetric expression with unknown parameters.
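Component (i) of the PT method, local linear regression of the rate map from samples gathered along the trajectory, can be sketched as follows. The function name, the Gaussian distance weighting, the neighborhood size, and the small ridge term are illustrative assumptions, not the paper's exact design.

```python
import math

def local_linear_rate(samples, query, k=8, bandwidth=5.0):
    """Estimate the rate at `query` by locally weighted linear regression
    (in the spirit of Moore & Atkeson, 1993) on the k nearest samples.
    `samples` is a list of ((x, y), rate) pairs observed along the trajectory."""
    near = sorted(samples, key=lambda s: math.dist(s[0], query))[:k]
    # weighted normal equations for the local model r ~ a + b*x + c*y
    A = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for (x, y), r in near:
        w = math.exp(-math.dist((x, y), query) ** 2 / (2 * bandwidth ** 2))
        phi = [1.0, x, y]
        for i in range(3):
            for j in range(3):
                A[i][j] += w * phi[i] * phi[j]
            rhs[i] += w * phi[i] * r
    # tiny ridge term for numerical stability, then Gaussian elimination
    for i in range(3):
        A[i][i] += 1e-9
    for i in range(3):
        for j in range(i + 1, 3):
            f = A[j][i] / A[i][i]
            for c in range(3):
                A[j][c] -= f * A[i][c]
            rhs[j] -= f * rhs[i]
    coef = [0.0] * 3
    for i in (2, 1, 0):
        coef[i] = (rhs[i] - sum(A[i][j] * coef[j] for j in range(i + 1, 3))) / A[i][i]
    return coef[0] + coef[1] * query[0] + coef[2] * query[1]
```

Because the local model is linear, the estimate is exact whenever the true rate is locally linear around the query point, and it extrapolates smoothly between sparse trajectory samples.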
A common idea in both the PT and PN approaches is to focus learning on the unknown element of the problem, the rate function, while exploiting the known motion dynamics of the robot to achieve the required fast, single-trajectory learning. Both approaches are validated in extensive simulations for a robot with nonlinear, unicycle-like motion dynamics, where the rate functions are generated by one or several antennas with a path-loss form. Moreover, a real indoor experiment with a single router and a Parrot AR.Drone 2 illustrates PT.
The problem of optimizing sensing locations to reduce the uncertainty of an unknown map is well-known in machine learning, see e.g. Krause, Singh, and Guestrin (2008) for some fundamental results. This problem is often solved dynamically, to obtain so-called informative path planning, see e.g. Meera, Popovic, Millane, and Siegwart (2019) and Viseras et al. (2016); Popovic et al. (2018) provide a good overview of this field. Closer to the present work, Fink and Kumar (2010) and Penumarthi et al. (2017) learn a radio map with multiple mobile robots. Compared to these works, the methods used in this paper to learn the map are much simpler — basic regression methods, whereas Gaussian processes (Rasmussen & Williams, 2006) are often used in the references above. However, the novelty here is that the robot has control and communication objectives, whereas in most of the works above the objective of the robot is just to learn the map (keeping in mind that learning must typically be done from a small number of samples, which is related but not identical to the time-optimality objectives in the present paper).
On the other hand, a related thread of work in engineering, which does consider joint control and communication objectives, is motion planning under connectivity, communication rate, or quality-of-service constraints (Chatzipanagiotis et al., 2012, Fink et al., 2013, Ghaffarkhah and Mostofi, 2011, Rooker and Birk, 2007). Closer to the present work, when the model of the wireless communication rate is known, some recent works (Gangula et al., 2017, Licea et al., 2016, Lohéac et al., 2019, Ooi and Schindelhauer, 2009) have optimized the trajectory of the robot while ensuring that a buffer is transmitted along the way. The key novelty here with respect to these works is that the rate function is not known in advance by the robot. Instead, it must be learned from samples observed while the robot travels in the environment. In particular, the approach to solve PN incorporates the known-rate method of Lohéac et al. (2019) by applying it iteratively, at each step, for the current learned estimate of the rate function.
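The PN scheme just described, re-solving the known-rate problem at each step for the current learned estimate, is a certainty-equivalence loop. A minimal sketch, with hypothetical callables standing in for the paper's learner and for the planner of Lohéac et al. (2019):

```python
def certainty_equivalence_loop(step, observe, refit, plan, state, max_steps=100):
    """Sketch of the PN scheme: at every step, sample the rate at the current
    position, refit the rate estimate from all samples so far, re-solve the
    known-rate time-optimal problem for that estimate, and apply only the
    first input before re-planning. All callables are illustrative stand-ins."""
    samples = []
    for _ in range(max_steps):
        samples.append(observe(state))   # rate/SNR sample at the current position
        model = refit(samples)           # updated learned rate estimate
        u = plan(model, state)           # time-optimal input for the estimate
        state, done = step(state, u)     # advance the robot one step
        if done:
            break
    return state
```

Applying only the first planned input before refitting means that planning errors caused by early, poor estimates are corrected as more samples arrive along the trajectory.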
Some recent papers (Fink et al., 2012, Licea et al., 2017, Rizzo et al., 2019, Rizzo et al., 2013) explore control of robots with rate functions that are uncertain but still have a known model. For instance, Rizzo et al., 2019, Rizzo et al., 2013 carefully develop rate models for tunnels, and robots then adapt to the parameters of these models. In Yan and Mostofi (2013), the rate function is initially unknown, but the trajectory of the robot is fixed and only the velocity is optimized.
Compared to the preliminary conference paper (Buşoniu, Varma, Morărescu, & Lasaulce, 2019), the fully novel contributions here are the solution of PN with unknown rates and the real-system results. Buşoniu et al. (2019) focused only on a simpler version of PT, with first-order robot dynamics, synthetic rate functions, and without obstacles. For the PT version studied here, the robot dynamics are extended to be of arbitrary order, the method is illustrated with realistic rate functions, and, importantly, basic obstacle avoidance functionality is introduced.
Next, Section 2 gives the problem definition, after which the paper treats separately PT (Section 3 for the algorithm and Section 4 for the results) and PN (Section 5 for the algorithm and Section 6 for the results). The real-system illustration is provided in Section 7, and Section 8 concludes the paper.
Problem definition
Consider a mobile robot with position p ∈ R^2, additional motion-related states x ∈ R^n, and inputs u ∈ U. The extra states may contain e.g. velocities, headings, or other variables needed to model the robot’s motion. Dimension n may be zero, in which case the only state signals are the positions and the robot has first-order dynamics. In this case, variable x can be omitted from the formalism below (and, by convention, its domain is a singleton). A discrete-time setting is considered
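As an illustration of this formalism (with hypothetical symbols, since the snippet elides the original notation), a second-order instance in which the extra state holds the planar velocity can be written as:

```python
def second_order_step(p, x, u, Ts=1.0):
    """One discrete-time update of a second-order motion model: the extra
    state x is the planar velocity, the input u is an acceleration, and the
    position p integrates x. With an empty extra state the model collapses
    to first-order dynamics, as described in the text."""
    p_next = (p[0] + Ts * x[0], p[1] + Ts * x[1])
    x_next = (x[0] + Ts * u[0], x[1] + Ts * u[1])
    return p_next, x_next
```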
Solution for the transmission problem
First, PT is reformulated as a deterministic optimal control problem. Define the union of all obstacles as O, and a stage reward function that includes a positive obstacle collision penalty, which should be taken large so that obstacle avoidance is given priority over minimizing the time to transmit. Define also the long-term value function, in which the states obey the state
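A minimal sketch of a stage reward with the structure described above; the per-step cost of −1, the axis-aligned rectangle representation of obstacles, and all names are assumptions, since the snippet elides the paper's exact formulas:

```python
def stage_reward(p, obstacles, c_obs=100.0):
    """Charge -1 per step, so that maximizing the return minimizes the time
    to transmit, minus a large penalty c_obs when the position p lies inside
    any obstacle. `obstacles` is a list of axis-aligned rectangles given as
    (xmin, xmax, ymin, ymax) tuples."""
    in_obstacle = any(xmin <= p[0] <= xmax and ymin <= p[1] <= ymax
                      for xmin, xmax, ymin, ymax in obstacles)
    return -1.0 - (c_obs if in_obstacle else 0.0)
```

Taking c_obs much larger than any achievable time cost ensures that trajectories through obstacles are always worse than any collision-free trajectory.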
Empirical study in the transmission problem
Consider a simulated robot with motion dynamics (1) given by nonlinear, unicycle-like updates, i.e., the first input is the velocity and the second the heading of the robot. A set of discretized actions is taken that consists of moving at a fixed velocity (in m/s) along one of several discretized headings (in rad), together with a zero-velocity action, and a fixed sampling period (in s) is used. Note that since these dynamics are first-order, there is no extra state signal.
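Under these assumptions (the exact velocity value, heading grid, and sampling period are elided in the snippet, so placeholder values are used), the unicycle-like update and a discretized action set might look like:

```python
import math

def unicycle_step(p, u, Ts=1.0):
    """Unicycle-like first-order update: the input u is (velocity v, heading
    theta), and the position advances along the chosen heading."""
    v, theta = u
    return (p[0] + Ts * v * math.cos(theta),
            p[1] + Ts * v * math.sin(theta))

# hypothetical discretized action set: a fixed speed along 8 equally spaced
# headings, plus a zero-velocity action (the paper's exact values are elided)
V = 1.0  # m/s, placeholder
ACTIONS = [(V, k * math.pi / 4) for k in range(8)] + [(0.0, 0.0)]
```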
Solution for the navigation and transmission problem
As for PT, the paper first provides a model-based procedure, in Section 5.1. This procedure is not a contribution of this paper, but is adapted from Lohéac et al. (2019). Then, Section 5.2 gives the learning procedure for unknown SNR functions, which is a novel contribution; see again Table 1.
Empirical study in the navigation and transmission problem
In PN, only one antenna is allowed. It is placed at coordinates (100, 30), with the same shape as in the PT experiments above. The same robot motion dynamics, position domain, and discretized actions are used as before. However, here the rate function is natively random: in all experiments, the SNR in (6) is random with a Rice distribution, per (5). To account for this randomness, the results always report the mean number of steps to reach the goal, together with the 95% confidence
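Rice-distributed SNR fluctuations of this kind can be simulated as follows; the Rice factor, the unit-mean-power normalization, and the dB bookkeeping are illustrative assumptions, since the snippet elides the exact forms of (5) and (6):

```python
import math
import random

def rician_snr(mean_snr_db, K=6.0, rng=random):
    """Draw one SNR sample with a Rice-distributed fading amplitude around a
    mean SNR in dB. K is the Rice factor (ratio of line-of-sight to scattered
    power); the fading amplitude is normalized to unit mean power, so the
    samples fluctuate around mean_snr_db."""
    sigma = math.sqrt(1.0 / (2.0 * (K + 1.0)))  # scatter component std
    nu = math.sqrt(K / (K + 1.0))               # line-of-sight amplitude
    a = math.hypot(nu + rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
    return mean_snr_db + 10.0 * math.log10(a * a)
```

Averaging many such draws recovers (up to a small Jensen-gap bias) the underlying mean SNR, which is why the experiments report means with confidence intervals.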
Real-life illustration of the transmission problem
An illustration of the transmission problem, PT, is provided for a real quadcopter drone in an indoor environment. Specifically, a Parrot AR.Drone 2 (Bristeau, Callou, Vissière, Petit, et al., 2011) is used, along with a 4-camera OptiTrack Flex 13 motion capture system (Furtado, Liu, Lai, Lacheray, & Desouza-Coelho, 2019). The high-level motion dynamics (1) used in the learning algorithm have a simple-integrator form (without any extra state signal), with a sampling period
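The simple-integrator high-level model can be sketched as follows; the sampling period below is a placeholder, since its actual value is elided in the snippet:

```python
def integrator_step(p, u, Ts=0.2):
    """Simple-integrator high-level model for the drone: the commanded planar
    velocity u is integrated into the position p over one sampling period Ts
    (placeholder value). There is no extra state signal."""
    return (p[0] + Ts * u[0], p[1] + Ts * u[1])
```

Low-level attitude control is handled by the drone's onboard autopilot, which is why such a coarse kinematic model suffices at the planning level.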
Conclusions
Two learning-based algorithms were proposed for a mobile robot to transmit data over a wireless network with an unknown rate map: one in which the trajectory is free, in which case rectangular obstacles can be handled; and another in which the robot must end up at a goal position. Extensive simulations showed that these algorithms achieve good performance, in some cases very close to that of model-based solutions that require knowledge of the rate function. An illustration with a real UAV was also given.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported by a grant of the Romanian Ministry of Research and Innovation, CNCS - UEFISCDI, project number PN-III-P1-1.1-TE-2016-0670, within PNCDI III, and by the Post-Doctoral Programme “Entrepreneurial competences and excellence research in doctoral and postdoctoral programs - ANTREDOC”, project co-funded by the European Social Fund, contract no. 56437/24.07.2019.
References (36)
- Bristeau, Callou, Vissière, & Petit (2011). The navigation and control technology inside the AR.Drone micro UAV.
- Buşoniu, L., et al. (2010). Approximate dynamic programming with a fuzzy parameterization. Automatica.
- Rooker, M. N., & Birk, A. (2007). Multi-robot exploration under the constraints of wireless networking. Control Engineering Practice.
- Wu, D., et al. (2019). Active learning for regression using greedy sampling. Information Sciences.
- Bertsekas, D. P. (2012). Dynamic programming and optimal control, Vol. 2.
- Buşoniu, L., Varma, V. S., Morărescu, I.-C., & Lasaulce, S. (2019). Learning-based control for a communicating mobile...
- Chatzipanagiotis, N., Liu, Y., Petropulu, A., & Zavlanos, M. M. (2012). Controlling groups of mobile beamformers. In...
- Bound constrained optimization using fminsearch (2012).
- Fink, J., & Kumar, V. (2010). Online methods for radio signal mapping with mobile robots.
- Fink, J., et al. (2012). Robust control for mobility and wireless communication in cyber–physical systems with application to robot teams. Proceedings of the IEEE.
- Fink, J., et al. (2013). Robust control of mobility and communications in autonomous robot teams. IEEE Access.
- Furtado, Liu, Lai, Lacheray, & Desouza-Coelho (2019). Comparative analysis of OptiTrack motion capture systems.
- Gangula et al. (2017). Trajectory optimization for mobile access point.
- Ghaffarkhah, A., & Mostofi, Y. (2011). Communication-aware motion planning in mobile networks. IEEE Transactions on Automatic Control.
- Krause, A., Singh, A., & Guestrin, C. (2008). Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. Journal of Machine Learning Research (JMLR).
- Licea et al. Robust trajectory planning for robotic communications under fading channels.
- Lin, L.-J. (1992). Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning.