Learning control for transmission and navigation with a mobile robot under unknown communication rates

https://doi.org/10.1016/j.conengprac.2020.104460

Abstract

In tasks such as surveying or monitoring remote regions, an autonomous robot must move while transmitting data over a wireless network with unknown, position-dependent transmission rates. For such a robot, this paper considers the problem of transmitting a data buffer in minimum time, while possibly also navigating towards a goal position. Two approaches are proposed, each consisting of a machine-learning component that estimates the rate function from samples; and of an optimal-control component that moves the robot given the current rate function estimate. Simple obstacle avoidance is performed for the case without a goal position. In extensive simulations, these methods achieve competitive performance compared to known-rate and unknown-rate baselines. A real indoor experiment is provided in which a Parrot AR.Drone 2 successfully learns to transmit the buffer.

Introduction

This paper considers problems in which a mobile robot must transmit a data file (or equivalently, empty a data buffer) over a wireless network, with transmission rates that depend on the robot position and may be affected by noise. The objective is to move in such a way that the buffer is emptied in minimum time. Two versions will be considered: one in which there is no desired end position for the robot, which is called the transmission problem and abbreviated as PT; and a second version in which the trajectory must end at a given goal position, called navigation-and-transmission problem (PN). Such problems appear e.g. when a UAV autonomously collects survey data (video, photographic, etc.) which it must then deliver over an ad-hoc network, possibly while navigating to the next mission waypoint.

The key challenge is that the rate function is usually unknown to the robot, e.g. because it depends on unknown propagation environment effects like path loss, shadowing, and fast fading. For both PT and PN, algorithms are proposed that (i) learn approximations of the rate function from values sampled so far along the trajectory and (ii) at each step, apply optimal control with the current approximation to choose robot actions. In PT, (i) is done with supervised, local linear regression (Moore & Atkeson, 1993) and (ii) with a local version of dynamic programming (Bertsekas, 2012), for arbitrarily-shaped rate functions that are assumed deterministic for the design. The PT method includes a simple obstacle-avoidance procedure. In PN, component (i) uses active learning (Settles, 2009) and (ii) is a time-optimal control design from Lohéac, S. Varma, and Morarescu (2019). In the latter case, the rate function is learned via the signal-to-noise ratio, which is affected by random fluctuations and is taken to have a radially symmetric expression with unknown parameters.
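Component (i) for PT can be sketched as follows. This is a minimal illustration of local linear regression on (position, rate) samples, not the paper's implementation; the function names and the neighborhood size `k` are hypothetical choices for this sketch:

```python
import math

def solve3(A, b):
    """Solve a 3x3 linear system A x = b by Gaussian elimination with pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x

def local_linear_rate(samples, p, k=6):
    """Estimate the rate at position p from the k nearest (position, rate)
    samples, by fitting R ~ a0 + a1*x + a2*y with ordinary least squares."""
    near = sorted(samples, key=lambda s: math.dist(s[0], p))[:k]
    # Normal equations (Phi^T Phi) a = Phi^T r, with features phi = (1, x, y).
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for (x, y), r in near:
        phi = (1.0, x, y)
        for i in range(3):
            b[i] += phi[i] * r
            for j in range(3):
                A[i][j] += phi[i] * phi[j]
    a0, a1, a2 = solve3(A, b)
    return a0 + a1 * p[0] + a2 * p[1]
```

On locally linear rate data, this estimator is exact; in practice it is queried only near the sampled trajectory, which is why the paper pairs it with a local version of dynamic programming.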

A common idea in both the PT and PN approaches is to focus the learning method on the problem's unknown element, the rate function, while exploiting the robot's known motion dynamics in order to achieve the required fast, single-trajectory learning. Both approaches are validated in extensive simulations for a robot with nonlinear, unicycle-like motion dynamics and with rate functions given by single or multiple antennas with a path-loss form. Moreover, a real indoor experiment is provided that illustrates PT with a single router and a Parrot AR.Drone 2.

The problem of optimizing sensing locations to reduce the uncertainty of an unknown map is well-known in machine learning, see e.g. Krause, Singh, and Guestrin (2008) for some fundamental results. This problem is often solved dynamically, to obtain so-called informative path planning, see e.g. Meera, Popovic, Millane, and Siegwart (2019) and Viseras et al. (2016); Popovic et al. (2018) provide a good overview of this field. Closer to the present work, Fink and Kumar (2010) and Penumarthi et al. (2017) learn a radio map with multiple mobile robots. Compared to these works, the methods used in this paper to learn the map are much simpler — basic regression methods, whereas Gaussian processes (Rasmussen & Williams, 2006) are often used in the references above. However, the novelty here is that the robot has control and communication objectives, whereas in most of the works above the objective of the robot is just to learn the map (keeping in mind that learning must typically be done from a small number of samples, which is related but not identical to the time-optimality objectives in the present paper).

On the other hand, a related thread of work in engineering, which does consider joint control and communication objectives, is motion planning under connectivity, communication rate, or quality-of-service constraints (Chatzipanagiotis et al., 2012, Fink et al., 2013, Ghaffarkhah and Mostofi, 2011, Rooker and Birk, 2007). Closer to the present work, when the model of the wireless communication rate is known, some recent works (Gangula et al., 2017, Licea et al., 2016, Lohéac et al., 2019, Ooi and Schindelhauer, 2009) have optimized the trajectory of the robot while ensuring that a buffer is transmitted along the way. The key novelty here with respect to these works is that the rate function is not known in advance by the robot. Instead, it must be learned from samples observed while the robot travels in the environment. In particular, the approach to solve PN incorporates the known-rate method of Lohéac et al. (2019) by applying it iteratively, at each step, for the current learned estimate of the rate function.

Some recent papers (Fink et al., 2012, Licea et al., 2017, Rizzo et al., 2019, Rizzo et al., 2013) explore control of robots with rate functions that are uncertain but still have a known model. For instance, Rizzo et al., 2019, Rizzo et al., 2013 carefully develop rate models for tunnels, and robots then adapt to the parameters of these models. In Yan and Mostofi (2013), the rate function is initially unknown, but the trajectory of the robot is fixed and only the velocity is optimized.

Compared to the preliminary conference paper (Buşoniu, Varma, Morărescu, & Lasaulce, 2019), fully novel contributions here are solving PN with unknown rates and the real-system results. Buşoniu et al. (2019) focused only on a simpler version of PT, with first-order robot dynamics, synthetic rate functions, and without obstacles. For the PT version studied here, the robot dynamics are extended to be of arbitrary order, the method is illustrated with realistic rate functions, and – importantly – basic obstacle avoidance functionality is introduced.

Next, Section 2 gives the problem definition, after which the paper treats PT (Section 3 for the algorithm, Section 4 for the results) and PN (Section 5 for the algorithm, Section 6 for the results) separately. The real-system illustration is provided in Section 7, and Section 8 concludes the paper.

Problem definition

Consider a mobile robot with position $p \in P \subseteq \mathbb{R}^2$, additional motion-related states $y \in Y \subseteq \mathbb{R}^{n_y}$, $n_y \ge 0$, and inputs $u \in U \subseteq \mathbb{R}^{n_u}$, $n_u \ge 1$. The extra states $y$ may contain e.g. velocities, headings, or other variables needed to model the robot's motion. Dimension $n_y$ may be zero, in which case the only state signals are the positions and the robot has first-order dynamics. In this case, variable $y$ can be omitted from the formalism below (and, by convention, $\mathbb{R}^0$ is a singleton). A discrete-time setting is considered
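The discrete-time setting can be illustrated by a minimal first-order sketch ($n_y = 0$). The exact buffer-drain law is an assumption here for illustration: it models the buffer emptying at the position-dependent rate $R(p_k)$ over one sampling period:

```python
def step(p, b, u, rate, Ts=1.0):
    """One discrete-time step for a first-order robot:
    motion p_{k+1} = p_k + Ts * u_k, and buffer drain
    b_{k+1} = max(b_k - Ts * R(p_k), 0)  (assumed drain law)."""
    p_next = (p[0] + Ts * u[0], p[1] + Ts * u[1])
    b_next = max(b - Ts * rate(p), 0.0)
    return p_next, b_next
```

The buffer state $b_k$ is what couples communication to motion: the robot's trajectory changes how fast $b_k$ reaches zero.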

Solution for the transmission problem

First, PT is reformulated as a deterministic optimal control problem. Define the union of all obstacles as $O$, and the following stage reward function:
$$\rho(x_k, u_k, x_{k+1}) = \begin{cases} -o & \text{if } x_{k+1} \in O \\ -1 & \text{if } x_{k+1} \notin O \text{ and } b_k > 0 \\ 0 & \text{if } x_{k+1} \notin O \text{ and } b_k = 0 \end{cases}$$
where $o$ is a positive obstacle collision penalty, which should be taken large so that obstacle avoidance is given priority over minimizing the time to transmit. Define also the long-term value function:
$$V^{\pi}(x_0) = \sum_{k=0}^{\infty} \rho(x_k, u_k, x_{k+1})$$
where $x_{k+1} = f(x_k, u_k, R(p_k))$ and $u_k = \pi(x_k)$ obeys the state
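The paper uses a local version of dynamic programming; the underlying value-iteration principle can be sketched on a toy finite abstraction, where the state is the number of buffer units remaining and a single hypothetical "transmit" action drains one unit per step. The reward signs ($-o$, $-1$, $0$) follow the stage reward above:

```python
def stage_reward(collision, b, o=100.0):
    """rho from the text: -o on collision, -1 while the buffer
    is nonempty, 0 once it is empty."""
    if collision:
        return -o
    return -1.0 if b > 0 else 0.0

def value_iteration(states, actions, f, rho, n_iter=100):
    """Deterministic, undiscounted value iteration:
    V(x) <- max_u [ rho(x, u, f(x, u)) + V(f(x, u)) ]."""
    V = {x: 0.0 for x in states}
    for _ in range(n_iter):
        V = {x: max(rho(x, u, f(x, u)) + V[f(x, u)] for u in actions)
             for x in states}
    return V

# Toy abstraction: state = buffer units left; "transmit" drains one unit.
states = [0, 1, 2, 3]
f = lambda x, u: max(x - 1, 0)
rho = lambda x, u, x_next: stage_reward(False, x)
V = value_iteration(states, ["transmit"], f, rho)
```

Because each nonempty-buffer step costs $-1$, the converged value $V(x)$ equals minus the number of steps needed to empty the buffer, which is exactly the minimum-time objective.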

Empirical study in the transmission problem

Consider a simulated robot with motion dynamics (1) given by the nonlinear, unicycle-like updates:
$$p_{k+1,1} = p_{k,1} + T_s u_{k,1} \cos(u_{k,2}), \qquad p_{k+1,2} = p_{k,2} + T_s u_{k,1} \sin(u_{k,2})$$
i.e., the first input is the velocity and the second the heading of the robot. A set of discretized actions is taken that consists of moving at velocity $u_1 = 1$ m/s along one of the headings $u_2 \in \{0, \pi/4, \ldots, 7\pi/4\}$ rad, together with a zero-velocity action. The sampling period is $T_s = 4$ s. Note that since these dynamics are first-order, there is no extra
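The update and the nine-element discretized action set above can be written directly as:

```python
import math

TS = 4.0      # sampling period [s], as in the text
SPEED = 1.0   # velocity of the non-zero actions [m/s]
HEADINGS = [i * math.pi / 4 for i in range(8)]  # 0, pi/4, ..., 7*pi/4

# Discretized action set: 8 headings at 1 m/s, plus a zero-velocity action.
ACTIONS = [(SPEED, h) for h in HEADINGS] + [(0.0, 0.0)]

def unicycle_step(p, u, Ts=TS):
    """Unicycle-like update from the text, with u = (velocity, heading)."""
    v, theta = u
    return (p[0] + Ts * v * math.cos(theta),
            p[1] + Ts * v * math.sin(theta))
```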

Solution for the navigation and transmission problem

As for PT, the paper first provides a model-based procedure, in Section 5.1. This procedure is not a contribution of this paper, but is adapted from Lohéac et al. (2019). Then, Section 5.2 gives the learning procedure for unknown SNR functions, which is a novel contribution; see again Table 1.

Empirical study in the navigation and transmission problem

In PN, only one antenna is allowed. It is placed at coordinates $(100, 30)$, with the same shape as in the PT experiments above, and with $R_0 = 0.753$. The same robot motion dynamics, position domain, and discretized actions are used as before. However, here the rate function is natively random: in all experiments, $z$ in (6) is random with a Rice distribution, per (5). To account for this randomness, the results always report the mean number of steps to reach the goal, together with the 95% confidence
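Equations (5)-(6) are not reproduced in this snippet; as a generic sketch, Rice-distributed fading amplitudes can be drawn with the standard $|X + jY|$ construction, where $X \sim \mathcal{N}(\nu, \sigma)$ and $Y \sim \mathcal{N}(0, \sigma)$ (the parameters $\nu$ and $\sigma$ here are hypothetical, not the paper's values):

```python
import math, random

def rice_sample(nu, sigma, rng=random):
    """Draw a Rice-distributed amplitude |X + jY|,
    with X ~ N(nu, sigma) and Y ~ N(0, sigma)."""
    x = rng.gauss(nu, sigma)
    y = rng.gauss(0.0, sigma)
    return math.hypot(x, y)
```

As $\sigma \to 0$ the distribution collapses to the deterministic line-of-sight amplitude $\nu$, which is why averaging over many runs (with confidence intervals, as the paper reports) is needed when $\sigma > 0$.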

Real-life illustration of the transmission problem

An illustration of the transmission problem, PT, is provided for a real quadcopter drone in an indoor environment. Specifically, a Parrot AR.Drone (Bristeau, Callou, Vissière, Petit, et al., 2011), version 2, will be used, along with a 4-camera OptiTrack Flex 13 motion capture system (Furtado, Liu, Lai, Lacheray, & Desouza-Coelho, 2019). The high-level motion dynamics (1) used in the learning algorithm have a simple-integrator form (without any extra signal $y$): $p_{k+1} = p_k + T_s u_k$, with a sampling period

Conclusions

Two learning-based algorithms were proposed for a mobile robot to transmit data over a wireless network with an unknown rate map: one when the trajectory is free, in which case rectangular obstacles can be handled; and another when the robot must end up at a goal position. Extensive simulations showed that these algorithms achieve good performance, in some cases very close to model-based solutions that require knowledge of the rate function. An illustration with a real UAV was given.

A relatively

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by a grant of the Romanian Ministry of Research and Innovation, CNCS - UEFISCDI, project number PN-III-P1-1.1-TE-2016-0670, within PNCDI III, and by the Post-Doctoral Programme “Entrepreneurial competences and excellence research in doctoral and postdoctoral programs - ANTREDOC”, project co-funded from the European Social Fund , contract no. 56437/24.07.2019.

References (36)

  • Fink, J., et al. Robust control of mobility and communications in autonomous robot teams. IEEE Access (2013).
  • Furtado, J. S., et al. Comparative analysis of OptiTrack motion capture systems.
  • Gangula, R., et al. Trajectory optimization for mobile access point.
  • Ghaffarkhah, A., et al. Communication-aware motion planning in mobile networks. IEEE Transactions on Automatic Control (2011).
  • Krause, A., et al. Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. Journal of Machine Learning Research (JMLR) (2008).
  • Licea, D. B., Varma, V. S., Lasaulce, S., Daafouz, J., & Ghogho, M. (2016). Trajectory planning for energy-efficient...
  • Licea, D. B., et al. Robust trajectory planning for robotic communications under fading channels.
  • Lin, L.-J. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning (1992).