
Automatica

Volume 129, July 2021, 109658

Brief paper
Distributed inverse optimal control

https://doi.org/10.1016/j.automatica.2021.109658

Abstract

This paper develops a distributed approach to inverse optimal control (IOC) in multi-agent systems, where each agent can communicate only with certain nearby neighbors and accesses only segments of the system's trajectory, which are not sufficient for any single agent to solve the IOC problem alone. By introducing the concept of data effectiveness and establishing the connection between each segment and its contribution to solving IOC, we formulate the IOC problem as one of obtaining least-squares solutions via a distributed algorithm. Simulations validate the proposed distributed IOC approach.

Introduction

Inverse optimal control (IOC) seeks to learn the underlying objective function of an optimal control system from its optimal trajectories (Ng, Russell, et al., 2000). Its applications include imitation learning (Finn, Levine, & Abbeel, 2016), where robots learn objective functions from expert demonstrations; autonomous driving (Kuderer, Gulati, & Burgard, 2015), where human driving styles are transferred to vehicle controllers; and human–robot collaboration (Mainprice, Hayne, & Berenson, 2016), where the human control objective is learned for coordination.

In the IOC literature (Abbeel and Ng, 2004, Englert et al., 2017, Jin, Wang et al., 2020, Keshavarz et al., 2011, Molloy et al., 2018, Puydupin-Jamin et al., 2012, Ratliff et al., 2006, Ziebart et al., 2008), the unknown objective function is typically parameterized as a weighted sum of selected features (or basis functions), which reduces the IOC problem to estimating the weights of those features. One line of work solves IOC with a double-layer scheme: the weight estimate is updated in the outer layer, and the corresponding optimal control problem is solved in the inner layer. Methods developed under this idea include feature matching (Abbeel & Ng, 2004), where the weights are updated to reduce the difference in feature values between the demonstrations and the predicted trajectory; maximum margin (Ratliff et al., 2006), where the weights are solved by maximizing the margin between the objective values of the demonstrations and the predicted trajectory; maximum entropy (Ziebart et al., 2008), where the weights are updated to maximize the entropy of the trajectory probability distribution while matching the feature values of the demonstrations; and prediction-loss minimization (Jin, Wang et al., 2020), where the weights are solved by minimizing the distance between the predicted and observed trajectories. More recent work solves IOC by minimizing the degree to which the demonstrations violate optimality conditions, such as the Karush–Kuhn–Tucker (KKT) conditions or Pontryagin's maximum principle, which has been shown to be more computationally efficient (Englert et al., 2017, Keshavarz et al., 2011, Molloy et al., 2018, Puydupin-Jamin et al., 2012).
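As a concrete illustration of the optimality-condition approach, the sketch below recovers the weights of a finite-horizon scalar LQR objective, up to scale, from the kernel of a matrix stacking Pontryagin's conditions along an optimal trajectory. The system, horizon, and weight values are our own toy assumptions, not an example from this paper.

```python
import numpy as np

# Scalar LQR: dynamics x_{t+1} = a x_t + b u_t, objective
# sum_t (q x_t^2 + r u_t^2) + q x_N^2 with unknown weights (q, r).
a, b = 0.9, 0.5
q_true, r_true = 1.0, 2.0
N, x0 = 10, 1.0

# Forward pass: generate the optimal trajectory via the Riccati recursion.
P = np.zeros(N + 1)
P[N] = q_true
for t in range(N - 1, -1, -1):
    P[t] = q_true + a**2 * P[t+1] - (a*b*P[t+1])**2 / (r_true + b**2 * P[t+1])
x, u = np.zeros(N + 1), np.zeros(N)
x[0] = x0
for t in range(N):
    K = a*b*P[t+1] / (r_true + b**2 * P[t+1])
    u[t] = -K * x[t]
    x[t+1] = a*x[t] + b*u[t]

# Inverse pass: stack costate and stationarity conditions as M z = 0 with
# z = [q, r, lam_1, ..., lam_N]; the weights span the kernel (up to scale),
# read off from the right-singular vector of the smallest singular value.
M = np.zeros((2*N, N + 2))
for t in range(1, N):                      # costate: lam_t = 2 q x_t + a lam_{t+1}
    M[t-1, 0] = 2*x[t]
    M[t-1, 2+t] = a
    M[t-1, 1+t] = -1.0
M[N-1, 0] = 2*x[N]                         # terminal: lam_N = 2 q x_N
M[N-1, 1+N] = -1.0
for t in range(N):                         # stationarity: 2 r u_t + b lam_{t+1} = 0
    M[N+t, 1] = 2*u[t]
    M[N+t, 2+t] = b
z = np.linalg.svd(M)[2][-1]
q_hat, r_hat = z[0], z[1]
print(r_hat / q_hat)                       # weight ratio recovered: ~ r_true / q_true
```

Because the demonstration is exactly optimal, the stacked conditions hold with zero residual and the kernel of `M` is one-dimensional, so the weights are identifiable up to a common scale factor.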

Despite this significant progress, existing IOC techniques are mostly designed in a centralized way. This limits their applicability in practical situations where the complete observed trajectory exceeds the memory or computation capability of any central processor. A natural way to address this is to employ a multi-agent system for IOC, in which each agent observes only trajectory segments. By a trajectory segment, we mean a portion of the trajectory within a certain interval of the overall time horizon, possibly even a single point. Such situations, where only segments rather than the complete trajectory are available, may also arise frequently from missing data, limited sensor capability, or occlusion. The authors of Bogert, Lin, Doshi, and Kulic (2016) initiated efforts to address this situation by modeling the missing data with hidden variables when the missing portion is small (Bogert & Doshi, 2017). Very recently, the authors of Jin et al., 2019, Jin et al., 2021 introduced the concept of a recovery matrix for solving IOC when only trajectory segments are available, which are required to be consecutive and to satisfy a matrix rank condition.

This has motivated us to develop a distributed IOC method for multi-agent systems in which each agent can communicate only with its nearby neighbors. We further suppose that each agent can access only a few trajectory segments, which may not suffice for it to infer the objective function on its own. Our contribution is twofold: first, we propose a way to evaluate whether a trajectory segment can contribute to IOC; each such segment imposes a linear constraint on the unknown objective-function weights, and we establish IOC identifiability from trajectory segments. Second, we develop a distributed algorithm that enables all agents to collaboratively solve for the weights exponentially fast by communicating with their neighbors. To our knowledge, this is the first distributed algorithm for solving IOC.

Notation: The column operator $\mathrm{col}\{x_1,\dots,x_k\}$ stacks its arguments into a column. $x_{k_1:k_2}$ denotes a stack of vectors $x$ indexed from $k_1$ to $k_2$ ($k_1 \le k_2$), i.e., $x_{k_1:k_2} = \mathrm{col}\{x_{k_1},\dots,x_{k_2}\}$. A bold-type symbol $\mathbf{A}$ denotes a block matrix. Given a differentiable vector function $f(x)$, $\frac{\partial f}{\partial x}$ denotes its Jacobian matrix with respect to $x$, evaluated at a given value of $x$. The zero matrix/vector is $\mathbf{0}$ and the identity matrix is $I$, both with appropriate dimensions. $\mathbf{1}_N$ is the $N$-dimensional all-one vector. $\ker A$ is the kernel of matrix $A$.

Section snippets

Problem statement

Consider an optimal control system with dynamics $x_{t+1} = f(x_t, u_{t+1})$, $x_0 \in \mathbb{R}^n$, where $f:\mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n$ is differentiable; $x_t \in \mathbb{R}^n$ is the system state; $u_t \in \mathbb{R}^m$ is the input; and $t = 0, 1, \dots$ is the time step. A state-input trajectory of the optimal control system over the time horizon $T$ is defined by $\xi = \{\xi_t,\ t = 1, 2, \dots, T\}$ with $\xi_t = \{x_t, u_t\}$, which results from minimizing an unknown objective function parameterized as a weighted sum of features: $J(x_{1:T}, u_{1:T}) = \sum_{t=1}^{T} \omega^\top \phi(x_t, u_t)$. Here, $\phi = [\phi_1, \phi_2, \dots, \phi_r]^\top : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^r$ is a given feature
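The feature-weighted objective above can be sketched numerically as follows; the particular features (state and input energy) and weight values here are our own illustrative choices, not the paper's.

```python
import numpy as np

# Evaluate J = sum_t omega^T phi(x_t, u_t) along a short trajectory.
def phi(x, u):
    # Example feature vector: squared state norm and squared input norm.
    return np.array([x @ x, u @ u])

omega = np.array([1.0, 0.1])                    # hypothetical weights
xs = [np.array([1.0, 0.0]), np.array([0.5, 0.2])]
us = [np.array([0.3]), np.array([-0.1])]
J = sum(omega @ phi(x, u) for x, u in zip(xs, us))
print(J)                                        # ~ 1.3 for these numbers
```

IOC inverts this evaluation: given trajectories generated by minimizing such a $J$, recover $\omega$ (up to scale) without knowing it.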

IOC identifiability from trajectory segments

In this section we introduce an index to evaluate whether or not a trajectory segment contributes to solving IOC, and then establish IOC identifiability from trajectory segments.

A distributed algorithm for IOC

We now consider a graph $\mathcal{G}$ of $N$ agents, where each agent $i$ has access only to a set of trajectory segments $S_i$ as in (4). By Theorem 1, one has $\mathbf{R}(S_i)\,\omega = \mathbf{0}$, $i = 1, 2, \dots, N$. In particular, $\mathbf{R}(S_i) = \mathbf{0}$ if there is no IOC-effective trajectory segment in $S_i$. Thus each agent $i$ with trajectory segments $S_i$ knows only $\mathbf{R}(S_i)$. All that remains is to adapt the distributed algorithms of Mou, Liu, and Morse (2015) and Zhou, Wang, Mou, and Anderson (2020) for the $N$ agents to cooperatively solve the group of linear equations in (27).
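The following sketch illustrates the flavor of the distributed linear-equation solver of Mou, Liu, and Morse (2015) on a toy instance of $\mathbf{R}(S_i)\,\omega = \mathbf{0}$; the ground-truth weights, agent data, graph, and the appended normalization $\mathbf{1}^\top \omega = 1$ (used here to exclude the trivial solution $\omega = \mathbf{0}$) are all our own assumptions, not the paper's construction.

```python
import numpy as np

# Three agents on a line graph 1 - 2 - 3; each agent i holds rows R_i with
# R_i w* = 0, plus the shared normalization 1^T w = 1.
w_star = np.array([0.5, 0.3, 0.2])            # hypothetical true weights
R = [np.array([[0.3, -0.5, 0.0]]),
     np.array([[0.0, 0.2, -0.3]]),
     np.array([[0.2, 0.0, -0.5]])]
A = [np.vstack([Ri, np.ones((1, 3))]) for Ri in R]   # A_i w = b_i
b = [np.array([0.0, 1.0])] * 3
nbrs = {0: [1], 1: [0, 2], 2: [1]}

# P_i: orthogonal projection onto ker A_i; x_i(0): any solution of A_i x = b_i.
P = [np.eye(3) - np.linalg.pinv(Ai) @ Ai for Ai in A]
x = [np.linalg.pinv(Ai) @ bi for Ai, bi in zip(A, b)]

# Consensus-projection update (Mou-Liu-Morse style):
# x_i(k+1) = x_i(k) - (1/d_i) P_i (d_i x_i(k) - sum_{j in N_i} x_j(k)).
# Each step keeps A_i x_i = b_i while driving the agents to consensus.
for _ in range(1000):
    x = [xi - P[i] @ (len(nbrs[i]) * xi - sum(x[j] for j in nbrs[i])) / len(nbrs[i])
         for i, xi in enumerate(x)]
print(x[0])   # every agent's estimate approaches w_star
```

With a connected graph and a unique stacked solution, the estimates converge exponentially fast, matching the convergence claim of the paper's distributed algorithm.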

Simulations

We evaluate the proposed method on a simulated two-link robot arm moving in the vertical plane. The dynamics of the arm are $M(\theta)\ddot{\theta} + C(\theta, \dot{\theta})\dot{\theta} + g(\theta) = \tau$, where $\theta = [\theta_1, \theta_2]^\top$ and $\tau = [\tau_1, \tau_2]^\top$ are the joint angles and input torques, respectively (see Spong and Vidyasagar (2008, p. 209); all parameters are set to unity). Define the system state $x = [\theta_1, \theta_2, \dot{\theta}_1, \dot{\theta}_2]^\top$ and control input $u = [\tau_1, \tau_2]^\top$, and discretize the dynamics by the Euler method with a time interval of 0.05 s. The arm is controlled to minimize (3) with $\phi = [\tau_1^2 + \tau_2^2, (\theta_1$
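The simulation setup can be sketched as below. The point-mass dynamics terms and the gravity constant $g = 9.81$ are our own reconstruction in the style of Spong and Vidyasagar under the stated unit parameters, not the paper's exact implementation.

```python
import numpy as np

g0, dt = 9.81, 0.05   # gravity constant (assumed) and Euler step from the paper

def arm_step(x, u):
    """One Euler step of M(th)*ddth + C(th, dth)*dth + G(th) = tau."""
    th1, th2, dth1, dth2 = x
    c2, s2 = np.cos(th2), np.sin(th2)
    M = np.array([[3.0 + 2.0 * c2, 1.0 + c2],
                  [1.0 + c2,       1.0]])               # inertia (unit params)
    C = np.array([[-s2 * dth2, -s2 * (dth1 + dth2)],
                  [ s2 * dth1,  0.0]])                  # Coriolis/centrifugal
    G = g0 * np.array([2.0 * np.cos(th1) + np.cos(th1 + th2),
                       np.cos(th1 + th2)])              # gravity torques
    dth = np.array([dth1, dth2])
    ddth = np.linalg.solve(M, u - C @ dth - G)
    return x + dt * np.concatenate([dth, ddth])

# Roll out a short passive trajectory from the horizontal configuration.
x = np.zeros(4)
traj = [x]
for _ in range(20):
    x = arm_step(x, np.zeros(2))
    traj.append(x)
```

State-input pairs collected from such rollouts under the optimal controller would then be split into segments and distributed across the agents.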

Conclusions

This paper has developed a distributed algorithm to solve inverse optimal control in multi-agent systems, in which each agent has access only to trajectory segments. Although no agent may be able to solve the IOC problem alone, each agent can coordinate with its nearby neighbors to cooperatively solve the IOC problem exponentially fast. Our future research includes applications of the proposed distributed inverse optimal control to robot learning from sparse human demonstrations (Jin,

Wanxin Jin is currently a fourth-year Ph.D. student at the School of Aeronautics and Astronautics, Purdue University. Prior to Purdue, he worked as a research assistant at the Technical University of Munich, Germany, from 2016 to 2017. Wanxin Jin received his B.S. degree and M.Sc. in Control Science and Engineering from Harbin Institute of Technology, China, in 2014 and 2016, respectively. His research interest spans control theory, machine learning, and optimization with applications to robotics and human–robot autonomy.

References (27)

  • Molloy, Timothy L., et al. (2018). Finite-horizon inverse optimal control for discrete-time nonlinear systems. Automatica.
  • Abbeel, Pieter, & Ng, Andrew Y. (2004). Apprenticeship learning via inverse reinforcement learning. In International...
  • Bertsekas, Dimitri P. (1997). Nonlinear programming. Journal of the Operational Research Society.
  • Bogert, Kenneth, & Doshi, Prashant (2017). Scaling expectation-maximization for inverse reinforcement learning to...
  • Bogert, Kenneth, Lin, Jonathan Feng-Shun, Doshi, Prashant, & Kulic, Dana (2016). Expectation-maximization for inverse...
  • Englert, Peter, et al. (2017). Inverse KKT: Learning cost functions of manipulation tasks from demonstrations. International Journal of Robotics Research.
  • Finn, Chelsea, Levine, Sergey, & Abbeel, Pieter (2016). Guided cost learning: Deep inverse optimal control via policy...
  • Jin, Wanxin, et al. (2019). Inverse optimal control for multiphase cost functions. IEEE Transactions on Robotics.
  • Jin, Wanxin, et al. (2021). Inverse optimal control with incomplete observations. International Journal of Robotics Research.
  • Jin, Wanxin, et al. (2020). Learning from sparse demonstrations.
  • Jin, Wanxin, et al. (2020). Learning from incremental directional corrections.
  • Jin, Wanxin, et al. (2020). Pontryagin differentiable programming: An end-to-end learning and control framework. Advances in Neural Information Processing Systems (NeurIPS).
  • Keshavarz, Arezou, Wang, Yang, & Boyd, Stephen (2011). Imputing a convex objective function. In IEEE international...

Shaoshuai Mou is an Assistant Professor in the School of Aeronautics and Astronautics at Purdue University, where he directs the Autonomous and Intelligent Multi-agent Systems (AIMS) Lab and co-directs Purdue's new Center for Innovation in Control, Optimization and Networks (ICON). Before joining Purdue, he received a Ph.D. in Electrical Engineering at Yale University in 2014 and worked as a postdoctoral researcher at MIT until 2015. His research interests include multi-agent autonomy and learning, distributed algorithms for control and optimization, human–machine teaming, resilience and cybersecurity, and experimental research involving autonomous air and ground vehicles.

The research is supported by funding from Northrop Grumman Mission Systems' University Research Program on Research in Applications for Learning Machines Consortium (REALM), USA. The material in this paper was not presented at any conference. This paper was recommended for publication in revised form by Associate Editor Dimos V. Dimarogonas under the direction of Editor Christos G. Cassandras.
