Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems, IEEE Robotics and Automation Letters



IEEE Robotics and Automation Letters (IF 4.6), Pub Date: 2021-04-01, DOI: 10.1109/lra.2020.3044033
Yashwanth Kumar Nakka, Anqi Liu, Guanya Shi, Anima Anandkumar, Yisong Yue, Soon-Jo Chung

Learning-based control algorithms require data collection with abundant supervision for training. Safe exploration algorithms ensure the safety of this data collection process even when only partial knowledge is available. We present a new approach for optimal motion planning with safe exploration that integrates chance-constrained stochastic optimal control with dynamics learning and feedback control. We derive an iterative convex optimization algorithm that solves an Information-cost Stochastic Nonlinear Optimal Control problem (Info-SNOC). The optimization objective encodes control cost for performance and exploration cost for learning, and safety is incorporated as distributionally robust chance constraints. The dynamics are predicted from a robust regression model that is learned from data. The Info-SNOC algorithm is used to compute a sub-optimal pool of safe motion plans that aid in exploration for learning unknown residual dynamics under safety constraints. A stable feedback controller is used to execute the motion plan and collect data for model learning. We prove the safety of rollout from our exploration method and reduction in uncertainty over epochs, thereby guaranteeing the consistency of our learning method. We validate the effectiveness of Info-SNOC by designing and implementing a pool of safe trajectories for a planar robot. We demonstrate that our approach has a higher success rate in ensuring safety than a deterministic trajectory optimization approach.
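The abstract does not spell out the form of the distributionally robust chance constraints, so the following is only an illustrative sketch, not the authors' implementation. It assumes a single linear safety constraint a^T x + b <= 0 on a state x whose mean and covariance are known, and applies the standard moment-based tightening: the constraint holds with probability at least 1 - eps for every distribution with that mean and covariance if a^T mean + b + sqrt((1 - eps) / eps) * sqrt(a^T cov a) <= 0. The function names and the one-dimensional wall example are hypothetical.

```python
import math


def dr_chance_margin(a, b, mean, cov, eps):
    """Left-hand side of the tightened deterministic constraint.

    Assumption (not taken from the abstract): the safety constraint is
    linear, a^T x + b <= 0, and only the mean and covariance of x are known.
    A non-positive return value means the constraint holds with probability
    at least 1 - eps for every distribution with these two moments.
    """
    n = len(a)
    # Nominal (deterministic) constraint value evaluated at the mean state.
    nominal = sum(a[i] * mean[i] for i in range(n)) + b
    # Variance of the scalar a^T x, i.e. a^T cov a.
    variance = sum(a[i] * cov[i][j] * a[j] for i in range(n) for j in range(n))
    # Distribution-free back-off term; it grows as eps shrinks.
    tightening = math.sqrt((1.0 - eps) / eps) * math.sqrt(max(variance, 0.0))
    return nominal + tightening


def is_safe(a, b, mean, cov, eps):
    """True if the tightened constraint is satisfied."""
    return dr_chance_margin(a, b, mean, cov, eps) <= 0.0


# Hypothetical 1-D example: position x must stay below a wall at x = 2,
# written as x - 2 <= 0, with at most a 5% violation probability.
a, b, eps = [1.0], -2.0, 0.05
cov = [[0.04]]  # standard deviation 0.2

far_from_wall = is_safe(a, b, [1.0], cov, eps)
near_wall = is_safe(a, b, [1.5], cov, eps)
# A deterministic check that ignores uncertainty looks only at the mean.
near_wall_deterministic = (1.5 - 2.0) <= 0.0
```

In this toy case the mean position 1.5 passes the deterministic check but fails the tightened one, which illustrates in general terms why a planner that ignores uncertainty can accept plans that a chance-constrained planner rejects.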


Updated: 2021-04-01
