TechRxiv

Model-Based Safe Reinforcement Learning with Time-Varying State and Control Constraints: An Application to Intelligent Vehicles

preprint
posted on 2023-08-15, 14:27 authored by Xinglong Zhang, Yaoqian Peng, Biao Luo, Wei Pan, Xin Xu, Haibin Xie

In recent years, safe reinforcement learning (RL) with the actor-critic structure has gained significant interest for continuous control tasks. However, achieving near-optimal control policies with safety and convergence guarantees remains challenging, and few works have addressed RL algorithms that handle time-varying safety constraints. This paper proposes a safe RL algorithm for optimal control of nonlinear systems with time-varying state and control constraints. The algorithm's novelty lies in two key aspects. First, the approach introduces a barrier force-based control policy structure to ensure control safety during learning. Second, a multi-step policy evaluation mechanism is employed, enabling the prediction of policy safety risks under time-varying constraints and guiding safe policy updates. Theoretical results on learning convergence, stability, and robustness are proven. The proposed algorithm outperforms several state-of-the-art RL algorithms in the simulated Safety Gym environment. It is also applied to the real-world problem of integrated path following and collision avoidance for two intelligent vehicles – a differential-drive vehicle and an Ackermann-steered one. The experimental results demonstrate the strong sim-to-real transfer capability of our approach and its satisfactory online control performance.
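To illustrate the barrier force idea mentioned in the abstract, the sketch below superimposes a repulsive force, derived from the gradient of a log-barrier on a time-varying box constraint, onto a nominal learned action. This is a minimal illustration under our own assumptions (the function names, the log-barrier choice, and the scalar-state setting are ours), not the paper's actual policy structure:

```python
import numpy as np

def barrier_force(x, lb, ub, gain=0.1, eps=1e-6):
    """Repulsive force from the log-barrier B(x) = -log(x - lb) - log(ub - x).

    Returns -gain * dB/dx, which pushes the state away from both the
    (possibly time-varying) lower bound lb and upper bound ub.
    """
    return gain * (1.0 / (x - lb + eps) - 1.0 / (ub - x + eps))

def safe_action(nominal_u, x, lb_t, ub_t, u_max):
    """Add the barrier force to the nominal (learned) action, then
    saturate the result to the symmetric control limit u_max."""
    u = nominal_u + barrier_force(x, lb_t, ub_t)
    return float(np.clip(u, -u_max, u_max))
```

Far from the constraint boundaries the force is near zero and the learned action passes through almost unchanged; near a boundary the force grows rapidly and dominates, steering the system back toward the interior of the safe set.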

History

Email Address of Submitting Author

zhangxinglong18@nudt.edu.cn

ORCID of Submitting Author

https://orcid.org/0000-0002-0587-2487

Submitting Author's Institution

National University of Defense Technology

Submitting Author's Country

China