
Offline Reinforcement Learning without Regularization and Pessimism
  • Longyang Huang,
  • Botao Dong,
  • Ning Pang,
  • Ruonan Liu,
  • Weidong Zhang
Longyang Huang
Department of Automation, Shanghai Jiao Tong University

Corresponding Author: [email protected]

Botao Dong
Department of Automation, Shanghai Jiao Tong University
Ning Pang
Department of Automation, Shanghai Jiao Tong University
Ruonan Liu
Department of Automation, Shanghai Jiao Tong University
Weidong Zhang
School of Information and Communication Engineering, Hainan University; Department of Automation, Shanghai Jiao Tong University

Abstract

Offline reinforcement learning (RL) learns policies for sequential decision problems directly from offline datasets. Most existing works focus on countering out-of-distribution (OOD) behavior to improve decision-making. This work instead investigates the fundamental Bellman inconsistency problem, an essential factor causing suboptimal decision-making in offline settings. We propose an offline RL algorithm without regularization and pessimism (RFORL). RFORL constrains the uncertainty constructed from the inconsistency among Bellman estimates under an ensemble of learned Q-functions. Compared to existing offline RL methods, RFORL achieves exponential convergence and allows learning the best policy from the given offline dataset, without adopting regularization techniques or pessimism. We empirically demonstrate that the Bellman inconsistency constraint mechanism is effective in improving policy learning. Consequently, RFORL outperforms existing offline RL approaches on most tasks of the D4RL benchmark.
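
The abstract describes the core mechanism only at a high level. The sketch below illustrates one plausible reading of "inconsistency among Bellman estimates under an ensemble of learned Q-functions": the spread of the Bellman backup across ensemble members for each transition. The function name bellman_inconsistency, the array shapes, and the use of the standard deviation as the disagreement measure are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def bellman_inconsistency(q_ensemble, rewards, next_q_ensemble, dones, gamma=0.99):
    """Per-transition spread of Bellman targets across an ensemble of Q-estimates.

    q_ensemble:      (K, B) current Q(s, a) from K ensemble members for B transitions
    next_q_ensemble: (K, B) bootstrap values max_a' Q(s', a') from the same K members
    rewards, dones:  (B,) transition rewards and terminal flags
    """
    # Bellman target computed separately under each ensemble member.
    targets = rewards[None, :] + gamma * (1.0 - dones[None, :]) * next_q_ensemble  # (K, B)
    residuals = targets - q_ensemble  # Bellman residual per member, (K, B)
    # Inconsistency: how strongly the K members disagree about the Bellman backup.
    return residuals.std(axis=0)  # (B,)

# Toy usage with random numbers in place of learned Q-networks.
rng = np.random.default_rng(0)
K, B = 5, 8
q = rng.normal(size=(K, B))
next_q = rng.normal(size=(K, B))
r = rng.normal(size=B)
d = rng.integers(0, 2, size=B).astype(float)
print(bellman_inconsistency(q, r, next_q, d))
```

In a training loop, a term of this kind could be kept below a threshold or added as a constraint so that value estimates the ensemble disagrees on are not exploited; whether RFORL uses exactly this quantity or a related one is specified in the full paper, not in the abstract.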
30 May 2024: Submitted to TechRxiv
07 Jun 2024: Published in TechRxiv