TechRxiv

Efficient Distributional Reinforcement Learning with Kullback-Leibler Divergence Regularization

preprint
posted on 2022-05-02, 20:38 authored by Renxing Li, Zhiwei Shang, Chunhua Zheng, Huiyun Li, Qing Liang, Yunduan Cui
In this article, we address the issues of stability and data efficiency in reinforcement learning (RL). We propose a novel RL approach, Kullback-Leibler divergence-regularized distributional RL (KLC51), which integrates the stability of distributional RL and the data efficiency of Kullback-Leibler (KL) divergence-regularized RL in one framework. KLC51 derives the Bellman equation and the TD errors regularized by KL divergence from a distributional perspective and explores approximation strategies for properly mapping the corresponding Boltzmann softmax term into distributions. Evaluated on several benchmark tasks of varying complexity, the proposed method illustrates the positive effects of KL divergence regularization on distributional RL, including exclusive exploration behaviors and smooth value function updates, and demonstrates significantly better learning stability and data efficiency than related baseline approaches.

Email Address of Submitting Author

cuiyunduan@gmail.com

ORCID of Submitting Author

0000-0001-5539-4260

Submitting Author's Institution

Shenzhen Institutes of Advanced Technology Chinese Academy of Sciences

Submitting Author's Country

  • China
