loading page

FARANE-Q: Fast Parallel and Pipeline Q-Learning Accelerator for Configurable Reinforcement Learning SoC
  • +2
  • Nana Sutisna ,
  • Andi M. R. Ilmy ,
  • Infall Syafalni ,
  • Rahmat Mulyawan ,
  • Trio Adiono
Nana Sutisna
Author Profile
Andi M. R. Ilmy
Author Profile
Infall Syafalni
Author Profile
Rahmat Mulyawan
Author Profile
Trio Adiono
Author Profile

Abstract

This paper proposes a FAst paRAllel and pipeliNE Q-learning accelerator (FARANE-Q) for a configurable Reinforcement Learning (RL) algorithm that is implemented in a System on Chip (SoC). In order to overcome the challenges of a dynamic environment and increasing complexity, the proposed work offers flexibility, configurability, and scalability while maintaining computation speed and accuracy. The proposed method includes a HW/SW design methodology for the SoC architecture to achieve flexibility. Moreover, we also propose joint optimizations on algorithm, architecture and implementation in order to obtain optimum (high efficiency) performance, specifically in energy and area efficiency. Furthermore, we implemented the proposed design in a real-time Zynq Ultra96-V2 FPGA platform to evaluate the functionality with real use case of the smart navigation. Experimental results confirm that the proposed accelerator FARANE-Q outperforms state of the art works by achieving throughput up to 148.55 MSps. It corresponds to the energy efficiency of 1632.42 MSps/W per agent for 32-bit and 2465.42 MSps/W per agent for 16-bit FARANE-Q. Moreover, the proposed 16-bit FARANE-Q outperforms others in energy efficiency up to more than 2000×. The designed system also maintains the error accuracy less than 0.4% with optimized bit precision for more than 8 fraction bits. The proposed FARANE-Q also offers a speed up of processing time up to 1795× compared to embedded SW computation executed on ARM Zynq processor and 280× of computation of full software executed on i7 processor. Hence, the proposed work has the potential to be used for smart navigation, robotic control, and predictive maintenance.
2023Published in IEEE Access volume 11 on pages 144-161. 10.1109/ACCESS.2022.3232853