FARANE-Q: Fast Parallel and Pipeline Q-Learning Accelerator for
Configurable Reinforcement Learning SoC
Abstract
This paper proposes a FAst paRAllel and pipeliNE Q-learning accelerator
(FARANE-Q) for a configurable Reinforcement Learning (RL) algorithm
implemented in a System on Chip (SoC). The proposed work offers
flexibility, configurability, and scalability while maintaining
computation speed and accuracy to overcome the challenges of a dynamic
environment and increasing complexity. The proposed method includes a
Hardware/Software (HW/SW) design methodology for the SoC architecture to
achieve flexibility. We also propose joint optimizations of the
algorithm, architecture, and implementation to obtain high efficiency,
specifically in energy and area.
Furthermore, we implemented the proposed design on a real-time Zynq
Ultra96-V2 FPGA platform to evaluate its functionality with an actual
smart navigation use case. Experimental results confirm that the
proposed accelerator FARANE-Q outperforms state-of-the-art works by
achieving a throughput of up to 148.55 MSps. This corresponds to an
energy efficiency of 1747.64 MSps/W per agent for 32-bit and 2424.33
MSps/W per agent for 16-bit FARANE-Q. Moreover, the proposed 16-bit
FARANE-Q improves energy efficiency over other related works by at least
1.23×. The designed system also keeps the error below 0.4% with
optimized bit precision when more than eight fraction bits are used.
The proposed FARANE-Q also speeds up processing by up to 1795× compared
with embedded SW computation executed on the Zynq ARM processor and by
280× compared with full software computation executed on an i7
processor. Hence, the proposed work has the potential to be used
for smart navigation, robotic control, and predictive maintenance.
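
To make the link between fraction bits and accuracy concrete, the following is a minimal sketch of one tabular Q-learning update computed in fixed-point integer arithmetic and compared against a floating-point reference. It is an illustration only: the function names, the round-to-nearest quantization, the truncating right shift after each multiplication, and the example values of the learning rate and discount factor are all assumptions, not details of the FARANE-Q datapath.

```python
def to_fixed(x: float, frac_bits: int) -> int:
    """Quantize a real value to a signed fixed-point integer (round to nearest)."""
    return round(x * (1 << frac_bits))


def to_float(q: int, frac_bits: int) -> float:
    """Convert a fixed-point integer back to a real value."""
    return q / (1 << frac_bits)


def q_update_float(q_sa, q_next_max, reward, alpha, gamma):
    """Reference update: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max Q(s',a') - Q(s,a))."""
    return q_sa + alpha * (reward + gamma * q_next_max - q_sa)


def q_update_fixed(q_sa, q_next_max, reward, alpha, gamma, frac_bits):
    """Same update on fixed-point integers (hypothetical datapath, assumed truncation).

    All arguments are integers already scaled by 2**frac_bits. A product of two
    such values carries 2*frac_bits fraction bits, so it is shifted right by
    frac_bits to return to the common format.
    """
    target = reward + ((gamma * q_next_max) >> frac_bits)
    delta = target - q_sa
    return q_sa + ((alpha * delta) >> frac_bits)


def update_error(frac_bits, q_sa=0.3, q_next_max=0.7, reward=1.0,
                 alpha=0.1, gamma=0.9):
    """Absolute error of the fixed-point update against the float reference."""
    ref = q_update_float(q_sa, q_next_max, reward, alpha, gamma)
    fx = q_update_fixed(
        to_fixed(q_sa, frac_bits),
        to_fixed(q_next_max, frac_bits),
        to_fixed(reward, frac_bits),
        to_fixed(alpha, frac_bits),
        to_fixed(gamma, frac_bits),
        frac_bits,
    )
    return abs(to_float(fx, frac_bits) - ref)


if __name__ == "__main__":
    # Print the single-update error for a few assumed fraction-bit widths.
    for bits in (4, 8, 12):
        print(bits, update_error(bits))
```

In this sketch the error of a single update is bounded by a few least-significant bits, so each added fraction bit roughly halves the worst-case error. The error that accumulates over many updates in a real design depends on the rounding scheme and word format, which this example does not model.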