S2RNN: Self-Supervised Reconfigurable Neural Network Hardware Accelerator for Machine Learning Applications
  • Kasem Khalil
  • Bappaditya Dey
  • Magdy Bayoumi

Abstract

Neural networks are widely used across many domains. Implementing them in hardware so that they fit the requirements of different applications is challenging, and it often means building a new neural network FPGA configuration from scratch for each application. This paper proposes a flexible method that can be reconfigured in a self-supervised way to fit the requirements of several applications, given only the maximum number of available computational nodes a priori. The proposed method reconfigures the number of hidden layers, and the number of hidden nodes in each layer, according to the given application. Reconfigurability also makes it possible to decide and update the number of active nodes at layer (l+1) that receive the computational results of the nodes in layer l. A configuration block sends a signal to the control block that follows each node output. The control block decides which of the following nodes receive that output, according to the corresponding signal value from the configuration block. Each node also receives an enable or disable signal from the configuration block. Therefore, the number of inputs of each node and the number of nodes in each layer are determined according to the desired performance. The goal is to automatically propose, through reconfigurability, the optimal NN configuration for each application so as to achieve the maximum possible accuracy. Optimality is demonstrated in terms of minimum average power, average delay, and area overhead, as well as maximum throughput and accuracy. The main advantage of the proposed approach is reusability: the optimized architecture for the first application/dataset, which is self-learned online, serves as the initial configuration for any new dataset/application rather than starting from scratch.
Experimental results show that the proposed approach significantly reduces the cost of searching for an optimized architecture (the number of online training iterations), as well as the associated average power consumption, for successive datasets/applications. The proposed method demonstrates its effectiveness both quantitatively and qualitatively. It is verified on two different classification problems, MNIST and CIFAR-10. The proposed reconfigurable method demonstrates stable accuracy of 98.97% and 98.95%, against (98.85% and 73.0%) for MNIST and (93.47% and 70.21%) for CIFAR-10, respectively, obtained by the state-of-the-art neural network with two different fixed configurations. The proposed method also demonstrates a 20.9% reduction in average power dissipation compared with the state-of-the-art method. The proposed method is implemented and tested using VHDL and an Altera FPGA. The results show that resource utilization is comparable with that of the state-of-the-art method.
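The enable and routing mechanism described in the abstract can be illustrated with a small software model. The following is a minimal sketch under stated assumptions, not the authors' VHDL design: the function name forward, the arrays node_enable and route_enable, and the use of a ReLU activation are all hypothetical choices made for illustration. Here node_enable stands in for the per-node enable/disable signal from the configuration block, and route_enable stands in for the control block decision on which nodes of layer (l+1) receive each output of layer l.

```python
from typing import List


def relu(x: float) -> float:
    # Assumed activation; the abstract does not name one.
    return x if x > 0.0 else 0.0


def forward(
    x: List[float],
    weights: List[List[List[float]]],     # weights[l][j][i]: source i -> node j of layer l
    biases: List[List[float]],            # biases[l][j]
    node_enable: List[List[bool]],        # hypothetical: configuration-block enable per node
    route_enable: List[List[List[bool]]], # hypothetical: route_enable[l][i][j], control block
                                          # after source i forwards its output to node j
) -> List[float]:
    activations = list(x)
    for l in range(len(weights)):
        nxt = []
        for j in range(len(weights[l])):
            if not node_enable[l][j]:
                nxt.append(0.0)  # a disabled node contributes nothing downstream
                continue
            total = biases[l][j]
            for i, a in enumerate(activations):
                if route_enable[l][i][j]:  # only routed outputs reach this node
                    total += weights[l][j][i] * a
            nxt.append(relu(total))
        activations = nxt
    return activations


# Maximum fabric: 2 inputs, up to 3 hidden nodes, 1 output node.
x = [1.0, 2.0]
weights = [[[1.0, 1.0], [2.0, 0.5], [1.0, -1.0]], [[1.0, 1.0, 1.0]]]
biases = [[0.0, 0.0, 0.0], [0.0]]

# Configuration A: everything enabled and fully routed.
all_nodes = [[True, True, True], [True]]
all_routes = [[[True, True, True], [True, True, True]], [[True], [True], [True]]]
out_full = forward(x, weights, biases, all_nodes, all_routes)

# Configuration B: hidden node 2 disabled, input 1 not routed to hidden node 1.
nodes_b = [[True, True, False], [True]]
routes_b = [[[True, True, True], [True, False, True]], [[True], [True], [True]]]
out_reduced = forward(x, weights, biases, nodes_b, routes_b)

active_nodes_b = sum(sum(layer) for layer in nodes_b)
```

In this model, switching between applications only changes the two Boolean arrays while the weight storage keeps its maximum size, which mirrors the idea of supplying only the maximum number of available nodes in advance. How the configuration is searched and learned online is not modeled here.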