An ASIC Accelerator for QNN with Variable Precision and Tunable
Energy-Efficiency
Abstract
This paper presents TULIP, a new architecture for variable-precision
Quantized Neural Network (QNN) inference. It is designed with the goal
of maximizing energy efficiency per classification. TULIP is constructed
by arranging a collection of unique processing elements (TULIP-PEs) in a
single instruction multiple data (SIMD) fashion. Each TULIP-PE contains
binary neurons that are interconnected using multiplexers. Each neuron
also has a small dedicated local register connected to it. The binary
neurons are implemented as standard cells and are used to compute
threshold functions, i.e., an inner-product and thresholding operation
on their binary inputs. The neurons can be reconfigured with a single
change in the control signals to implement all the standard operations
used in a QNN.
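To make the threshold-function primitive concrete, the sketch below models a binary neuron in Python. It is only an illustration of the inner-product-and-threshold operation described above, not TULIP's standard-cell implementation; the weight and threshold values are arbitrary examples.

```python
def binary_threshold_neuron(inputs, weights, threshold):
    """Binary threshold function: weighted inner product of binary
    inputs, compared against a threshold. The output is a single bit."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum >= threshold else 0

# Example (arbitrary values): a 3-input majority gate is the threshold
# function with unit weights and a threshold of 2.
print(binary_threshold_neuron([1, 0, 1], [1, 1, 1], 2))  # -> 1
```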
This paper presents novel algorithms for implementing the operations of
a QNN on the TULIP-PEs in the form of a schedule of threshold functions.
TULIP was implemented as an ASIC in TSMC 40nm LP technology. A QNN
accelerator that employs a conventional MAC-based arithmetic processor
was also implemented in the same technology to provide a fair
comparison. The results show that TULIP is 30-50X more energy-efficient
than an equivalent design, without any penalty in performance, area, or
accuracy. Furthermore, TULIP achieves these improvements without using
traditional techniques such as voltage scaling or approximate computing.
Finally, the paper also demonstrates how the run-time trade-off between
accuracy and energy efficiency is realized on the TULIP architecture.
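As an illustration of what a "schedule of threshold functions" can look like, the sketch below composes a ripple-carry addition of two quantized operands from the binary threshold primitive. This is a standard threshold-logic construction shown for intuition only; it is not the scheduling algorithm proposed in the paper, and the bit ordering and helper names are assumptions.

```python
def threshold(inputs, weights, t):
    """Binary threshold function: 1 if the weighted sum of binary
    inputs reaches the threshold t, else 0."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= t)

def full_adder(a, b, cin):
    """1-bit full adder built from two threshold functions:
    carry = MAJ(a, b, cin); sum reuses the carry with weight -2."""
    carry = threshold([a, b, cin], [1, 1, 1], t=2)
    s = threshold([a, b, cin, carry], [1, 1, 1, -2], t=1)
    return s, carry

def ripple_add(x_bits, y_bits):
    """Add two little-endian bit vectors by scheduling the threshold
    functions bit by bit; the carry chains through the schedule."""
    carry, out = 0, []
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out + [carry]

# 5 + 3 = 8  ->  [0, 0, 0, 1] (little-endian)
print(ripple_add([1, 0, 1], [1, 1, 0]))
```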