An Efficient CNN Inference Accelerator Based on Intra- and Inter-Channel Feature Map Compression
  • Chenjia Xie,
  • Zhuang Shao,
  • Ning Zhao,
  • Yuan Du,
  • Li Du
Corresponding Author: Zhuang Shao, Nanjing University ([email protected])

Abstract

Deep convolutional neural networks (CNNs) generate intensive inter-layer data during inference, which demands substantial on-chip memory and off-chip bandwidth. To relieve this memory constraint, this paper proposes an accelerator equipped with a compression technique that reduces the inter-layer data by removing both intra- and inter-channel redundant information. Principal component analysis (PCA) is utilized in the compression process to concentrate inter-channel information. Spatial differencing, truncation, and reconfigurable bit-width coding are applied inside every feature map to eliminate intra-channel data redundancy. Moreover, a particular data arrangement is introduced to enhance data continuity, which improves the PCA analysis and the compression performance. A CNN accelerator with the proposed compression technique is designed to support on-the-fly compression by pipelining the reconstruction, CNN computation, and compression operations. The prototype accelerator is implemented in 28-nm CMOS technology. It achieves 819.2 GOPS peak throughput and 3.75 TOPS/W energy efficiency at 218.5 mW power consumption. Experiments show that the proposed compression technique achieves a compression ratio of 21.5%~43.0% (8-bit mode) and 9.8%~19.3% (16-bit mode) on state-of-the-art CNNs with negligible accuracy loss.
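As a rough illustration of the compression idea only (not the paper's hardware implementation), the following is a minimal NumPy sketch of the two-stage scheme the abstract describes: PCA across channels to concentrate inter-channel information, followed by spatial differencing and truncation inside each retained component map. All function names and parameters (pca_spatial_compress, n_keep, trunc_bits) are illustrative assumptions; in particular, a single fixed truncation width stands in for the paper's reconfigurable bit-width coding, and the data-arrangement step is not modeled.

```python
import numpy as np

def pca_spatial_compress(fmaps, n_keep, trunc_bits):
    """Compress a (C, H, W) stack of inter-layer feature maps.

    Illustrative sketch: n_keep retained principal components stand in
    for the inter-channel stage; fixed-width truncation stands in for
    the reconfigurable bit-width coding of the intra-channel stage.
    """
    c, h, w = fmaps.shape
    x = fmaps.reshape(c, h * w).astype(np.float64)

    # Inter-channel stage: PCA concentrates cross-channel information
    # into n_keep < C component maps.
    mean = x.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(x - mean, full_matrices=False)
    coeffs = u[:, :n_keep]  # (C, n_keep) per-channel weights
    comp_maps = (s[:n_keep, None] * vt[:n_keep]).reshape(n_keep, h, w)

    # Intra-channel stage: horizontal differencing removes spatial
    # redundancy inside each component map; the residuals are then
    # truncated to trunc_bits.
    diffs = np.diff(comp_maps, axis=2, prepend=0)
    lim = 2 ** (trunc_bits - 1)
    q = np.clip(np.round(diffs), -lim, lim - 1).astype(np.int32)
    return coeffs, mean, q

def pca_spatial_decompress(coeffs, mean, q):
    """Invert the sketch: cumulative sums undo the differencing,
    then each channel is rebuilt from its PCA coefficients."""
    k, h, w = q.shape
    comp_maps = np.cumsum(q, axis=2).reshape(k, h * w)
    return (mean + coeffs @ comp_maps).reshape(-1, h, w)
```

For example, on an activation tensor fmaps of shape (256, 28, 28), calling pca_spatial_compress(fmaps, n_keep=64, trunc_bits=8) stores 64 differenced 8-bit component maps plus a small coefficient matrix in place of 256 full-precision maps; pca_spatial_decompress plays the role of the accelerator's pipelined reconstruction step.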
Published in IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 70, no. 9, pp. 3625-3638, Sep. 2023. DOI: 10.1109/TCSI.2023.3287602