Ferroelectric FET Based Time-Mode Multiply-Accumulate Accelerator: Design and Analysis
  • Tanveer Kaur,
  • Musaib Rafiq,
  • Amol Gaidhane,
  • Yogesh Singh Chauhan,
  • Shubham Sahay
Shubham Sahay
IIT Kanpur

Corresponding Author:[email protected]


Abstract

General-purpose multiply-accumulate (MAC) accelerators have become indispensable in IoT edge devices for performing computationally intensive tasks such as deep learning, signal processing, and combinatorial optimization. The throughput and energy efficiency of conventional digital processors and MAC accelerators are limited by their sparse design, a consequence of the von Neumann architecture. Although mixed-signal time-mode MAC accelerators utilizing emerging non-volatile memories appear promising owing to their ability to perform in-memory MAC operations via physical laws, their application is limited by their incompatibility and complex integration with the CMOS process, high sensitivity to process variations, and large operating voltages and cell currents, among other issues. To mitigate these issues, we propose a time-mode MAC accelerator based on ferroelectric FinFETs with CMOS-compatible doped HfO2 in the gate stack. Our rigorous analysis reveals a trade-off among performance metrics of the proposed MAC accelerator, such as computational precision, area efficiency, and energy efficiency. We therefore provide the necessary design guidelines to further optimize the performance. Extensive design space exploration and simulations, using an experimentally calibrated compact model for the doped-HfO2 ferroelectric capacitor along with the 7 nm technology PDK from ARM (ASAP), indicate that the proposed MAC accelerator exhibits a record energy efficiency of 2.612 PetaOperations/Joule, a considerably high area efficiency of 88.5 bits/µm² (including I/O peripheral circuitry), and a throughput of 4.6 TeraOps/s while supporting a 4-bit MAC operation for a square weight matrix of size 200×200, which is sufficient for realistic inference tasks.
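
To put the headline figures in perspective, the following minimal Python sketch converts the quoted energy efficiency and throughput into an energy per operation and an implied power, and computes an ideal 4-bit MAC on a 200×200 weight matrix as a purely digital reference. It does not model the proposed time-mode circuit. The function names, the unsigned 4-bit encoding of weights and inputs, and the assumption that the energy-efficiency and throughput figures refer to the same operating point are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Figures quoted in the abstract.
ENERGY_EFF_OPS_PER_J = 2.612e15   # 2.612 PetaOperations/Joule
THROUGHPUT_OPS_PER_S = 4.6e12     # 4.6 TeraOps/s
N = 200                           # square weight matrix is N x N
BITS = 4                          # operand precision

# Assumption: weights and inputs are unsigned BITS-bit integers.
MAX_OPERAND = 2**BITS - 1
# Largest possible accumulated value in one output row.
MAX_OUTPUT = N * MAX_OPERAND * MAX_OPERAND


def energy_per_op_joules(eff=ENERGY_EFF_OPS_PER_J):
    """Energy per operation is the reciprocal of ops per joule."""
    return 1.0 / eff


def implied_power_watts(throughput=THROUGHPUT_OPS_PER_S,
                        eff=ENERGY_EFF_OPS_PER_J):
    """Power = (ops/s) / (ops/J).

    Assumption: both figures describe the same operating point.
    """
    return throughput / eff


def reference_mac(weights, inputs):
    """Ideal integer matrix-vector MAC used only as a digital reference."""
    return weights.astype(np.int64) @ inputs.astype(np.int64)


# Random 4-bit operands, purely illustrative.
rng = np.random.default_rng(0)
weights = rng.integers(0, MAX_OPERAND + 1, size=(N, N))
inputs = rng.integers(0, MAX_OPERAND + 1, size=N)
outputs = reference_mac(weights, inputs)

print(f"energy per operation: {energy_per_op_joules() * 1e15:.3f} fJ")
print(f"implied power:        {implied_power_watts() * 1e3:.3f} mW")
print(f"bits needed for the full-precision output: {MAX_OUTPUT.bit_length()}")
```

Under these assumptions, the quoted figures correspond to roughly 0.38 fJ per operation and an implied power of about 1.76 mW. A full-precision accumulation over 200 unsigned 4-bit products would need 16 bits, which illustrates why computational precision trades off against area and energy efficiency.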