Abstract
State-of-the-art Heterogeneous Multi-Processor Systems-on-Chip (HMPSoCs)
can perform on-chip embedded inference on their CPUs and GPUs. Multi-component
pipelining is the method of choice to provide high-throughput
Convolutional Neural Network (CNN) inference on embedded platforms. In
this work, we present Pipe-All, the first CPU-GPU pipeline design for
CNN inference. Pipe-All uses the ARM-CL library to integrate an ARM
big.LITTLE CPU with an ARM Mali GPU. It is the first three-stage CNN
inference pipeline design, with ARM’s big CPU cluster, LITTLE CPU
cluster, and Mali GPU as its stages. Pipe-All
improves inference throughput by 75.88% on average (over peak
single-component inference) on the Amlogic A311D HMPSoC in the Khadas
VIM3 embedded platform. We also provide an open-source implementation of
Pipe-All.
This paper has been submitted to IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems (TCAD) as a transactions brief
paper (5 pages).