Abstract
Post-training quantization (PTQ) can reduce the memory footprint and
inference latency of deep models while largely preserving their accuracy,
requiring only a small unlabeled calibration set and no retraining on the
full training set. To calibrate a quantized model, current PTQ methods
usually select unlabeled data at random from the training set as
calibration data. However, we show that random data selection leads to
performance instability and degradation due to activation distribution
mismatch. In this paper, we address the crucial task of optimal
calibration data selection and propose a novel one-shot calibration data
selection method, termed SelectQ, which selects specific data for
calibration via dynamic clustering. SelectQ uses activation statistics
and performs layer-wise clustering to learn the activation distribution
of the training set. To this end, a new metric called Knowledge Distance
is proposed to compute the distances of activation statistics from
cluster centroids. Finally, after calibration with the selected data,
quantization noise is alleviated by mitigating the distribution mismatch
within activations. Extensive experiments on the ImageNet dataset show
that SelectQ improves the Top-1 accuracy of ResNet18 by over 15\% under
4-bit quantization, compared to a randomly sampled calibration set.
Notably, SelectQ involves neither backward propagation nor Batch
Normalization parameters, and thus has fewer limitations in practical
applications.
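The selection idea summarized above (cluster per-sample activation statistics, then keep the samples nearest the cluster centroids) can be illustrated with a minimal sketch. This is an assumed illustration, not the authors' implementation: the function name, the use of per-sample statistic vectors, vanilla Lloyd-iteration k-means, and plain Euclidean distance standing in for the paper's Knowledge Distance are all assumptions.

```python
import numpy as np

def select_calibration_set(act_stats, k, iters=20, seed=0):
    """Pick k representative samples by clustering activation statistics.

    act_stats: (N, D) array, one row of activation statistics per
    candidate sample (e.g., channel-wise means from one layer).
    Returns indices of the samples nearest to the k cluster centroids.
    Hypothetical sketch; not the paper's actual algorithm.
    """
    rng = np.random.default_rng(seed)
    n = act_stats.shape[0]
    # Initialize centroids from k distinct random samples.
    centroids = act_stats[rng.choice(n, size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each sample to its nearest centroid (Euclidean distance
        # used here as a stand-in for the Knowledge Distance metric).
        d = np.linalg.norm(act_stats[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of its assigned samples.
        for j in range(k):
            members = act_stats[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    # Select the sample closest to each centroid as calibration data.
    d = np.linalg.norm(act_stats[:, None, :] - centroids[None, :, :], axis=2)
    return np.unique(d.argmin(axis=0))

# Toy demo: 200 candidate samples with 8-dim statistics, select ~16.
stats = np.random.default_rng(1).normal(size=(200, 8))
idx = select_calibration_set(stats, k=16)
```

The selected indices would then determine which unlabeled images are fed through the quantized model during calibration, in place of a random subset.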