Accelerating Convolutional Neural Network Using Discrete Orthogonal Transforms
All experiments are implemented in Python using the PyTorch and Torch-DCT libraries in the Google Colab environment. An Intel(R) Xeon(R) CPU @ 2.00GHz and a Tesla V100-SXM2-16GB GPU were assigned to the Google Colab runtime when profiling the DOT models. It should be noted that the current stable version of the PyTorch library, version 1.8.1, offers a native implementation of the FFT algorithm only. Therefore, the Hartley and Cosine transforms listed in Table 1 do not benefit from the same algorithmic and code-level optimizations as the FFT.

We benchmark the DOT methods using the LeNet-5 network shown in Figure 10. The ReLU activation function is adopted as the non-linear operation across the entire architecture. In this network, the convolutional operations have a kernel of size K = 5. The convolution is of type “valid”, i.e., no padding is applied to the input. Hence the output size M of each convolutional layer is smaller than its input size N, that is, M = N − K + 1.

The optimizers used in our experiments are Adam, SGD, SGD with momentum of 0.9, and RMSProp with α = 0.99. The StepLR scheduler is used with a step size of 20 epochs and γ = 0.5. We train our model for 40 epochs using a mini-batch size of 128 and a learning rate of 0.001.

Five datasets are used to benchmark the proposed DOT methods: the MNIST dataset, three of its variants (EMNIST, KMNIST and Fashion-MNIST), and a more complex dataset, CIFAR-10.
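As a small worked illustration of the arithmetic in this setup, the sketch below computes the "valid" convolution output size M = N − K + 1 and the learning rate produced by a step decay with a step size of 20 epochs and γ = 0.5. It is not the code used in the experiments: the helper names `valid_conv_output_size` and `step_lr` are hypothetical, the convolution is assumed to have stride 1 (which the formula implies), and the input size of 32 is only an example value. The `step_lr` helper reproduces by hand the rule that PyTorch's `StepLR` applies, namely multiplying the learning rate by γ once every `step_size` epochs.

```python
def valid_conv_output_size(n: int, k: int = 5) -> int:
    """Output size M = N - K + 1 of an unpadded ("valid") convolution.

    Assumes stride 1; K = 5 is the kernel size stated in the text.
    """
    if k > n:
        raise ValueError("kernel size must not exceed input size")
    return n - k + 1


def step_lr(epoch: int, base_lr: float = 0.001,
            step_size: int = 20, gamma: float = 0.5) -> float:
    """Learning rate at a 0-indexed epoch under step decay.

    The rate is multiplied by gamma once every step_size epochs.
    """
    return base_lr * gamma ** (epoch // step_size)


# Example input size (an assumption, not taken from the text).
n = 32
print(f"N = {n} -> M = {valid_conv_output_size(n)}")

# Learning rate at the first and last epoch of each 20-epoch stage
# of the 40-epoch training run.
for epoch in (0, 19, 20, 39):
    print(f"epoch {epoch:2d}: lr = {step_lr(epoch)}")
```

With these settings the learning rate is halved exactly once during training, at epoch 20, so the second half of the 40-epoch run uses a rate of 0.0005.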