
Performance Analysis of DFT and FFT Algorithms on Modern GPUs
  • Venkata Salini Priyamvada Davuluru
  • Don Lahiru Nirmal Hettiarachchi
  • Eric Balster
Venkata Salini Priyamvada Davuluru
University of Dayton

Corresponding Author:[email protected]



Conventionally, the Fast Fourier Transform (FFT) has been preferred over the direct Discrete Fourier Transform (DFT) because of its faster execution. However, the emergence of modern high-performance computing devices favors the DFT algorithm because of its inherent parallelism. This letter evaluates a straightforward one-dimensional DFT against NVIDIA's and AMD's highly optimized FFT libraries, cuFFT and clFFT, respectively. Performance is analyzed in terms of average kernel execution time for input lengths ranging from 2^1 to 2^25 samples on the NVIDIA GeForce GTX 1080 Ti and AMD Radeon RX 6800 XT graphics processors. The DFT achieves results comparable to the FFT routines for smaller input sizes, whereas it significantly outperforms the FFT libraries for larger input lengths. On the NVIDIA GeForce device, using the CUDA toolkit, the DFT shows an average performance increase of 177.7% over cuFFT. On the AMD Radeon device, using the OpenCL specification, the DFT outperforms clFFT by an average of 184.2%.
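To make the "inherent parallelism" of the direct DFT concrete, the following is a minimal sketch in plain Python, not the authors' CUDA or OpenCL kernel. It assumes the standard definition X[k] = Σ x[t]·exp(-2πi·k·t/N) and uses hypothetical helper names (`dft_bin`, `dft`, `fft_radix2`, `max_abs_diff`). Each output bin is computed without reference to any other bin, which is the property that would let a GPU assign one thread per bin. A textbook radix-2 FFT is included only to check that both routes give the same spectrum. The sketch says nothing about the timings reported above.

```python
import cmath


def dft_bin(x, k):
    """Compute output bin k of the DFT of x.

    The result depends only on the input x and the index k, so every
    bin could be evaluated by a separate worker (e.g. one GPU thread).
    """
    n = len(x)
    return sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))


def dft(x):
    """Direct O(N^2) DFT: one independent computation per output bin."""
    return [dft_bin(x, k) for k in range(len(x))]


def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT, O(N log N).

    Assumes len(x) is a power of two; raises ValueError otherwise.
    """
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size transforms with a twiddle factor.
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def max_abs_diff(a, b):
    """Largest element-wise magnitude difference between two spectra."""
    return max(abs(p - q) for p, q in zip(a, b))
```

For example, `max_abs_diff(dft(signal), fft_radix2(signal))` for any power-of-two-length `signal` should be at floating-point rounding level. The direct form does more arithmetic in total, but it has no data dependencies between outputs, whereas the FFT's butterfly stages must run in sequence.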