Vectorized Highly Parallel Density-based Clustering for Applications with Noise
  • Joseph Xavier Arnold
  • Juan Pedro Gutiérrez Hermosillo Muriedas
  • Stepan Nassyr
  • Rocco Sedona
  • Markus Götz
  • Achim Streit
  • Morris Riedel
  • Gabriele Cavallaro

Corresponding Author: [email protected]

Abstract

Clustering in data mining involves grouping similar objects into categories based on their characteristics. As data volumes continue to grow and high-performance computing advances, there is a critical need for algorithms that process these computations efficiently and exploit the multiple levels of parallelism offered by modern supercomputing systems. Single Instruction Multiple Data (SIMD) instructions apply one operation to several data elements at once, exploiting data-level parallelism within each instruction and reducing data movement within the memory hierarchy. To fully harness a processor's SIMD capabilities and achieve high performance, algorithms must be adapted to map well onto vector operations. In this paper, we introduce a vectorized implementation of the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm that runs on both shared- and distributed-memory systems. By leveraging SIMD, we accelerate the distance computations. Our proposed Vectorized HPDBSCAN (VHPDBSCAN) achieves a speedup of up to two times over the state-of-the-art parallel version, Highly Parallel DBSCAN (HPDBSCAN), on the ARM-based A64FX processor for two datasets of different dimensionality. Additionally, we evaluate VHPDBSCAN's energy consumption on the A64FX and Intel Xeon processors. The results show that, compared to previous methods, the proposed implementation halves energy consumption on the A64FX Central Processing Unit (CPU) and reduces it by approximately 19.5% on the Intel Xeon 8368 CPU.
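The abstract attributes the speedup to SIMD-accelerated distance computations. As a rough illustration only, the C sketch below shows one common way to make a DBSCAN ε-neighbourhood query vector-friendly; it is not the authors' VHPDBSCAN code. It assumes a dimension-major ("structure of arrays") point layout, and the function names `squared_distances` and `mark_neighbors` are hypothetical. Rather than explicit A64FX or Xeon intrinsics, it relies on the compiler auto-vectorizing unit-stride, branch-free loops.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout (illustrative, not taken from the paper): coordinates are
 * stored dimension-major, so coords[d * n + i] is coordinate d of point i.
 * Consecutive points are adjacent in memory, giving the inner loops unit stride. */

/* Hypothetical helper: squared Euclidean distance from `query` to each of the
 * n points, written into dist2[0..n-1]. */
void squared_distances(const float *restrict coords, size_t n, size_t dim,
                       const float *restrict query, float *restrict dist2)
{
    assert(coords != NULL && query != NULL && dist2 != NULL);
    for (size_t i = 0; i < n; ++i)
        dist2[i] = 0.0f;

    /* Dimensions in the outer loop, points in the inner loop: the inner loop
     * is a branch-free, unit-stride update that compilers such as GCC or
     * Clang can typically auto-vectorize at -O3. */
    for (size_t d = 0; d < dim; ++d) {
        const float *col = coords + d * n;
        const float q = query[d];
        for (size_t i = 0; i < n; ++i) {
            const float diff = col[i] - q;
            dist2[i] += diff * diff;
        }
    }
}

/* Hypothetical helper: flags points whose squared distance is <= eps^2
 * (i.e. points in the query's eps-neighbourhood) and returns their count.
 * Comparing squared distances avoids computing a square root per point. */
size_t mark_neighbors(const float *restrict dist2, size_t n, float eps,
                      unsigned char *restrict is_neighbor)
{
    assert(dist2 != NULL && is_neighbor != NULL);
    const float eps2 = eps * eps;
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        /* The comparison yields 0 or 1 without a data-dependent branch. */
        is_neighbor[i] = (unsigned char)(dist2[i] <= eps2);
        count += is_neighbor[i];
    }
    return count;
}
```

With this layout, three 2-D points (0, 0), (1, 0), and (3, 4) are stored as `{0, 1, 3, 0, 0, 4}`. A query at the origin with ε = 1.5 marks the first two points as neighbours. The design choice here is that looping over points in the innermost loop keeps every SIMD lane busy regardless of the dataset's dimensionality. A point-major layout would instead vectorize poorly when the dimension is small.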
13 Mar 2024: Submitted to TechRxiv
19 Mar 2024: Published in TechRxiv