DForest: A Minimal Dimensionality-Aware Indexing for High-Dimensional Exact Similarity Search
  • Lingli Li,
  • Wenjing Sun,
  • Baohua Wu
Lingli Li
Heilongjiang University

Corresponding Author: [email protected]

Wenjing Sun

Abstract

Similarity search in high-dimensional space is a fundamental problem with numerous applications in computer science, but it remains challenging because of the curse of dimensionality. To address this challenge, this paper proposes DForest, a novel indexing approach for both range and kNN queries on high-dimensional data. Unlike previous similarity search approaches, which reduce all objects to the same fixed dimensionality, we determine the minimal dimensionality each object requires to stay under a loss threshold and reduce the dimensionality of each object individually. Query performance is further optimized by deriving upper and lower bounds for the retrieved blocks and by computing distances in the low-dimensional embedding space first. Theoretical analysis is provided to support our search strategy. Extensive experiments demonstrate that DForest significantly outperforms all state-of-the-art competitors in terms of query time and exhibits good scalability.
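The idea of a per-object minimal dimensionality can be illustrated with a short sketch. This is not the DForest algorithm itself: the abstract does not specify the transform or the loss, so the code below assumes a PCA rotation and defines the loss as the fraction of an object's squared norm (after centering) that is discarded when only its first k coordinates are kept. The function name `minimal_dims` and the parameter `loss_threshold` are hypothetical.

```python
import numpy as np


def minimal_dims(X, loss_threshold):
    """Return (ks, coords) for the rows of X.

    ks[i] is the smallest k such that keeping only the first k PCA
    coordinates of object i discards at most `loss_threshold` of its
    squared norm. coords holds the rotated (PCA) coordinates.
    Assumption: PCA and this relative-energy loss are illustrative
    choices, not taken from the paper.
    """
    centered = X - X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt.T

    # retained[i, k] = energy kept by the first k coordinates, k = 0..d.
    energy = np.cumsum(coords ** 2, axis=1)
    retained = np.hstack([np.zeros((X.shape[0], 1)), energy])
    total = energy[:, -1:]

    # Objects with zero energy lose nothing at any k.
    safe_total = np.where(total > 0, total, 1.0)
    loss = np.where(total > 0, 1.0 - retained / safe_total, 0.0)

    # First k whose loss is within the threshold (loss is 0 at k = d).
    ks = np.argmax(loss <= loss_threshold, axis=1)
    return ks, coords
```

Because the rotation is orthonormal, the Euclidean distance computed on any shared prefix of the rotated coordinates never exceeds the distance in the original space, which is one standard way a low-dimensional lower bound can be used to prune candidates before computing exact distances.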
14 Apr 2024 Submitted to TechRxiv
18 Apr 2024 Published in TechRxiv