DForest: A Minimal Dimensionality-Aware Indexing for High-Dimensional Exact Similarity Search
  • Lingli Li,
  • Wenjing Sun,
  • Baohua Wu
Lingli Li
Heilongjiang University

Corresponding Author: [email protected]


Abstract

Similarity search in high-dimensional space is a fundamental problem with numerous applications in computer science, yet it remains challenging due to the curse of dimensionality. This paper introduces DForest, a novel indexing approach that addresses this challenge for both range and kNN queries on high-dimensional data. Unlike previous similarity search approaches, which apply a fixed dimensionality reduction uniformly to all objects, our approach determines the minimal dimensionality each object requires within a specified loss threshold and then reduces the dimensionality of each object individually. Query performance is further optimized by deriving upper and lower bounds for the retrieved blocks and by preferentially computing distances in a low-embedding space. Theoretical analysis is provided to support our search strategy. Extensive experiments on a variety of datasets verify the superiority of DForest over state-of-the-art methods.
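
The idea of a per-object minimal dimensionality can be illustrated with a small sketch. It is not the DForest algorithm: the abstract does not specify the transform or the loss, so the sketch assumes a PCA rotation and a relative squared-norm loss, and the names `fit_minimal_dims` and `lower_bounds` are hypothetical. It relies only on the standard fact that, after an orthonormal rotation, a Euclidean distance computed on a prefix of the coordinates never exceeds the full distance, which is what makes a reduced representation usable as a lower bound in exact search.

```python
import numpy as np

def fit_minimal_dims(X, loss_threshold):
    """Assumed loss: fraction of an object's squared norm (after centering)
    that is discarded when only its first d PCA coordinates are kept.
    Returns, per object, the smallest d whose loss is within the threshold."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt form an orthonormal PCA basis, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt.T                          # rotated coordinates
    energy = np.cumsum(Z ** 2, axis=1)     # squared norm kept by the first d coords
    total = energy[:, -1:]
    ok = (total - energy) <= loss_threshold * total
    dims = np.argmax(ok, axis=1) + 1       # first d that satisfies the threshold
    return dims, Z, Vt, mean

def lower_bounds(q, dims, Z, Vt, mean):
    """Distance from query q to each object using only that object's
    first dims[i] rotated coordinates; never exceeds the true distance."""
    zq = (q - mean) @ Vt.T
    partial = np.cumsum((Z - zq) ** 2, axis=1)
    return np.sqrt(partial[np.arange(len(dims)), dims - 1])
```

In a range query with radius r, any object whose lower bound already exceeds r could be discarded without computing its full-dimensional distance; objects that survive would still need an exact check.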
14 Apr 2024: Submitted to TechRxiv
18 Apr 2024: Published in TechRxiv