DForest: A Minimal Dimensionality-Aware Indexing for High-Dimensional Exact Similarity Search
Similarity search in high-dimensional space is a fundamental problem with numerous applications in computer science, but it remains challenging due to the curse of dimensionality. To address this challenge, this paper proposes DForest, a novel indexing approach for both range and kNN queries on high-dimensional data. Previous similarity search approaches reduce all objects to the same fixed dimensionality. In contrast, we determine the minimal dimensionality each object requires to stay under a loss threshold and reduce the dimensionality of each object individually. Query performance is further optimized by deriving upper and lower bounds for the retrieved blocks and by computing distances in the low-dimensional embedding space first. Theoretical analysis is provided to support our search strategy. Extensive experiments demonstrate that DForest significantly outperforms all the state-of-the-art competitors in terms of query time and exhibits good scalability.
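The idea of a per-object minimal dimensionality can be illustrated with a minimal sketch. It is not DForest's actual algorithm, and it rests on assumptions that the abstract does not state: coordinates are already ordered by importance (for example after a PCA-like orthogonal transform), and "loss" means the fraction of an object's squared norm that is discarded. The names `minimal_dim` and `prefix_distance` are hypothetical. The sketch also shows why a distance computed on a coordinate prefix can serve as a lower bound: dropping non-negative squared terms can never increase a Euclidean distance.

```python
import math

def minimal_dim(vec, loss_threshold):
    """Smallest prefix length k such that the fraction of squared norm
    lost by dropping coordinates k..end is at most loss_threshold.
    Assumption: this is one possible loss definition, not the paper's."""
    total = sum(x * x for x in vec)
    if total == 0.0:
        return 0  # the zero vector loses nothing at any dimensionality
    kept = 0.0
    for k, x in enumerate(vec, start=1):
        kept += x * x
        if 1.0 - kept / total <= loss_threshold:
            return k
    return len(vec)

def prefix_distance(a, b, k):
    """Euclidean distance restricted to the first k coordinates.
    It never exceeds the full-dimensional Euclidean distance."""
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(k)))

# Two objects with very different energy profiles.
concentrated = [3.0, 0.1, 0.0, 0.0]   # almost all energy in the first coordinate
spread = [1.0, 1.0, 1.0, 1.0]         # energy spread evenly

k_concentrated = minimal_dim(concentrated, 0.05)
k_spread = minimal_dim(spread, 0.05)

# Compare in the smaller of the two reduced spaces, then in full space.
k = min(k_concentrated, k_spread)
lower = prefix_distance(concentrated, spread, k)
full = prefix_distance(concentrated, spread, len(spread))
```

Under these assumptions, the concentrated object needs far fewer coordinates than the evenly spread one, which is the situation a single fixed target dimensionality handles poorly. A candidate whose prefix distance already exceeds the query radius can be discarded without computing the full distance.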
Email Address of Submitting Author: lilingli@hlju.edu.cn
ORCID of Submitting Author: 0000-0001-8898-5817
Submitting Author's Institution: Heilongjiang University
Submitting Author's Country