Cross-modality Hierarchical Clustering and Refinement for Unsupervised
Visible-Infrared Person Re-Identification
Abstract
Visible-infrared person re-identification (VI-ReID) is a challenging
cross-modality image retrieval task. Compared to visible-modality person
re-identification, which handles only the intra-modality discrepancy,
VI-ReID additionally suffers from a large modality gap. Most existing VI-ReID
methods achieve promising accuracy in a supervised setting, but the high
annotation cost limits their scalability to real-world scenarios.
Although a few unsupervised VI-ReID methods exist, they typically rely
on intra-modality initialization and cross-modality instance selection,
even though the intra-modality initialization requires considerable
additional computation. In this paper, we study the fully
unsupervised VI-ReID problem and propose a novel cross-modality
hierarchical clustering and refinement (CHCR) method by promoting
modality-invariant feature learning and improving the reliability of
pseudo-labels. Unlike conventional VI-ReID methods, CHCR requires
neither manual identity annotations nor intra-modality initialization.
First, we design a simple and effective baseline that clusters samples
directly across the two modalities. Then, to provide sufficient
inter-modality positive sample pairs for modality-invariant feature
learning, we propose a cross-modality hierarchical clustering algorithm
that encourages inter-modality positive samples to fall into the same
cluster. In addition, we develop an inter-channel pseudo-label
refinement algorithm that eliminates unreliable pseudo-labels by
checking the consistency of the clustering results across the three
color channels of the visible modality.
Extensive experiments demonstrate that CHCR outperforms state-of-the-art
unsupervised methods and achieves performance competitive with many
supervised methods.
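
To make the inter-channel refinement idea concrete, the following is a minimal sketch, not the paper's implementation: it clusters per-channel features independently with DBSCAN and keeps only samples whose same-cluster relations agree across the R, G, and B channels. The feature matrices, the DBSCAN hyperparameters, and the exact agreement rule are illustrative assumptions.

import numpy as np
from sklearn.cluster import DBSCAN

def inter_channel_refine(feat_r, feat_g, feat_b, eps=0.6, min_samples=4):
    """Toy inter-channel pseudo-label refinement (illustrative only).

    Clusters the features of each visible color channel independently,
    then keeps only samples whose same-cluster relations agree across
    all three channels.
    """
    labels = [
        DBSCAN(eps=eps, min_samples=min_samples).fit_predict(f)
        for f in (feat_r, feat_g, feat_b)
    ]
    # Per-channel co-membership matrices; DBSCAN noise points (-1) never match.
    co = [(l[:, None] == l[None, :]) & (l[:, None] != -1) for l in labels]
    # A sample's pseudo-label is kept as reliable only if its co-membership
    # row is identical in all three channel clusterings.
    keep = np.all(co[0] == co[1], axis=1) & np.all(co[1] == co[2], axis=1)
    return keep, labels

# Usage with random placeholder features (one row per image).
rng = np.random.default_rng(0)
fr, fg, fb = (rng.normal(size=(200, 64)) for _ in range(3))
keep, _ = inter_channel_refine(fr, fg, fb)
print(f"kept {keep.sum()} / {keep.size} samples with consistent pseudo-labels")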