A Transformer-based Network with Differential Feature Triple Refinement for Bitemporal Remote Sensing Image Change Detection

Hao Chang; Xian Sun; Peijin Wang; Wenhui Diao; Guangluan Xu

doi:10.36227/techrxiv.23119658.v1

Abstract

In change detection (CD), two important challenges are reducing the interference of pseudo changes and accurately recognizing the change of interest (COI). Recently, motivated by the transformer's strong ability to model long-range dependencies, several methods have introduced transformers into CD and proposed useful CD strategies. However, existing strategies either do not operate directly on the COI or fail to fully exploit the advantages of the transformer. In this paper, we therefore propose a new CD strategy to tackle these challenges. Specifically, we focus on the difference domain and propose a differential feature triple refinement strategy to precisely characterize the COI. We first adopt a CNN-based differential feature extraction (DFET) module to extract possible detail differences between bitemporal images. Then, we introduce a transformer-based differential feature enhancement (DFEH) module to capture and enhance the COI regions within the preliminarily extracted differences. Finally, we use a CNN-based differential feature fusion (DFFS) module to integrate fine-grained information into the enhanced COI regions. Based on the proposed strategy, we design a new network named DiFormer. We verify six effective hyperparameter configurations and conduct experiments on four commonly used CD datasets. Extensive experimental results indicate that the proposed strategy generalizes well and achieves a better balance between computational cost and model performance. Notably, even when using only Natural Scene Image Pretraining (NSIP), our method still outperforms recently proposed CD methods that focus specifically on improving Remote Sensing Image Pretraining (RSIP).
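
The three-stage data flow named in the abstract (extraction, enhancement, fusion) can be illustrated with a toy sketch. This is not the DiFormer implementation: the function names, the absolute difference, the single-head self-attention with identity projections, and the additive fusion are all assumptions chosen for illustration, standing in for the learned CNN and transformer modules whose details the abstract does not give.

```python
import math

# Toy, unlearned stand-ins for the three refinement stages named in the abstract.
# Every function name and operation here is an illustrative assumption,
# not a layer taken from DiFormer.

def extract_difference(feat_t1, feat_t2):
    """Stage 1 (DFET stand-in): per-token absolute difference.

    Each feature map is a list of tokens; each token is a list of floats.
    """
    return [[abs(a - b) for a, b in zip(tok1, tok2)]
            for tok1, tok2 in zip(feat_t1, feat_t2)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def enhance_difference(diff_tokens):
    """Stage 2 (DFEH stand-in): self-attention with identity Q/K/V projections."""
    dim = len(diff_tokens[0])
    scale = math.sqrt(dim)
    enhanced = []
    for q in diff_tokens:
        # Scaled dot-product score of this token against every token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale
                  for k in diff_tokens]
        weights = softmax(scores)
        # Attention output: weighted sum of all difference tokens.
        enhanced.append([sum(w * v[d] for w, v in zip(weights, diff_tokens))
                         for d in range(dim)])
    return enhanced

def fuse_difference(detail_tokens, enhanced_tokens):
    """Stage 3 (DFFS stand-in): add fine-grained detail onto enhanced tokens."""
    return [[d + e for d, e in zip(dt, et)]
            for dt, et in zip(detail_tokens, enhanced_tokens)]

def triple_refinement(feat_t1, feat_t2):
    """Chain the three stages: extract, enhance, fuse."""
    diff = extract_difference(feat_t1, feat_t2)
    enhanced = enhance_difference(diff)
    return fuse_difference(diff, enhanced)

# Two 3-token, 2-channel "feature maps"; only the last token differs between dates.
t1 = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
t2 = [[1.0, 0.0], [0.5, 0.5], [2.0, 3.0]]
refined = triple_refinement(t1, t2)
```

In this toy input the changed token ends up with the largest refined response, while unchanged tokens receive only a small share of it through attention. A real model would replace each stand-in with learned layers and operate on 2-D feature maps.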