A Decoupling Paradigm with Prompt Learning for Remote Sensing Image Change Captioning

Chenyang Liu; Rui Zhao; Jianqi Chen; Zipeng Qi; Zhengxia Zou; Zhenwei Shi

doi:10.36227/techrxiv.23269310.v2

Abstract

Remote sensing image change captioning (RSICC) is a novel task that aims to describe the differences between bi-temporal images in natural language. Previous methods overlook a key characteristic of the task: its difficulty differs between unchanged and changed image pairs. They process unchanged and changed image pairs in a coupled way, which often causes confusion in change captioning. In this paper, we ease the task by decoupling it into two issues: whether changes have occurred and what changes have occurred. An image-level classifier performs binary classification to address the first issue. A feature-level encoder extracts discriminative features that help the caption generation module address the second issue. For caption generation, we use prompt learning to introduce pre-trained large language models (LLMs) into the RSICC task. We propose a multi-prompt learning strategy that generates a set of unified prompts and a class-specific prompt conditioned on the image-level classifier’s result. These prompts tell the pre-trained LLM whether changes exist and guide it to generate captions. Finally, the multiple prompts and the features from the feature-level encoder are fed into a frozen LLM for captioning. Compared with previous methods, our method leverages the strong language abilities of the pre-trained LLM to generate plausible captions without training the LLM itself. Extensive experiments show that our method is effective and achieves state-of-the-art performance. In addition, a further experiment demonstrates that our decoupling paradigm is more promising than the previous coupled paradigm for the RSICC task. We will make our codebase publicly available at https://github.com/Chen-Yang-Liu/PromptCC to facilitate future research.
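The decoupled flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0.5 threshold, the use of plain float lists as embeddings, and the ordering of the sequence (unified prompts, then the class-specific prompt, then visual features) are all assumptions made for illustration. It only shows how an image-level decision might select a class-specific prompt before everything is passed to a frozen LLM.

```python
from typing import List, Sequence

Vector = List[float]  # stand-in for a learned embedding (assumption)


def classify_changed(change_score: float, threshold: float = 0.5) -> bool:
    """Image-level decision: did a change occur?

    The scalar score and the threshold are hypothetical; the abstract
    only states that a binary classifier answers this question.
    """
    return change_score >= threshold


def build_llm_input(
    unified_prompts: Sequence[Vector],
    changed_prompt: Vector,
    unchanged_prompt: Vector,
    visual_features: Sequence[Vector],
    changed: bool,
) -> List[Vector]:
    """Assemble the sequence that would be fed to a frozen LLM.

    The class-specific prompt is chosen from the classifier result.
    The concatenation order used here is an assumption.
    """
    class_prompt = changed_prompt if changed else unchanged_prompt
    return [*unified_prompts, class_prompt, *visual_features]


# Toy 2-dimensional embeddings, purely illustrative.
unified = [[0.1, 0.2], [0.3, 0.4]]
p_changed = [1.0, 0.0]
p_unchanged = [0.0, 1.0]
features = [[0.5, 0.5], [0.6, 0.4], [0.7, 0.3]]

seq = build_llm_input(
    unified, p_changed, p_unchanged, features, classify_changed(0.9)
)
print(len(seq), seq[2])  # 6 [1.0, 0.0]
```

In this sketch the caption generator never has to infer from the features alone whether anything changed, since that decision arrives as an explicit prompt; this is the intuition behind separating "whether" from "what".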

Published in IEEE Transactions on Geoscience and Remote Sensing, volume 61, pages 1-18, 2023. doi:10.1109/TGRS.2023.3321752

Peer review status: Published