TechRxiv

A Decoupling Paradigm with Prompt Learning for Remote Sensing Image Change Captioning

preprint
posted on 2023-06-07, 02:21, authored by Chenyang Liu, Rui Zhao, Jianqi Chen, Zipeng Qi, Zhengxia Zou, Zhenwei Shi

Remote sensing image change captioning (RSICC) is a novel task that aims to describe the differences between bi-temporal images in natural language. Previous methods overlook a key property of the task: the difficulty of RSICC differs between unchanged and changed image pairs. These methods process both kinds of pairs in a coupled way, which often confuses change captioning. In this paper, we ease the task by decoupling it into two subproblems: whether changes have occurred, and what changes have occurred. An image-level classifier performs binary classification to address the first subproblem. A feature-level encoder extracts discriminative features that help the caption generation module address the second. For caption generation, we use prompt learning to introduce pre-trained large language models (LLMs) into the RSICC task. We propose a multi-prompt learning strategy that generates a set of unified prompts and a class-specific prompt conditioned on the image-level classifier's result. These prompts inform the pre-trained LLM whether changes exist and guide it to generate captions. Finally, the multiple prompts and the features from the feature-level encoder are fed into a frozen LLM for captioning. Compared with previous methods, our method leverages the strong language abilities of the pre-trained LLM to generate plausible captions without any training of the LLM itself. Extensive experiments show that our method is effective and achieves state-of-the-art performance. In addition, a further experiment demonstrates that our decoupling paradigm is more promising than the previous coupled paradigm for the RSICC task.
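The decoupled inference flow summarized above can be illustrated with a minimal, dependency-free Python sketch. It only shows how the input to the frozen LLM might be assembled: the image-level classifier's binary decision selects a class-specific prompt, which is concatenated with the unified prompts and the feature-level encoder's visual tokens. All names (`PromptBank`, `classify_change`, `build_llm_input`), the 0.5 threshold, the token ordering, and the toy two-dimensional embeddings are assumptions for illustration, not the authors' implementation; the classifier, encoder, and LLM themselves are omitted.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]  # one embedding vector (toy stand-in for a tensor row)


@dataclass
class PromptBank:
    """Hypothetical container for learnable prompt embeddings.

    unified:   prompts shared by every image pair.
    changed:   class-specific prompt used when the classifier predicts "changed".
    unchanged: class-specific prompt used when it predicts "unchanged".
    """
    unified: List[Vector]
    changed: List[Vector]
    unchanged: List[Vector]


def classify_change(prob_changed: float, threshold: float = 0.5) -> bool:
    """Binary decision of the image-level classifier (threshold is an assumption)."""
    return prob_changed >= threshold


def build_llm_input(bank: PromptBank, visual_tokens: List[Vector],
                    prob_changed: float) -> List[Vector]:
    """Assemble the embedding sequence that would be fed to a frozen LLM.

    The order [unified prompts, class-specific prompt, visual tokens] is an
    assumption made for this sketch.
    """
    # Pick the class-specific prompt from the classifier's decision.
    class_prompt = bank.changed if classify_change(prob_changed) else bank.unchanged
    sequence = bank.unified + class_prompt + visual_tokens
    # Every vector must share the LLM's embedding width.
    dim = len(sequence[0])
    if any(len(vec) != dim for vec in sequence):
        raise ValueError("all prompt and visual embeddings must have the same width")
    return sequence


if __name__ == "__main__":
    bank = PromptBank(unified=[[0.1, 0.2], [0.3, 0.4]],
                      changed=[[1.0, 1.0]], unchanged=[[-1.0, -1.0]])
    visual = [[0.5, 0.5], [0.6, 0.6], [0.7, 0.7]]
    seq = build_llm_input(bank, visual, prob_changed=0.83)
    print(len(seq), seq[2])  # 6 [1.0, 1.0]
```

Keeping the LLM out of the sketch mirrors the frozen-LLM design described in the abstract: only the prompts (and, in a real system, the classifier and encoder) would carry trainable parameters, while the LLM consumes the assembled sequence unchanged.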

Email Address of Submitting Author

liuchenyang@buaa.edu.cn

ORCID of Submitting Author

0000-0003-3034-6646

Submitting Author's Institution

Image Processing Center, School of Astronautics, Beihang University

Submitting Author's Country

  • China