Abstract
Pansharpening is the task of creating a High-Resolution Multi-Spectral
(HRMS) image by extracting pixel details from the High-Resolution
Panchromatic image and infusing them into the Low-Resolution
Multi-Spectral (LRMS) image. With the rapid growth in satellite image
data, researchers have increasingly replaced traditional approaches with deep
learning models. However, existing deep learning models are not built to
capture intricate pixel-level relationships. Motivated by the recent
success of self-attention mechanisms in computer vision tasks, we
propose Pansformers, a transformer-based self-attention architecture
that computes band-wise attention. We further improve the attention
network by introducing a Multi-Patch Attention mechanism,
which operates on non-overlapping, local patches of the image. Our model
successfully infuses relevant local details from the Panchromatic
image while preserving the spectral integrity of the Multi-Spectral image. We show
that our Pansformer model significantly improves the performance metrics
and the output image quality on imagery from two satellite distributions,
IKONOS and LANDSAT-8.
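
To make the idea of attention over non-overlapping, local patches concrete, the following is a minimal NumPy sketch, not the Pansformers implementation. It assumes a query feature map (for example, derived from the Multi-Spectral image) and a key/value feature map (for example, derived from the Panchromatic image) that share the same height, width, and channel count. The function names `split_patches`, `merge_patches`, and `patch_attention`, the patch size `p`, and the omission of learned projection matrices are all illustrative assumptions.

```python
import numpy as np


def split_patches(x, p):
    """Split an (H, W, C) map into non-overlapping p x p patches.

    Returns an array of shape (num_patches, p * p, C).
    """
    h, w, c = x.shape
    assert h % p == 0 and w % p == 0, "H and W must be divisible by p"
    x = x.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, p * p, c)


def merge_patches(patches, h, w, p):
    """Inverse of split_patches: reassemble patches into an (H, W, C) map."""
    c = patches.shape[-1]
    x = patches.reshape(h // p, w // p, p, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(h, w, c)


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def patch_attention(query_map, key_value_map, p):
    """Scaled dot-product attention computed independently inside each patch.

    Hypothetical simplification: no learned query/key/value projections,
    and keys and values are the same tensor.
    """
    h, w, _ = query_map.shape
    q = split_patches(query_map, p)        # (N, p*p, C)
    kv = split_patches(key_value_map, p)   # (N, p*p, C)
    d = q.shape[-1]
    scores = q @ kv.transpose(0, 2, 1) / np.sqrt(d)  # (N, p*p, p*p)
    weights = softmax(scores, axis=-1)     # each row sums to 1
    out = weights @ kv                     # (N, p*p, C)
    return merge_patches(out, h, w, p)


# Example call on random data (shapes are arbitrary choices).
rng = np.random.default_rng(0)
ms_features = rng.standard_normal((8, 8, 4))
pan_features = rng.standard_normal((8, 8, 4))
fused = patch_attention(ms_features, pan_features, p=4)
```

Because the attention weights are computed per patch, each output pixel depends only on key/value pixels in its own patch, which is the locality property the patch-based design relies on.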