Self-Supervised Pre-Training of Transformers for Satellite Image Time
Series Classification
Abstract
Satellite image time series (SITS) classification is a major research
topic in remote sensing and is relevant for a wide range of
applications. Deep learning approaches have been widely employed for
SITS classification and provide state-of-the-art performance.
However, these methods are prone to overfitting when labeled data
is scarce. To address this problem, we propose a novel self-supervised
pre-training scheme to initialize a Transformer-based network by
utilizing large-scale unlabeled data. Specifically, the model is trained
to predict randomly contaminated observations given the entire time series
of a pixel. The main idea of our proposal is to leverage the inherent
temporal structure of satellite time series to learn general-purpose
spectral-temporal representations related to land cover semantics. Once
pre-training is completed, the pre-trained network can be further
adapted to various SITS classification tasks by fine-tuning all the
model parameters on small-scale, task-specific labeled data. In this way,
the general knowledge about SITS acquired during pre-training is
transferred to label-scarce tasks, improving the model's generalization
performance and reducing the risk of overfitting. Comprehensive
experiments have been carried out on three benchmark datasets over large
study areas. Experimental results demonstrate the effectiveness of the
proposed method, which yields classification accuracy improvements
ranging from 2.38% to 5.27%. The code and the pre-trained model will be available
at https://github.com/linlei1214/SITS-BERT upon publication.
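
To make the pretext task concrete, the following is a minimal PyTorch sketch of contaminated-observation pre-training. It is not the authors' implementation: the network sizes, the additive-noise corruption model, and the 15% contamination ratio are illustrative assumptions, as is every identifier in the snippet.

```python
import torch
import torch.nn as nn

class SITSEncoder(nn.Module):
    """Illustrative Transformer encoder for pixel-wise satellite time series.
    Each observation is a spectral vector tagged with its acquisition day of year."""

    def __init__(self, num_bands=10, d_model=64, nhead=4, num_layers=3):
        super().__init__()
        self.band_embed = nn.Linear(num_bands, d_model)   # embed spectral observations
        self.doy_embed = nn.Embedding(367, d_model)       # day-of-year positional encoding
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.recon_head = nn.Linear(d_model, num_bands)   # regress original reflectances

    def forward(self, x, doy):
        h = self.band_embed(x) + self.doy_embed(doy)      # (batch, time, d_model)
        return self.recon_head(self.encoder(h))

def pretrain_step(model, x, doy, corrupt_ratio=0.15, noise_std=0.5):
    """Self-supervised step: contaminate a random subset of observations with
    additive noise, then regress their original values (loss on those steps only)."""
    mask = torch.rand(x.shape[:2]) < corrupt_ratio                   # (batch, time)
    x_noisy = x + noise_std * torch.randn_like(x) * mask.unsqueeze(-1)
    pred = model(x_noisy, doy)
    return ((pred - x) ** 2)[mask].mean()                            # MSE on contaminated steps

# Toy usage: 8 pixel series, 24 acquisitions each, 10 spectral bands in [0, 1].
model = SITSEncoder()
x = torch.rand(8, 24, 10)
doy = torch.randint(1, 366, (8, 24))
pretrain_step(model, x, doy).backward()
```

For the fine-tuning stage described above, the reconstruction head would be swapped for a classification head (e.g. a linear layer over a pooled sequence representation) and all parameters updated on the small labeled set with a cross-entropy loss.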
This work has been submitted to the IEEE for possible
publication. Copyright may be transferred without notice, after which
this version may no longer be accessible.