Multi-modal Contrastive Learning for Crop Classification Using Sentinel-2 and PlanetScope
Abstract
Remote sensing has enabled large-scale crop classification to understand
agricultural ecosystems and estimate production yields. In recent years, machine learning has been increasingly used for automated crop classification. However, most approaches apply novel algorithms to custom datasets that contain information on only a few crop fields in a small region, which often leads to models that lack generalization capability. In this work, we propose a multi-modal contrastive self-supervised learning approach to obtain a pre-trained model for crop classification without the use of labeled data. Such multi-modal self-supervised learning exploits the synergies between different data sources to obtain a richer representation of the data. We build our analysis by adapting the DENETHOR dataset, developed for a region in Eastern Germany, to our use case. We use publicly available Sentinel-2 data and commercial PlanetScope data. While Sentinel-2 offers higher spectral resolution, PlanetScope provides finer spatial resolution. At inference time, an end-user application requires only one of the two sources. In this work, we analyze and
compare the performance of our multi-modal self-supervised model against
the uni-modal contrastive self-supervised model using the SCARF
algorithm. We also compare our multi-modal self-supervised
model with a supervised model. We find that our multi-modal pre-trained
model surpasses the uni-modal and supervised models in almost all test
cases.
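To make the multi-modal contrastive objective concrete, the following is a minimal sketch of cross-modal contrastive pre-training between a Sentinel-2 encoder and a PlanetScope encoder. It is illustrative only: the MLP encoders, band counts, batch size, temperature, and the symmetric InfoNCE (CLIP-style) loss are assumptions made for exposition, not the paper's exact method; the paper's uni-modal baseline instead follows the SCARF algorithm.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityEncoder(nn.Module):
    """Simple MLP encoder mapping one modality's features to a shared embedding space
    (architecture assumed for illustration)."""
    def __init__(self, in_dim: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256),
            nn.ReLU(),
            nn.Linear(256, emb_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalize so the dot product below is a cosine similarity.
        return F.normalize(self.net(x), dim=-1)

def cross_modal_infonce(z_s2: torch.Tensor, z_ps: torch.Tensor,
                        temperature: float = 0.1) -> torch.Tensor:
    """Symmetric InfoNCE loss: the matching Sentinel-2/PlanetScope pair
    (same field, same date) is the positive; all other pairs in the batch
    serve as negatives."""
    logits = z_s2 @ z_ps.T / temperature          # (B, B) similarity matrix
    targets = torch.arange(z_s2.size(0))          # positives lie on the diagonal
    loss_s2 = F.cross_entropy(logits, targets)    # Sentinel-2 -> PlanetScope
    loss_ps = F.cross_entropy(logits.T, targets)  # PlanetScope -> Sentinel-2
    return 0.5 * (loss_s2 + loss_ps)

# Toy pre-training step on random stand-ins for per-field spectral features.
enc_s2 = ModalityEncoder(in_dim=12)   # e.g. 12 Sentinel-2 bands (assumed)
enc_ps = ModalityEncoder(in_dim=4)    # e.g. 4 PlanetScope bands (assumed)
x_s2, x_ps = torch.randn(32, 12), torch.randn(32, 4)  # aligned field samples
loss = cross_modal_infonce(enc_s2(x_s2), enc_ps(x_ps))
loss.backward()
print(f"contrastive loss: {loss.item():.4f}")
```

After pre-training in this fashion, either encoder can be kept on its own and fine-tuned for crop classification, which matches the single-source deployment scenario described in the abstract.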