Transfer Learning or Self-supervised Learning? A Tale of Two Pretraining
Paradigms
Abstract
Pretraining has become a standard technique in computer vision and
natural language processing, and it usually improves performance
substantially. Previously, the dominant pretraining method was
transfer learning (TL), which uses labeled data to learn a good
representation network. Recently, a new pretraining approach –
self-supervised learning (SSL) – has demonstrated promising results on
a wide range of applications. SSL does not require annotated labels;
it is conducted purely on input data by solving auxiliary tasks defined
on the data examples themselves. Currently reported results show that
SSL outperforms TL in certain applications and vice versa in others.
There is not yet a clear understanding of which properties of data and
tasks make one approach outperform the other.
Without an informed guideline, ML researchers have to try both methods
to find out empirically which one is better, which is usually
time-consuming. In this work, we aim to address this problem.
We perform a comprehensive comparative study of SSL and TL regarding
which one works better under different properties of data and tasks,
including the domain difference between source and target tasks, the
amount of pretraining data, class imbalance in the source data, and the
use of target data for additional pretraining. The insights distilled
from our comparative study can help ML researchers decide which method
to use based on the properties of their applications.