Abstract
Pretrained language models such as BERT and GPT have shown great
effectiveness in language understanding. However, the auxiliary
predictive tasks in existing pretraining approaches are mostly defined
at the token level and thus may not capture sentence-level semantics
well. To
address this issue, we propose CERT: Contrastive self-supervised Encoder
Representations from Transformers, which pretrains language
representation models using contrastive self-supervised learning at the
sentence level. CERT creates augmentations of original sentences using
back-translation. Then it finetunes a pretrained language encoder (e.g.,
BERT) by predicting whether two augmented sentences originate from the
same sentence. CERT is simple to use and can be flexibly plugged into
any pretraining-finetuning NLP pipeline. We evaluate CERT on three
language understanding tasks: CoLA, RTE, and QNLI, where CERT
significantly outperforms BERT.
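
To make the pretext task concrete, the following is a minimal sketch of the same-origin prediction described above. It is not the paper's released implementation: `back_translate` is a stub standing in for a real NMT round trip, and `ToyEncoder` is a hypothetical stand-in for a pretrained encoder such as BERT.

```python
import random
import torch
import torch.nn as nn

def back_translate(sentence: str) -> str:
    # Stand-in for a real NMT round trip (e.g., English -> German -> English)
    # that yields a paraphrase-like augmentation of the input sentence.
    return sentence

def make_pairs(corpus):
    """Build (sent_a, sent_b, label) pairs: label 1 if both augmentations
    come from the same original sentence, 0 otherwise."""
    pairs = []
    for i, sent in enumerate(corpus):
        pairs.append((back_translate(sent), back_translate(sent), 1))
        j = random.choice([k for k in range(len(corpus)) if k != i])
        pairs.append((back_translate(sent), back_translate(corpus[j]), 0))
    return pairs

class ToyEncoder(nn.Module):
    """Hypothetical stand-in for a pretrained encoder (e.g., BERT): hashes
    tokens into an embedding bag and mean-pools to a sentence vector."""
    def __init__(self, dim=64, vocab=10_000):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab, dim, mode="mean")
        self.vocab = vocab

    def forward(self, sentences):
        ids = [torch.tensor([hash(w) % self.vocab for w in s.split()])
               for s in sentences]
        offsets = torch.tensor([0] + [len(t) for t in ids[:-1]]).cumsum(0)
        return self.emb(torch.cat(ids), offsets)

encoder = ToyEncoder()
head = nn.Linear(2 * 64, 2)  # binary same-origin classifier
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(head.parameters()), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

corpus = ["the cat sat on the mat",
          "contrastive learning helps encoders",
          "back translation creates paraphrases"]
for a, b, label in make_pairs(corpus):
    # Encode both augmentations and predict whether they share an origin.
    logits = head(torch.cat([encoder([a]), encoder([b])], dim=-1))
    loss = loss_fn(logits, torch.tensor([label]))
    opt.zero_grad()
    loss.backward()
    opt.step()
```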