DIBERT: Dependency Injected Bidirectional Encoder Representations from
Transformers
Abstract
In this paper, we propose a new model named DIBERT, which stands for
Dependency Injected Bidirectional Encoder Representations from
Transformers. DIBERT is a variation of BERT with a third pre-training
objective, Parent Prediction (PP), in addition to Masked Language
Modeling (MLM) and Next Sentence Prediction (NSP). PP injects the
syntactic structure of a dependency tree during pre-training, allowing
DIBERT to learn syntax-aware generic representations. We use the
WikiText-103 benchmark dataset to pre-train both BERT-Base and DIBERT.
After fine-tuning, we observe that DIBERT outperforms BERT-Base on
several downstream tasks, including Semantic Similarity, Natural
Language Inference, and Sentiment Analysis.
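To make the Parent Prediction objective concrete, the following is a minimal sketch of one way such an objective could be implemented, assuming PP is framed as classifying, for each token, the sequence position of its dependency-tree parent. The class name ParentPredictionHead, the parent_ids tensor, and the way the losses are combined are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a Parent Prediction (PP) head; names and the loss
# combination are assumptions for illustration, not taken from the paper.
import torch
import torch.nn as nn

class ParentPredictionHead(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        # Score each candidate parent position for every token via a
        # simple bilinear interaction between contextual token embeddings.
        self.bilinear = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, hidden_states, parent_ids, attention_mask):
        # hidden_states: (batch, seq_len, hidden) from the BERT encoder
        # parent_ids:    (batch, seq_len) index of each token's dependency
        #                parent; -100 marks tokens with no gold parent
        # attention_mask: (batch, seq_len) with 1 for real tokens, 0 for padding
        scores = self.bilinear(hidden_states) @ hidden_states.transpose(1, 2)
        # Mask out padded positions so they cannot be predicted as parents.
        scores = scores.masked_fill(attention_mask.unsqueeze(1) == 0, float("-inf"))
        loss = nn.functional.cross_entropy(
            scores.view(-1, scores.size(-1)),
            parent_ids.view(-1),
            ignore_index=-100,
        )
        return loss

# During pre-training, this auxiliary loss would be added to the standard
# objectives, e.g. total_loss = mlm_loss + nsp_loss + pp_loss.
```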