DIBERT: Dependency Injected Bidirectional Encoder Representations from Transformers
In this paper, we propose a new model named DIBERT, which stands for Dependency Injected Bidirectional Encoder Representations from Transformers. DIBERT is a variation of BERT with an additional third objective called Parent Prediction (PP), alongside Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). PP injects the syntactic structure of a dependency tree during pre-training, so that DIBERT produces syntax-aware generic representations. We use the WikiText-103 benchmark dataset to pre-train both BERT-Base and DIBERT. After fine-tuning, we observe that DIBERT outperforms BERT-Base on various downstream tasks, including Semantic Similarity, Natural Language Inference, and Sentiment Analysis.
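To make the combined pre-training objective concrete, the sketch below shows one way an MLM, NSP, and token-level parent-prediction loss could be combined in PyTorch. The `ParentPredictionHead`, its bilinear scoring of candidate parents, and the equal loss weighting are illustrative assumptions; the abstract does not specify the paper's exact PP head or weighting.

```python
# Minimal sketch of a three-objective pre-training loss (MLM + NSP + PP).
# Assumptions: head architecture, loss weighting, and label conventions
# are illustrative, not the paper's exact formulation.
import torch
import torch.nn as nn

class ParentPredictionHead(nn.Module):
    """Scores, for each token, which other token is its dependency-tree parent."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden) -> parent logits: (batch, seq_len, seq_len)
        q = self.query(hidden_states)
        k = self.key(hidden_states)
        return q @ k.transpose(-2, -1)

def pretraining_loss(mlm_logits, mlm_labels, nsp_logits, nsp_labels,
                     pp_logits, parent_ids, vocab_size):
    """Sum the MLM, NSP, and PP cross-entropy losses; -100 marks ignored positions."""
    ce = nn.CrossEntropyLoss(ignore_index=-100)
    mlm_loss = ce(mlm_logits.view(-1, vocab_size), mlm_labels.view(-1))
    nsp_loss = ce(nsp_logits.view(-1, 2), nsp_labels.view(-1))
    seq_len = pp_logits.size(-1)
    pp_loss = ce(pp_logits.view(-1, seq_len), parent_ids.view(-1))
    return mlm_loss + nsp_loss + pp_loss  # equal weighting assumed
```

In this sketch, `parent_ids` would hold, for each token, the sequence position of its head word from an external dependency parse, so the PP term pushes the encoder to encode the tree structure alongside the standard BERT objectives.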
Email Address of Submitting Author
abdul.wahab@rwth-aachen.de
ORCID of Submitting Author
0000-0001-7813-3805
Submitting Author's Institution
RWTH Aachen
Submitting Author's Country
Germany
Keywords
nlp, bert, transformer, deep learning, machine learning, natural language processing, attention model, dependency parse tree, dependency parsing, pretraining, training, finetuning, sentiment analysis, classification, semantic similarity, natural language inference, wiki-103, masked language model, next sentence prediction, parent prediction