BERT_PLPS: A BERT-based Model for Predicting Lysine Phosphoglycerylation Sites
  • Songning Lai ,
  • Pengwei Wang ,
  • Lan Ye ,
  • Zhi Liu
Songning Lai
The School of Information Science and Engineering

Corresponding Author: [email protected]


Abstract

As one of the most important post-translational modifications, lysine phosphoglycerylation affects many important biosynthetic processes in the human body. However, traditional experimental methods for recognizing lysine phosphoglycerylation sites are both expensive and time-consuming. Computational techniques may provide an economical and efficient way to predict these sites, so establishing highly accurate prediction models is both necessary and meaningful. In the present study, we propose a BERT-based model, BERT_PLPS, which accurately predicts lysine phosphoglycerylation sites. The model extracts amino acid sequence features with three algorithms: CKSAAP, AAC, and BE. Samples are balanced using the ADASYN and KNN algorithms. The dimensionality of the data is reduced with the Isomap algorithm, and the features are encoded into feature sequences by an encoder as the input to a BERT-based prediction model. To better learn the intrinsic biological language of lysine, we replaced the original static mask with a dynamic random mask. Compared with other machine learning or deep learning-based models, BERT_PLPS achieves up to 99.53% accuracy and outperforms the most advanced model (PLP_FS), with increases of approximately 0.35% in ACC and approximately 0.93% in MCC.
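The three feature extractors named in the abstract can be illustrated with a minimal Python sketch. It uses the textbook definitions of AAC (amino acid composition), BE (binary, i.e. one-hot, encoding) and CKSAAP (composition of k-spaced amino acid pairs); the abstract does not give the window length, the maximum gap, or the handling of padding residues, so the 20-letter alphabet, the `k_max` default, the example window, and all function names below are assumptions made for illustration, not the authors' implementation.

```python
from itertools import product

# Assumption: the 20 standard amino acids; any other symbol (e.g. padding "X")
# is ignored by AAC/CKSAAP and encoded as an all-zero block by BE.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aac(seq):
    """Amino acid composition: frequency of each residue type in the window."""
    n = sum(1 for r in seq if r in AMINO_ACIDS)
    return [seq.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def binary_encoding(seq):
    """Binary encoding: a 20-dimensional one-hot block per position."""
    vec = []
    for r in seq:
        vec.extend(1 if r == a else 0 for a in AMINO_ACIDS)
    return vec


def cksaap(seq, k_max=2):
    """CKSAAP: for each gap k = 0..k_max, the normalized frequency of every
    ordered residue pair separated by exactly k positions."""
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(seq) - k - 1):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # skip pairs that contain a non-standard symbol
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


def encode(seq, k_max=2):
    """Concatenate the three feature groups into one flat vector."""
    return aac(seq) + binary_encoding(seq) + cksaap(seq, k_max)


# Hypothetical 9-residue window centred on a candidate lysine (K).
window = "MAVTKAGEL"
features = encode(window)
print(len(features))  # 20 + 9*20 + 3*400 = 1400
```

In a full pipeline along the lines the abstract describes, vectors like these would then be balanced (ADASYN), reduced in dimensionality (Isomap), and tokenized for the BERT-based model; those stages are omitted here because the abstract gives no parameters for them.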