BERT_PLPS: A BERT-based Model for Predicting Lysine Phosphoglycerylation Sites
As one of the most important post-translational modifications, lysine phosphoglycerylation affects many important biosynthetic processes in the human body. However, traditional experimental methods for identifying lysine phosphoglycerylation sites are both expensive and time-consuming. Computational techniques offer an economical and efficient alternative, so establishing prediction models with high accuracy is both necessary and worthwhile. In the present study, we propose a BERT-based model, BERT_PLPS, which accurately predicts lysine phosphoglycerylation sites. The model extracts amino acid sequence features with three algorithms: composition of k-spaced amino acid pairs (CKSAAP), amino acid composition (AAC), and binary encoding (BE). Samples are balanced using the ADASYN and KNN algorithms. The dimensionality of the data is reduced with the ISOMap algorithm, and the features are then encoded into feature sequences by an encoder to serve as the input to a BERT-based prediction model. To better learn the intrinsic biological language of lysine, we replaced the original static mask with a dynamic random mask. Compared with other machine learning and deep learning-based models, BERT_PLPS achieves an accuracy of up to 99.53% and outperforms the most advanced model (PLP FS), with increases of approximately 0.35% in accuracy (ACC) and approximately 0.93% in Matthews correlation coefficient (MCC).
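The three sequence encodings and the dynamic random mask named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 11-residue example window, the gap range `k_max=2`, the 15% masking probability, and the `[MASK]` token are all assumptions chosen for illustration, and the masking step is simplified to token replacement only. The ADASYN, KNN, and ISOMap steps are not shown.

```python
from itertools import product
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aac(seq):
    """Amino acid composition: fraction of each standard residue in the window."""
    n = len(seq)
    return [seq.count(a) / n for a in AMINO_ACIDS]


def binary_encoding(seq):
    """Binary (one-hot) encoding: 20 values per position, all zero for unknown residues."""
    return [1 if a == r else 0 for r in seq for a in AMINO_ACIDS]


def cksaap(seq, k_max=2):
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max (k_max is assumed)."""
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_pairs = len(seq) - k - 1  # number of residue pairs separated by k positions
        for i in range(n_pairs):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # skip pairs containing non-standard residues
                counts[pair] += 1
        features.extend(counts[p] / n_pairs for p in PAIRS)
    return features


def dynamic_mask(tokens, mask_prob=0.15, mask_token="[MASK]", rng=random):
    """Draw a fresh random mask on every call, rather than fixing it once in preprocessing."""
    return [mask_token if rng.random() < mask_prob else t for t in tokens]


# Hypothetical 11-residue window centred on a lysine (K); not taken from the paper's data.
window = "AKLMKKDEGKA"
vector = aac(window) + binary_encoding(window) + cksaap(window)
print(len(vector))

# Each call can mask different positions, which is what makes the mask "dynamic".
print(dynamic_mask(list(window)))
print(dynamic_mask(list(window)))
```

Concatenating the three encodings is only one plausible way to combine them; the abstract does not state how the features are merged before dimensionality reduction.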
Email Address of Submitting Author: 202000120172@mail.sdu.edu.cn
Submitting Author's Institution: The School of Information Science and Engineering, Shandong University, Qingdao 266237, China.
Submitting Author's Country