
Self-Refine Learning For Data-Centric Deep Learning

Posted on 2023-08-28, 19:43, authored by Tong Guo

In industrial NLP applications, our manually labeled data contains a certain amount of noisy (mislabeled) data. We present a simple method to find these noisy examples and remove them: an example is flagged as noisy when its human label is not among the model's top-K predictions. Experimental results show that the method is effective. In an industrial deep learning application, it improves text classification accuracy on the dev dataset from 80.5% to 90.6%, and human-evaluation accuracy from 83.2% to 90.1%. However, the self-predict and self-drop method of this paper cannot raise accuracy above 95% without relabeling the training dataset by hand.
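The top-K filtering rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the per-class scores `probs` are assumed to come from a classifier already trained on the labeled data (the "self-predict" step), and the abstract does not specify which model produces them or whether they are computed on held-out folds. The function names `top_k_labels`, `find_noisy`, and `self_drop` are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def top_k_labels(probs: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring classes.

    Ties are broken by the lower class index so the result is deterministic.
    """
    return sorted(range(len(probs)), key=lambda c: (-probs[c], c))[:k]


def find_noisy(labels: Sequence[int], probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of examples whose human label is NOT in the model's top-k predictions.

    `probs[i]` is the (assumed) model score vector for example i.
    """
    return [
        i
        for i, (y, p) in enumerate(zip(labels, probs))
        if y not in top_k_labels(p, k)
    ]


def self_drop(
    texts: Sequence[str],
    labels: Sequence[int],
    probs: Sequence[Sequence[float]],
    k: int,
) -> Tuple[List[Tuple[str, int]], List[int]]:
    """Drop flagged examples; return the kept (text, label) pairs and the dropped indices."""
    noisy: Set[int] = set(find_noisy(labels, probs, k))
    kept = [(t, y) for i, (t, y) in enumerate(zip(texts, labels)) if i not in noisy]
    return kept, sorted(noisy)


if __name__ == "__main__":
    texts = ["a", "b", "c"]
    labels = [0, 2, 1]
    probs = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
    kept, dropped = self_drop(texts, labels, probs, k=2)
    print(kept)     # [('a', 0), ('c', 1)]
    print(dropped)  # [1]
```

In this toy run, example 1 carries human label 2, but the model's top-2 classes are 0 and 1, so it is dropped. A larger K makes the filter more conservative (fewer examples removed); with K equal to the number of classes nothing is dropped. The cleaned set would then presumably be used to retrain the model.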


Submitting Author's Country

  • China
