Self-Refine Learning For Data-Centric Text Classification
In industrial NLP applications, manually labeled data contains a
certain amount of label noise. We present a simple method that finds the
noisy examples and relabels them with the model's prediction.
An example is treated as noisy when its human label is not among the
model's top-K predictions, where the model is trained on the original dataset.
Experiments show that the method works. In an industrial deep
learning application, it improves text classification accuracy
from 80.5% to 90.6% on the dev dataset, and improves human-evaluation
accuracy from 83.2% to 90.5%.
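The selection and relabeling rule described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `refine_labels`, the toy probabilities, and the choice of K are assumptions made for the example, and the per-class probabilities are assumed to come from a classifier already trained on the original, noisy dataset. "The result of model prediction" is read here as the model's top-1 class.

```python
from typing import List, Sequence, Tuple


def refine_labels(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 3,
) -> Tuple[List[int], List[int]]:
    """Relabel examples whose human label is outside the model's top-k classes.

    probs[i][c] is the model's predicted probability of class c for example i
    (hypothetical input: the abstract does not specify the model or its output
    format). Returns the refined labels and the indices that were changed.
    """
    refined = list(labels)
    changed = []
    for i, (row, label) in enumerate(zip(probs, labels)):
        # Class indices ranked from most to least probable.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label not in ranked[:k]:
            # Treated as noisy: replace the human label with the top-1 prediction.
            refined[i] = ranked[0]
            changed.append(i)
    return refined, changed


# Toy data (invented for illustration): three examples, three classes.
probs = [
    [0.70, 0.20, 0.10],  # human label 0 is in the top-2, so it is kept
    [0.05, 0.15, 0.80],  # human label 0 is not in the top-2, so it is relabeled
    [0.40, 0.35, 0.25],  # human label 2 is not in the top-2, so it is relabeled
]
labels = [0, 0, 2]
refined, changed = refine_labels(probs, labels, k=2)
print(refined, changed)
```

A larger K makes the rule more conservative, since fewer human labels fall outside the top-K set and fewer examples are relabeled. Whether the model is then retrained on the refined labels is not stated in the abstract and is left out of this sketch.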
Email Address of Submitting Author
779222056@qq.com
Submitting Author's Institution
-
Submitting Author's Country
China