
When Speaker Recognition Meets Noisy Labels: Optimizations for Front-ends and Back-ends
  • Lin Li
  • Fuchuan Tong
  • Qingyang Hong
Fuchuan Tong
School of Information Science and Engineering




A typical speaker recognition system involves two modules: a feature-extractor front-end and a speaker-identity back-end. Although deep neural networks have achieved superior performance as front-ends, their success depends on the availability of large-scale, correctly labeled datasets. Label noise, however, is unavoidable in speaker recognition datasets, and it affects both the front-end and the back-end, degrading speaker recognition performance. In this paper, we first conduct comprehensive experiments to better understand the effects of label noise on both the front-end and the back-end. We then propose a simple yet effective training paradigm and a loss correction method to handle label noise in the front-end. We combine our method with the recently introduced Bayesian estimation of PLDA for noisy labels, and the resulting system shows strong robustness to label noise. Furthermore, we present two practical applications of the improved system: one corrects noisy labels based on an utterance’s chunk-level predictions, and the other algorithmically filters out high-confidence noisy samples within a dataset. By applying the second application to the NIST SRE04-10 dataset and verifying the filtered utterances through human validation, we find that approximately 1% of the SRE04-10 dataset consists of label errors.
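The two applications can be illustrated with a minimal sketch. The abstract does not specify the exact decision rule, so everything below is an assumption made for illustration: utterance-level scores are taken as the average of chunk-level speaker posteriors, a label is changed (or flagged as noisy) only when the averaged prediction disagrees with the given label and exceeds a confidence threshold, and the threshold of 0.9 is arbitrary. The names `average_posteriors`, `correct_label`, and `filter_suspects` are hypothetical and do not come from the paper; this is not the authors' implementation.

```python
from typing import List, Sequence, Tuple

# Hypothetical helper: average per-chunk speaker posteriors into one
# utterance-level posterior vector (assumed aggregation rule).
def average_posteriors(chunk_probs: Sequence[Sequence[float]]) -> List[float]:
    n_chunks = len(chunk_probs)
    n_speakers = len(chunk_probs[0])
    return [sum(chunk[s] for chunk in chunk_probs) / n_chunks for s in range(n_speakers)]

# Hypothetical label correction: relabel only when the averaged prediction
# disagrees with the given label AND is confident (threshold is an assumption).
def correct_label(chunk_probs: Sequence[Sequence[float]], given_label: int,
                  threshold: float = 0.9) -> Tuple[int, bool]:
    avg = average_posteriors(chunk_probs)
    predicted = max(range(len(avg)), key=avg.__getitem__)
    if predicted != given_label and avg[predicted] >= threshold:
        return predicted, True   # corrected label, flagged as noisy
    return given_label, False    # keep the original label

# Hypothetical filtering: return indices of utterances whose labels are
# confidently contradicted, as candidates for human validation.
def filter_suspects(dataset: Sequence[Tuple[Sequence[Sequence[float]], int]],
                    threshold: float = 0.9) -> List[int]:
    return [i for i, (chunks, label) in enumerate(dataset)
            if correct_label(chunks, label, threshold)[1]]

# Toy chunk posteriors over three speakers (made-up numbers).
confident = [[0.02, 0.96, 0.02], [0.04, 0.94, 0.02], [0.03, 0.95, 0.02]]
ambiguous = [[0.5, 0.4, 0.1], [0.3, 0.6, 0.1]]
print(correct_label(confident, given_label=0))
print(filter_suspects([(confident, 0), (ambiguous, 0), (confident, 1)]))
```

Requiring both disagreement and high confidence trades recall for precision: ambiguous utterances keep their labels, and only confidently contradicted ones are relabeled or passed on for manual checking.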
Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 1586-1599, 2022. DOI: 10.1109/TASLP.2022.3169977