AWLloss: Speaker Verification Based on the Quality and Difficulty of Speech
  • Qian Liu,
  • Xia Zhang,
  • Xinyan Liang,
  • Yuhua Qian,
  • Shanshan Yao

Abstract

Speaker verification is a natural and effective biometric authentication method, and it may be more practical when trained on data from real, unconstrained scenarios. However, speech quality in real scenarios varies because of adverse factors such as extreme noise and a lack of identity information, yet most models treat all speech samples equally. To address this deficiency, we propose adaptive weight loss (AWL), a loss function that assigns different weights to samples according to their quality and difficulty. It is based on our finding that speech quality is positively correlated with the L2 norm of the speaker embedding. AWL directs the model's attention to high-quality difficult speech samples and ignores low-quality difficult samples as much as possible, so that the model can learn well from speech data containing substantial noise. Comprehensive experiments show that AWL significantly outperforms existing loss functions on both the public and the noisy datasets used.
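
The weighting idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the AWL formula, so the rule below is a hypothetical stand-in and not the authors' method: the function name `illustrative_weights`, the batch-wise z-scoring of embedding norms, the `clip` parameter, and the form `1 + quality * difficulty` are all assumptions. The sketch only reproduces the qualitative behaviour stated above, using the L2 norm of the embedding as a quality proxy and the angular distance to the speaker's class center as a difficulty proxy.

```python
import math


def l2_norm(vec):
    """Euclidean (L2) norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (l2_norm(a) * l2_norm(b))


def illustrative_weights(embeddings, class_centers, labels, clip=1.0):
    """Hypothetical per-sample weights (NOT the published AWL formula).

    quality    : z-scored embedding norm within the batch, clipped to
                 [-clip, clip]; a larger norm is treated as higher quality.
    difficulty : (1 - cosine to the true class center) / 2, in [0, 1].
    weight     : 1 + quality * difficulty, so high-quality hard samples
                 are up-weighted and low-quality hard samples are
                 down-weighted. Easy samples stay near weight 1.
    """
    norms = [l2_norm(e) for e in embeddings]
    mean = sum(norms) / len(norms)
    variance = sum((n - mean) ** 2 for n in norms) / len(norms)
    std = math.sqrt(variance) or 1.0  # avoid division by zero
    weights = []
    for emb, norm, label in zip(embeddings, norms, labels):
        quality = max(-clip, min(clip, (norm - mean) / std))
        difficulty = (1.0 - cosine(emb, class_centers[label])) / 2.0
        weights.append(1.0 + quality * difficulty)
    return weights
```

In a training loop, weights of this kind would typically multiply the per-sample classification loss before averaging. With the default `clip=1.0` the weights stay in the range 0 to 2.
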
Published in IEEE Signal Processing Letters, volume 30, pages 1337-1341, 2023. DOI: 10.1109/LSP.2023.3314371