TechRxiv

CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement

Preprint, posted on 2022-10-10, 19:46, authored by Sherif Abdulatif, Ruizhe Cao, and Bin Yang

Convolution-augmented transformers (Conformers) have recently been proposed for various speech-domain applications, such as automatic speech recognition (ASR) and speech separation, because they can capture both local and global dependencies. In this paper, we propose a conformer-based metric generative adversarial network (CMGAN) for speech enhancement (SE) in the time-frequency (TF) domain. The generator encodes the magnitude and complex spectrogram information using two-stage conformer blocks to model both time and frequency dependencies. The decoder then decouples the estimation into a magnitude mask decoder branch, which filters out unwanted distortions, and a complex refinement branch, which further improves the magnitude estimate and implicitly enhances the phase information. Additionally, we include a metric discriminator to alleviate metric mismatch by optimizing the generator with respect to a corresponding evaluation score. Objective and subjective evaluations show that CMGAN outperforms state-of-the-art methods in three speech enhancement tasks (denoising, dereverberation, and super-resolution). For instance, quantitative denoising analysis on the Voice Bank+DEMAND dataset shows that CMGAN outperforms various previous models by a margin, achieving a PESQ of 3.41 and an SSNR of 11.10 dB.
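The abstract describes the decoder as two branches: a magnitude mask and a complex refinement. The NumPy snippet below is a minimal sketch of one plausible way to merge their outputs, not the authors' implementation. It assumes that the mask multiplies the noisy magnitude, that the noisy phase is reused for the masked estimate, and that the refinement branch adds a real/imaginary residual on top. The function name `combine_decoder_outputs` and its arguments are hypothetical, and details such as magnitude compression are omitted.

```python
import numpy as np


def combine_decoder_outputs(noisy_spec, mag_mask, refine_real, refine_imag):
    """Hypothetical fusion of a magnitude-mask branch and a complex-refinement branch.

    noisy_spec:  complex TF spectrogram of the noisy input, shape (T, F)
    mag_mask:    real-valued mask from the (assumed) mask decoder, shape (T, F)
    refine_real, refine_imag: real/imaginary residuals from the (assumed)
                 complex refinement decoder, shape (T, F)
    """
    noisy_mag = np.abs(noisy_spec)
    noisy_phase = np.angle(noisy_spec)

    # Mask branch: scale the noisy magnitude, keeping the noisy phase.
    masked_mag = mag_mask * noisy_mag
    coarse_real = masked_mag * np.cos(noisy_phase)
    coarse_imag = masked_mag * np.sin(noisy_phase)

    # Refinement branch: add a complex residual, which can alter both the
    # magnitude and the phase of the coarse estimate.
    return (coarse_real + refine_real) + 1j * (coarse_imag + refine_imag)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, F = 4, 3
    noisy = rng.standard_normal((T, F)) + 1j * rng.standard_normal((T, F))
    mask = rng.uniform(0.0, 1.0, (T, F))
    res_r = 0.1 * rng.standard_normal((T, F))
    res_i = 0.1 * rng.standard_normal((T, F))
    enhanced = combine_decoder_outputs(noisy, mask, res_r, res_i)
    print(enhanced.shape)
```

With a mask of ones and zero residuals, the sketch returns the noisy spectrogram unchanged. This is a convenient sanity check that the mask branch alone cannot correct the phase, which is why the refinement branch is described as implicitly enhancing it.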


Email Address of Submitting Author

sherif.abdulatif@iss.uni-stuttgart.de

ORCID of Submitting Author

0000-0001-7498-3773

Submitting Author's Institution

Institute of Signal Processing and System Theory, University of Stuttgart

Submitting Author's Country

  • Germany