TechRxiv

On the Predictive Power of Objective Intelligibility Metrics for the Subjective Performance of Deep Complex Convolutional Recurrent Speech Enhancement Networks

preprint
posted on 04.05.2022, 21:38 authored by Femke B. Gelderblom, Tron V. Tronstad, Torbjørn Svendsen, Tor Andre Myrvoll
Speech enhancement (SE) systems aim to improve the quality and intelligibility of degraded speech signals obtained from far-field microphones. Subjective evaluation of the intelligibility performance of these SE systems is uncommon. Instead, objective intelligibility measures (OIMs) are generally used to predict subjective performance increases. Many recent deep-learning-based SE systems are expected to improve the intelligibility of degraded speech as measured by OIMs.

However, validation of the OIMs for this purpose is lacking. Therefore, in this study, we evaluate the predictive performance of five popular OIMs by comparing the metrics' predictions with subjective results. For this purpose, we recruited 50 human listeners and subjectively tested both single-channel and multi-channel Deep Complex Convolutional Recurrent Network (DCCRN)-based SE systems.

We find that none of the OIMs gave reliable predictions, and that all OIMs overestimated the intelligibility of 'enhanced' speech signals.

Email Address of Submitting Author

Femke.Gelderblom@sintef.no

ORCID of Submitting Author

0000-0002-1034-4427

Submitting Author's Institution

SINTEF & NTNU

Submitting Author's Country

Norway