On the Predictive Power of Objective Intelligibility Metrics for the
Subjective Performance of Deep Complex Convolutional Recurrent Speech
Enhancement Networks
Abstract
Speech enhancement (SE) systems aim to improve the quality and
intelligibility of degraded speech signals obtained from far-field
microphones. Subjective evaluation of the intelligibility performance of
these SE systems is uncommon. Instead, objective intelligibility
measures (OIMs) are generally used to predict subjective performance
increases. Many recent deep-learning-based SE systems are expected to
improve the intelligibility of degraded speech as measured by OIMs.
However, validation of the OIMs for this purpose is lacking. Therefore,
in this study, we evaluate the predictive performance of five popular
OIMs. We compare the metrics’ predictions with subjective results. For
this purpose, we recruited 50 human listeners and subjectively tested
both single-channel and multi-channel Deep Complex Convolutional
Recurrent Network (DCCRN) based SE systems.
We find that none of the OIMs gave reliable predictions, and that all
OIMs overestimated the intelligibility of ‘enhanced’ speech signals.