Multi-Task Learning using Frequency-Warped Spectrum for Detecting Out-of-breath Speech
Preprint posted on 09.04.2022, 11:46 by Sibasis Sahoo, Samarendra Dandapat
This paper presents methods for detecting out-of-breath speech (OBS), that is, speech produced under shortness of breath caused by physical load. It uses the constant-Q transform (CQT) to warp the frequency spectrum non-linearly, emphasizing the spectral saliencies of the speech signal under this condition. Existing work based on deep neural networks (DNNs) takes spectrograms or prosodic features as input to detect OBS. Two target labels, neutral and OBS, are commonly used for training DNN models. These labels, however, do not reflect the true exertion level of a speaker. In this work, a transfer-learning approach is proposed for estimating the physical exertion level, using an open-source DNN model pre-trained on environmental sounds. A multi-task learning (MTL) framework is then applied to jointly learn the binary class labels and the exertion levels. The results show that the spectrogram with non-linear frequency warping performs better than its linear counterparts. Furthermore, with MTL, the classification F1-score improves by 3.54% and 12.67% over single-task learning (STL) using the CQT-spectrogram and the Mel-spectrogram, respectively.
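To illustrate what non-linear frequency warping with the CQT means, the following minimal sketch compares geometrically spaced CQT center frequencies, f_k = f_min * 2^(k/B), with a linearly spaced grid over the same range. The values of `F_MIN`, `N_BINS`, and `BINS_PER_OCTAVE` are hypothetical and chosen only for illustration; they are not the settings used in the paper, and the function names are likewise assumptions.

```python
import math

def cqt_center_frequencies(f_min, n_bins, bins_per_octave):
    """Geometrically spaced center frequencies: f_k = f_min * 2**(k / B)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]

def linear_center_frequencies(f_min, f_max, n_bins):
    """Uniformly spaced center frequencies between f_min and f_max (inclusive)."""
    step = (f_max - f_min) / (n_bins - 1)
    return [f_min + k * step for k in range(n_bins)]

def q_factor(bins_per_octave):
    """Ratio of center frequency to bin spacing; identical for every CQT bin."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)

def count_below(freqs, cutoff_hz):
    """Number of bins whose center frequency lies below cutoff_hz."""
    return sum(1 for f in freqs if f < cutoff_hz)

# Hypothetical analysis settings (assumptions, not taken from the paper).
F_MIN = 32.7            # lowest center frequency in Hz
N_BINS = 84             # total number of frequency bins
BINS_PER_OCTAVE = 12    # B in the formula above

cqt_freqs = cqt_center_frequencies(F_MIN, N_BINS, BINS_PER_OCTAVE)
lin_freqs = linear_center_frequencies(cqt_freqs[0], cqt_freqs[-1], N_BINS)

# The CQT grid spends more of its bins on the low-frequency region.
low_cqt = count_below(cqt_freqs, 1000.0)
low_lin = count_below(lin_freqs, 1000.0)
print(f"bins below 1 kHz: CQT={low_cqt}, linear={low_lin}")
print(f"Q factor: {q_factor(BINS_PER_OCTAVE):.2f}")
```

Because the ratio between adjacent center frequencies is constant, the CQT grid has finer frequency resolution at low frequencies and coarser resolution at high frequencies, whereas the linear grid distributes its bins uniformly.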