TechRxiv

Multi-Task Learning using Frequency-Warped Spectrum for Detecting Out-of-breath Speech

preprint
posted on 2022-04-09, 11:46, authored by Sibasis Sahoo and Samarendra Dandapat
This paper presents methods for detecting out-of-breath speech (OBS), that is, speech produced under shortness of breath caused by physical load. The constant-Q transform (CQT) is used to warp the frequency spectrum non-linearly, which emphasizes the spectral saliencies of the speech signal under this condition. Existing works based on deep neural networks (DNNs) take spectrograms or prosodic features as input to detect OBS, and commonly train the models with two target labels, neutral and OBS. These binary labels, however, do not reflect the speaker's true exertion level. In this work, a transfer-learning approach is proposed for estimating the physical exertion level, using an open-source DNN model trained on environmental sounds. A multi-task learning (MTL) framework is then applied to jointly learn the binary class labels and the exertion levels. The results show that the spectrogram with non-linear frequency warping performs better than its linear counterparts. Furthermore, with MTL, the classification F1-score improves by 3.54% and 12.67% over single-task learning (STL) using the CQT-spectrogram and the Mel-spectrogram, respectively.
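
The non-linear frequency warping mentioned in the abstract can be illustrated with the standard CQT bin layout, where center frequencies are spaced geometrically rather than uniformly. The sketch below is not the authors' implementation: the function names are hypothetical, and the minimum frequency, bin count, and bins per octave are illustrative assumptions that do not come from the paper.

```python
def cqt_center_frequencies(f_min, n_bins, bins_per_octave):
    """Geometric spacing f_k = f_min * 2**(k / B): every octave gets B bins."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def cqt_q_factor(bins_per_octave):
    """Q = f_k / (f_{k+1} - f_k), taking the spacing to the next bin as the
    bandwidth. The ratio is the same for every bin, hence "constant Q"."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


def linear_center_frequencies(f_min, f_max, n_bins):
    """Uniform spacing, as on the frequency axis of a plain STFT spectrogram."""
    step = (f_max - f_min) / (n_bins - 1)
    return [f_min + k * step for k in range(n_bins)]


if __name__ == "__main__":
    # Assumed, illustrative settings (not taken from the paper).
    F_MIN_HZ = 32.70
    N_BINS = 84
    BINS_PER_OCTAVE = 12

    cqt = cqt_center_frequencies(F_MIN_HZ, N_BINS, BINS_PER_OCTAVE)
    lin = linear_center_frequencies(F_MIN_HZ, cqt[-1], N_BINS)

    # Count how many bins each layout places below 1 kHz.
    cqt_low = sum(f < 1000.0 for f in cqt)
    lin_low = sum(f < 1000.0 for f in lin)

    print(f"Q factor: {cqt_q_factor(BINS_PER_OCTAVE):.2f}")
    print(f"bins below 1 kHz (CQT layout): {cqt_low}")
    print(f"bins below 1 kHz (linear layout): {lin_low}")
```

Running the script compares how many bins each layout assigns to the region below 1 kHz over the same frequency range. The geometric layout gives low frequencies finer resolution, which is the sense in which the CQT warps the spectrum non-linearly.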

Email Address of Submitting Author

sibasis2016@iitg.ac.in

ORCID of Submitting Author

0000-0003-0118-2275

Submitting Author's Institution

Indian Institute of Technology Guwahati

Submitting Author's Country

India