TechRxiv

Neural Comb Filtering using Sliding Window Attention Network for Speech Enhancement

preprint
posted on 27.07.2021, 05:11 by Venkatesh Parvathala, Sri Rama Murty Kodukula, Siva Ganesh Andhavarapu
In this paper, we demonstrate the significance of restoring the harmonics of the fundamental frequency (pitch) in deep neural network (DNN) based speech enhancement. We propose a sliding-window attention network to regress the spectral magnitude mask (SMM) from the noisy speech signal. Although the network parameters can be estimated by minimizing the mask loss alone, doing so does not restore the pitch harmonics, especially at higher frequencies. We therefore propose to restore the pitch harmonics in the spectral domain by minimizing a cepstral loss around the pitch peak. The network parameters are estimated using a combination of the mask loss and the cepstral loss. The proposed network architecture functions like an adaptive comb filter on voiced segments and emphasizes the pitch harmonics in the speech spectrum. The proposed approach achieves performance comparable to state-of-the-art methods at much lower computational complexity.
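
The combination of mask loss and cepstral loss described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the exact form of either loss, so the mean-squared-error mask loss, the weight `alpha`, the quefrency window `half_width`, and the function names `real_cepstrum` and `combined_loss` are all assumptions made for illustration. The per-frame pitch quefrency index `pitch_q` is assumed to be given, for example from the clean speech.

```python
import numpy as np

def real_cepstrum(mag, eps=1e-8):
    """Real cepstrum of a one-sided magnitude spectrum (last axis = frequency)."""
    return np.fft.irfft(np.log(mag + eps), axis=-1)

def combined_loss(noisy_mag, clean_mag, pred_mask, pitch_q,
                  half_width=2, alpha=0.5, eps=1e-8):
    """Mask loss plus a cepstral loss restricted to a window around the pitch peak.

    noisy_mag, clean_mag, pred_mask: arrays of shape (frames, bins)
    pitch_q: integer quefrency index of the pitch peak for each frame
    half_width, alpha: hypothetical window size and loss weight (assumptions)
    """
    # Target spectral magnitude mask: ratio of clean to noisy magnitudes.
    target_mask = clean_mag / (noisy_mag + eps)
    mask_loss = np.mean((pred_mask - target_mask) ** 2)

    # Cepstra of the clean and the enhanced (masked noisy) magnitude spectra.
    enhanced_mag = pred_mask * noisy_mag
    c_clean = real_cepstrum(clean_mag, eps)
    c_enh = real_cepstrum(enhanced_mag, eps)

    # Boolean window selecting quefrencies within half_width of each frame's pitch peak.
    q = np.arange(c_clean.shape[-1])
    window = np.abs(q[None, :] - np.asarray(pitch_q)[:, None]) <= half_width
    cep_loss = np.sum(window * (c_enh - c_clean) ** 2) / max(window.sum(), 1)

    return mask_loss + alpha * cep_loss
```

In this sketch the cepstral term only penalizes mismatch near the pitch quefrency, which is where the harmonic structure of voiced speech shows up as a peak. A real training setup would presumably apply it to voiced frames only and use a differentiable framework rather than NumPy.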

Email Address of Submitting Author

parvathalavenkatesh123@gmail.com

ORCID of Submitting Author

0000-0001-6341-9480

Submitting Author's Institution

RGUKT-RK Valley

Submitting Author's Country

India