Abstract
Single-channel speech enhancement algorithms have improved considerably
in recent years. Despite these improvements, they still fall short of
the auditory system's efficiency at extracting attended auditory
information in the presence of competing speakers. Recently, it has been
shown that this attended auditory information can be decoded from the
brain activity of the listener. In this paper, we propose two
novel deep learning methods, the Brain Enhanced Speech Denoiser (BESD)
and the U-shaped Brain Enhanced Speech Denoiser (U-BESD), that exploit
this finding to denoise a multi-talker speech mixture. We apply
Feature-wise Linear Modulation (FiLM) between the brain activity and the
sound mixture (see the sketch below) to better extract the features of
the attended speaker for speech enhancement. Using electroencephalography
(EEG) signals recorded from the listener, we show that, in enhancing a
speech mixture, U-BESD outperforms both a current autoencoder approach
and a speech separation approach that uses brain
activity. Moreover, we show that both BESD and U-BESD successfully
extract the attended speaker without any prior information about this
speaker. This makes both algorithms strong candidates for realistic
applications, such as hearing aids, cellphones, or noise-cancelling
headphones, where no prior information about the attended speaker is
available. All procedures were performed in accordance with the
Declaration of Helsinki and were approved by the Ethics Committees of
the School of Psychology and the Health Sciences Faculty at Trinity
College Dublin.
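
As a rough illustration of the FiLM conditioning referenced above, the
sketch below shows a conditioning vector (standing in for EEG-derived
features) predicting a per-channel scale (gamma) and shift (beta) that
modulate the audio mixture features. All names, dimensions, and weights
here are illustrative assumptions, not the actual BESD/U-BESD
implementation.

    import numpy as np

    rng = np.random.default_rng(seed=0)

    # Hypothetical sizes (not from the paper): 64 EEG-derived
    # conditioning features, 128 audio feature channels, 100 time frames.
    N_COND, N_CHAN, N_FRAMES = 64, 128, 100

    # Linear maps from the EEG conditioning vector to a per-channel
    # scale (gamma) and shift (beta); in a trained model these weights
    # would be learned, here they are random for illustration only.
    W_gamma = 0.01 * rng.standard_normal((N_CHAN, N_COND))
    W_beta = 0.01 * rng.standard_normal((N_CHAN, N_COND))

    def film(audio_feats, eeg_feats):
        """Feature-wise Linear Modulation: scale and shift every audio
        feature channel using values predicted from the conditioning."""
        gamma = W_gamma @ eeg_feats            # shape: (N_CHAN,)
        beta = W_beta @ eeg_feats              # shape: (N_CHAN,)
        return gamma[:, None] * audio_feats + beta[:, None]

    mixture_feats = rng.standard_normal((N_CHAN, N_FRAMES))  # mixture features
    eeg_vector = rng.standard_normal(N_COND)                 # EEG summary vector
    modulated = film(mixture_feats, eeg_vector)
    print(modulated.shape)  # (128, 100)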