Sound Events Localization and Detection Using Bio-inspired Gammatone Filters and Temporal Convolutional Neural Networks
  • Karen Rosero
  • Felipe Grijalva
  • Bruno Masiero
Karen Rosero
State University of Campinas

Corresponding Author: [email protected]


Abstract

This manuscript addresses the problem of detecting, classifying, and localizing sound sources in a spatial audio acoustic scene. We propose using bio-inspired Gammatone auditory filters for the acoustic feature extraction stage, together with a novel deep learning architecture that combines convolutional, recurrent, and temporal convolutional blocks. Our system exceeded state-of-the-art metrics on four spatial audio datasets with different levels of acoustical complexity and up to three sound sources overlapping in time. We also performed a comparative analysis of the gap between machine and human hearing, which shows that our results already exceed human performance in non-reverberant scenarios.
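To make the feature extraction idea concrete, the following is a minimal sketch of a Gammatone filter, not the authors' implementation. It uses the standard textbook impulse response, t^(n-1) · exp(-2πbt) · cos(2πf_c t), with the bandwidth b tied to the equivalent rectangular bandwidth (ERB) of Glasberg and Moore. The function names, the 16 kHz sampling rate, the 50 ms filter length, the filter order of 4, and the per-band energy feature are all illustrative assumptions and are not taken from the manuscript.

```python
import numpy as np


def erb(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore approximation)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def gammatone_ir(fc_hz, fs_hz=16000, duration_s=0.05, order=4):
    """Peak-normalised Gammatone impulse response centred at fc_hz.

    fs_hz, duration_s and order are assumed values chosen for illustration.
    """
    n_samples = int(round(duration_s * fs_hz))
    t = np.arange(n_samples) / fs_hz
    b = 1.019 * erb(fc_hz)  # bandwidth parameter commonly used with order 4
    envelope = t ** (order - 1) * np.exp(-2.0 * np.pi * b * t)
    ir = envelope * np.cos(2.0 * np.pi * fc_hz * t)
    return ir / np.max(np.abs(ir))


def gammatone_energies(signal, centre_freqs_hz, fs_hz=16000):
    """Mean energy of the signal in each Gammatone band (hypothetical feature)."""
    energies = []
    for fc in centre_freqs_hz:
        band = np.convolve(signal, gammatone_ir(fc, fs_hz), mode="same")
        energies.append(np.mean(band ** 2))
    return np.array(energies)
```

As a quick check, a 1 kHz tone passed through bands centred at 1 kHz and 4 kHz should carry most of its energy in the first band. A full system would use a bank of many such filters per microphone channel and feed the resulting time-frequency representation to the network.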
Published in 2023 in IEEE/ACM Transactions on Audio, Speech, and Language Processing, volume 31, pages 2314-2324. DOI: 10.1109/TASLP.2023.3284525