Harmonic-aware Tri-path Convolution Recurrent Network for Singing Voice Separation
  • YIH LIANG SHEN,
  • YA CHING LAI,
  • TAI SHIH CHI

Abstract

Temporal coherence and spectral regularity are critical cues for human auditory streaming and are considered in many sound separation models. Examples include the Conv-TasNet model, which focuses on temporal coherence by using short kernels to analyze sound, and the dual-path convolution recurrent network (DPCRN) model, which uses two RNNs to analyze general patterns along the temporal and spectral dimensions of a spectrogram. In this paper, we propose a harmonic-aware tri-path convolution recurrent network (HATPCRN) model that separates singing voices on a spectrogram by adding an inter-band RNN designed specifically for the harmonic structure. Evaluation results on the DSD100 and MUSDB18 datasets show that this addition further boosts the separation performance of the DPCRN.
Published 01 Jul 2023 in JASA Express Letters, volume 3, issue 7. DOI: 10.1121/10.0019997