A Multi-Scale Complex Feature Refinement and Dynamic Convolution Attention-Aware Network for Depression and ADHD Assessment Using Speech
  • Shuanglin Li
Corresponding Author: [email protected]

In the area of affective computing, speech has been identified as a promising biomarker for assessing depression and attention deficit hyperactivity disorder (ADHD). These disorders manifest as abnormalities in speech across various frequency bands and exhibit temporal variations. Most existing work on speech features relies on the magnitude spectrogram, which discards phase information and ignores the differing contributions of individual frequency bands to depression and ADHD detection. Motivated by these observations, we propose a novel multi-scale complex feature refinement and dynamic convolution attention-aware network for enhanced speech-based assessment of depression and ADHD. Our approach incorporates three key components: a multi-scale complex feature refinement (MSFR) module, a dynamic convolutional neural network (Dy-CNN), and a dual-attention feature enhancement (DAFE) module. The MSFR module uses depth-wise convolutional networks to process both magnitude and phase information, selectively emphasizing frequency bands associated with depression and ADHD. Importantly, the Dy-CNN module employs an attention mechanism to generate multiple convolution kernels that adapt to the input features and capture the relevant temporal dynamics linked to depression and ADHD. Additionally, the DAFE module combines channel shuffle attention (CSA) and spatial axial attention (SAA) mechanisms, leveraging inter- and intra-channel relationships across the time-frequency structure of the feature map to improve feature representation and detection performance. Extensive experiments on four datasets, i.e., AVEC 2013, AVEC 2014, E-DAIC, and a self-collected authentic ADHD dataset, demonstrate the effectiveness of the proposed method over previous approaches and show superior generalization across languages for depression and ADHD assessment.
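The core idea behind the Dy-CNN module, as described above, is that attention weights derived from the input select among several candidate convolution kernels, which are then aggregated into a single input-adaptive kernel. The following pure-Python sketch illustrates that mechanism in 1-D; the kernel count, softmax attention, and the fact that the attention logits are taken as given (rather than produced by a learned network, as in the paper) are illustrative assumptions, not the authors' implementation:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def dynamic_conv1d(signal, kernels, attn_logits):
    """Attention-weighted mixture of K candidate kernels, then 1-D convolution.

    signal      : input sequence (list of floats)
    kernels     : K candidate kernels, all the same length
    attn_logits : K attention scores; in the paper these would come from a
                  small attention network conditioned on the input features
                  (hypothetical here, supplied directly for illustration)
    """
    w = softmax(attn_logits)
    k_len = len(kernels[0])
    # Aggregate the candidate kernels into one input-adaptive kernel
    mixed = [sum(w[i] * kernels[i][t] for i in range(len(kernels)))
             for t in range(k_len)]
    # Valid-mode 1-D convolution (cross-correlation) with the mixed kernel
    return [sum(signal[start + t] * mixed[t] for t in range(k_len))
            for start in range(len(signal) - k_len + 1)]

# With equal logits the two candidate kernels are averaged:
out = dynamic_conv1d([2.0, 4.0, 6.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

Because the mixing happens in kernel space, only one convolution is executed per input, yet the effective filter changes from sample to sample, which is what lets the module track the input-dependent temporal dynamics the abstract refers to.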

24 Apr 2024: Submitted to TechRxiv
29 Apr 2024: Published in TechRxiv