
A Multimodal Fusion Model for Depression Detection Assisted by Stacking DNNs
  • Filipe Fontinele de Almeida,
  • Kelson Rômulo Teixeira Aires,
  • André Castelo Branco Soares,
  • Laurindo De Sousa Britto Neto,
  • Rodrigo De Melo Souza Veras
Corresponding author: Filipe Fontinele de Almeida


Depression is a severe psychosocial pathology that causes mood changes and is characterized by a strong feeling of hopelessness and deep sadness. In advanced stages, it can predispose patients to suicidal thoughts, which makes accurate diagnostic methods especially important. Traditional diagnosis relies on semi-structured interviews and complementary questionnaires. Combining these methods with careful analysis of audiovisual and textual characteristics can yield valuable clues about the presence of depression in individuals. This study therefore proposes a multimodal Ensemble Stacking Deep Neural Network model that analyzes facial expression characteristics, audio signals, and textual transcriptions to detect depression automatically. The model was evaluated on the multimodal Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) dataset. We incorporated substantial volumes of data into the analysis and achieved a degree of separability greater than 0.9. Our results demonstrate both the effectiveness of the method and its superiority to other reference approaches.
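To make the general idea of stacking-based multimodal fusion concrete, the following is a minimal sketch and not the authors' architecture. It assumes one base learner per modality (visual, audio, text) whose predicted probabilities are concatenated and passed to a meta-learner. Small logistic regressions stand in for the deep networks, the data are synthetic, and every name (`MODALITIES`, `make_toy_data`, `fit_stack`, `stack_predict`) is hypothetical.

```python
import math
import random

# Hypothetical modality names; the abstract mentions facial, audio and text cues.
MODALITIES = ("visual", "audio", "text")

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Fit a logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def make_toy_data(n, seed):
    """Synthetic stand-in data: two noisy features per modality, shifted by the label."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        centre = 1.0 if y else -1.0
        sample = {m: [rng.gauss(centre, 1.5) for _ in range(2)] for m in MODALITIES}
        data.append((sample, y))
    return data

def fit_stack(base_split, meta_split):
    """Train one base learner per modality, then a meta-learner on their outputs."""
    labels = [y for _, y in base_split]
    base = {m: fit_logistic([s[m] for s, _ in base_split], labels) for m in MODALITIES}
    # The meta-learner sees base predictions on data the base learners were not fit on.
    meta_X = [[predict_proba(base[m], s[m]) for m in MODALITIES] for s, _ in meta_split]
    meta = fit_logistic(meta_X, [y for _, y in meta_split])
    return base, meta

def stack_predict(base, meta, sample):
    """Fused probability of the positive class for one multimodal sample."""
    return predict_proba(meta, [predict_proba(base[m], sample[m]) for m in MODALITIES])

base_split = make_toy_data(200, seed=0)
meta_split = make_toy_data(200, seed=1)
test_split = make_toy_data(200, seed=2)

base, meta = fit_stack(base_split, meta_split)
correct = sum((stack_predict(base, meta, s) >= 0.5) == bool(y) for s, y in test_split)
accuracy = correct / len(test_split)
```

Training the meta-learner on a separate split is one common way to keep it from simply trusting base learners that have memorised their training data; out-of-fold predictions from cross-validation are a frequent alternative.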
06 Mar 2024: Submitted to TechRxiv
11 Mar 2024: Published in TechRxiv