Abstract
Depression clinical interview corpora are essential for advancing
automated depression diagnosis. While previous studies have used scripted
speech material recorded in controlled settings, such material does not
accurately represent spontaneous conversational speech. Additionally,
self-reported measures of depression are subject to bias, making such
data unreliable for training models intended for real-world use. This study
introduces a new corpus of depression clinical interviews collected
directly from a psychiatric hospital, containing 113 recordings from 52
healthy subjects and 61 patients with depression. The subjects were
assessed using the Montgomery-Åsberg Depression Rating Scale (MADRS),
administered in Chinese. Their
final diagnosis was based on a medical evaluation through a clinical
interview conducted by a psychiatry specialist. All interviews were
audio-recorded, transcribed verbatim, and annotated by experienced
physicians. This dataset is a valuable resource for automated depression
detection research and is expected to benefit the field of psychology.
Baseline models for detecting the presence of depression and predicting
its severity were built, and descriptive statistics of audio and text
features were calculated. The decision-making process of the models was
also investigated and illustrated. To the best of our knowledge, this is
the first study to collect a depression clinical interview corpus in
Chinese and to train machine learning models to diagnose patients with
depression.