Speech Emotion Recognition in Italian Using Wav2Vec 2.0 and the Novel Crowdsourced Emotional Speech Corpus Emozionalmente
Speech emotion recognition (SER) relies on speech corpora to collect emotional voices for analysis. However, emotions may vary by culture and language, and resources in Italian are scarce. To address this gap, we launched a crowdsourcing campaign and obtained Emozionalmente, a corpus of 6902 samples produced by 431 non-professional Italian speakers verbalizing 18 sentences expressing the Big Six emotions and neutrality. We conducted a subjective validation of Emozionalmente by asking 829 human raters to identify the emotion expressed in the audio clips; they achieved an overall accuracy of 66%. Additionally, we fine-tuned the deep learning model wav2vec 2.0 on Emozionalmente and reached an accuracy of approximately 81-83%. In this paper, we describe the design choices, present a descriptive analysis of the corpus, and report the methodology and results of the behavioral and computational studies conducted on the dataset. Our work provides an alternative and extensive resource for linguistic and speech-processing research on the Italian language.
Email Address of Submitting Author: fabiocat@mit.edu
Submitting Author's Institution: Politecnico di Milano
Submitting Author's Country