Speech signal likability estimation through harmony between pitch and formant
Yuha Choi
Voice likability is a critical factor in machine-human interaction. However, studies on speech likability typically do not apply the harmony theory of music, which offers general rules for pleasant sounds. In this paper, I propose a new method that estimates the likability of vocal signals from the harmonic relation between the pitch and the first formant (F1). I extract the pitch and F1 from vowel signals and, from each pitch and F1, compute the average cent value relative to the notes of the musical scale. A small cent value indicates a consonant relation between pitch and F1. I compared the calculated cent values with mean opinion score (MOS) test results for ten speech samples. The results showed a clear correlation between the subjective MOS scores and the consonance of pitch and F1 in vowels.
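The abstract does not state which scale, tuning, or averaging scheme is used, so the following is only a minimal sketch of one possible reading of the cent computation. It assumes the pitch acts as the tonic of a 12-tone equal-tempered major scale, measures how far F1 lies (in cents, with octaves folded together) from the nearest scale note, and averages that deviation over analysis frames. The scale choice, the octave folding, and the names `cents`, `deviation_from_scale`, and `average_cent_value` are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence

# Assumption: major-scale degrees in semitones above the tonic (12-TET).
MAJOR_SCALE_SEMITONES = (0, 2, 4, 5, 7, 9, 11)


def cents(f_low: float, f_high: float) -> float:
    """Interval from f_low to f_high in cents: 1200 * log2(f_high / f_low)."""
    return 1200.0 * math.log2(f_high / f_low)


def deviation_from_scale(pitch_hz: float, f1_hz: float,
                         scale: Sequence[int] = MAJOR_SCALE_SEMITONES) -> float:
    """Hypothetical: cent distance from F1 to the nearest scale note rooted at the pitch."""
    interval = cents(pitch_hz, f1_hz)
    # Fold into one octave [0, 1200) so octave transpositions count as equivalent.
    within_octave = interval % 1200.0
    # Include the next octave's tonic (1200 cents) so values just below it map near 0.
    targets = [100.0 * s for s in scale] + [1200.0]
    return min(abs(within_octave - t) for t in targets)


def average_cent_value(pitches_hz: Sequence[float], f1s_hz: Sequence[float],
                       scale: Sequence[int] = MAJOR_SCALE_SEMITONES) -> float:
    """Hypothetical score: mean per-frame deviation; smaller means more consonant."""
    if len(pitches_hz) != len(f1s_hz) or not pitches_hz:
        raise ValueError("need equal-length, non-empty pitch and F1 sequences")
    devs = [deviation_from_scale(p, f, scale) for p, f in zip(pitches_hz, f1s_hz)]
    return sum(devs) / len(devs)


if __name__ == "__main__":
    # F1 two octaves above the pitch lies exactly on the tonic: deviation 0.
    print(deviation_from_scale(200.0, 800.0))
    # A just perfect fifth (3:2) is about 1.955 cents above the tempered fifth.
    print(deviation_from_scale(200.0, 300.0))
    print(average_cent_value([200.0, 200.0], [800.0, 300.0]))
```

Under this reading, a vowel whose F1 sits near a scale degree above its pitch yields a small average cent value, matching the abstract's claim that a small value indicates a consonant pitch-F1 relation. Pitch and F1 tracks would come from a separate analysis step, which this sketch does not cover.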