Artificial Vocal Learning guided by Phoneme Recognition and Visual Information