Automatic Speech Recognition for Nigerian-Accented English
Oreoluwa Babatunde
Olabisi Onabanjo University

Corresponding Author:[email protected]

Abstract

Automatic Speech Recognition (ASR) systems have become ubiquitous in daily life, powering voice assistants and transcription services. However, these systems are developed and trained primarily on native English accents and therefore often perform poorly on other accents, including Nigerian-accented English. This research addresses that gap by developing an ASR system for Nigerian-accented English. By building models that accurately transcribe Nigerian-accented English, we aim to ensure equitable access to ASR technologies and services for speakers with Nigerian accents. The study applied transfer learning to two pretrained models, NeMo QuartzNet15x5Base-En and Wav2Vec2 XLS-R-300M, using Nigerian-accented speech data. On the test set, NeMo QuartzNet15x5Base-En achieved a Word Error Rate (WER) of 8.2% with the fastest inference time, 0.156 seconds, while Wav2Vec2 XLS-R-300M achieved a WER of 14.9% with an inference time of 1.1 seconds. This work therefore identifies the pretrained NeMo QuartzNet15x5Base-En model as the better choice for ASR modelling, especially in a low-resource regime.
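
For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the system output, divided by the number of reference words. The Python sketch below illustrates that standard definition. It is not necessarily the scoring procedure used in this study. The function name, the lowercase-and-split normalization, and the example sentences are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: text is normalized by lowercasing and splitting on
    whitespace; real evaluations often also strip punctuation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution and one deletion.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {wer:.1%}")
```

Under this definition, a WER of 8.2% corresponds to roughly 8 word-level errors per 100 reference words.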