Performance Assessment of a New Swahili Lexicon (SWAHILILex
  • Aloyce Kaliba
Southern University and A&M College

Corresponding Author:[email protected]


This study tests a new Swahili lexicon (SWAHILILex.01), annotated by native Swahili speakers for polarity analysis, on pre-tagged datasets. The lexicon is compared against existing polarity-analysis methods: lexicon-based methods, pre-trained transformer models, and supervised machine-learning tools. Overall classification performance was measured by accuracy, recall, precision, and F1-score. SWAHILILex.01 performed similarly to supervised machine learning and outperformed the other methods when classifying the regular Swahili dataset, but it underperformed when classifying the tweets dataset. These preliminary results emphasize the need for domain-specific lexicons or new techniques that account for the multidomain nature of social media data. Future research will expand SWAHILILex.01 to include other Swahili dialects, extend the polarity levels to capture emotional context, and create a pre-trained model for multidomain Swahili sentiment analysis.
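To make the evaluation setup concrete, the following is a minimal sketch of lexicon-based polarity classification scored with accuracy, precision, recall, and F1. It is an illustration under stated assumptions, not the study's implementation: the four-word lexicon, the example sentences, the gold labels, and the function names `classify` and `evaluate` are all hypothetical, and none of them come from SWAHILILex.01 or the study's datasets. The metrics here treat "positive" as the target class, which is also an assumption, since the abstract does not say how the scores were averaged.

```python
# Hypothetical toy lexicon for illustration only; NOT taken from SWAHILILex.01.
TOY_LEXICON = {"nzuri": 1, "furaha": 1, "mbaya": -1, "huzuni": -1}


def classify(text, lexicon=TOY_LEXICON):
    """Label a text by the sign of its summed word polarities."""
    score = sum(lexicon.get(token, 0) for token in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def evaluate(gold, predicted, target="positive"):
    """Compute accuracy plus precision, recall, and F1 for one target class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == target and p == target for g, p in pairs)
    fp = sum(g != target and p == target for g, p in pairs)
    fn = sum(g == target and p != target for g, p in pairs)
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up pre-tagged examples (text, gold label).
SAMPLES = [
    ("habari nzuri", "positive"),
    ("habari mbaya", "negative"),
    ("nina furaha", "positive"),
    ("nina huzuni", "negative"),
    ("sina furaha", "negative"),  # negation: a plain word lookup misses it
]

gold = [label for _, label in SAMPLES]
predicted = [classify(text) for text, _ in SAMPLES]
metrics = evaluate(gold, predicted)
print(metrics)
```

The last sample shows one way a simple word-lookup classifier can fail: "sina furaha" ("I have no happiness") contains a positive word but expresses a negative sentiment, so the toy classifier labels it positive and precision drops. Informal or mixed-domain text such as tweets can contain many such cases.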