TechRxiv

Performance Assessment of a New Swahili Lexicon (SWAHILILex.01) Tagged by Native Speakers for Polarity Analysis

preprint
posted on 2023-05-15, 12:38 authored by Aloyce Kaliba

This study tests a new Swahili lexicon (SWAHILILex.01), annotated by native Swahili speakers for polarity analysis, using pre-tagged datasets. The lexicon is compared against existing polarity-analysis approaches: lexicon-based methods, pre-trained transformer models, and supervised machine-learning tools. Overall classification performance was measured by accuracy, recall, precision, and F1-score. SWAHILILex.01 performed similarly to supervised machine learning and outperformed the other methods when classifying the regular Swahili dataset, but underperformed when classifying the tweets dataset. These preliminary results emphasize the need for domain-based lexicons or new techniques that account for the multidomain nature of social media data. Future research will expand SWAHILILex.01 to include other Swahili dialects, extend the polarity levels to capture emotional context, and create a pre-trained model for multidomain Swahili sentiment analysis.
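As a rough illustration of the evaluation the abstract describes, the sketch below scores text against a polarity lexicon and computes accuracy, precision, recall, and F1 against gold labels. It is a minimal sketch under stated assumptions, not the author's method: the four lexicon entries, the sign-of-sum decision rule, and the function names `classify` and `evaluate` are all hypothetical and are not taken from SWAHILILex.01 or the preprint.

```python
# Hypothetical toy lexicon (word -> polarity score). These entries are
# illustrative only and are NOT taken from SWAHILILex.01.
TOY_LEXICON = {"nzuri": 1, "furaha": 1, "mbaya": -1, "huzuni": -1}


def classify(text, lexicon=TOY_LEXICON):
    """Sum the scores of known words and label the text by the sign of the sum.

    Assumption: whitespace tokenization and an unweighted sum; the preprint
    may use a different scoring rule.
    """
    score = sum(lexicon.get(token, 0) for token in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def evaluate(gold, predicted, target="positive"):
    """Return accuracy over all labels, plus precision, recall, and F1 for `target`."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == target and p == target for g, p in pairs)
    fp = sum(g != target and p == target for g, p in pairs)
    fn = sum(g == target and p != target for g, p in pairs)
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

With a pre-tagged dataset, one would call `classify` on each text and pass the gold and predicted label lists to `evaluate`. Per-label scores can then be averaged across labels if a single multiclass figure is wanted.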

Funding

SAGE Foundation

Email Address of Submitting Author

aloyce_kaliba@subr.edu

Submitting Author's Institution

Southern University and A&M College

Submitting Author's Country

  • United States of America
