Pars-HAO: Hate Speech and Offensive Language Detection on Persian Social Media Using Ensemble Learning
  • Mohammad Karami Sheykhlan
  • Jana Shafi
  • Saeed Kosari
  • Saleh Kheiri Abdoljabbar
  • Jaber Karimpour
Mohammad Karami Sheykhlan
University of Mohaghegh Ardabili

Corresponding Author: [email protected]

Abstract

As social networks continue to grow in popularity, there is an urgent need to detect offensive language and hate speech automatically. While English has a wealth of research and datasets in this domain, research and datasets for identifying hate speech and offensive language in Persian text remain scarce. To help fill this gap, this article introduces Pars-HAO, a 3-class dataset of 8,013 tweets. We collected the data by combining two strategies: gathering comments from pages that are more exposed to hate speech, and a keyword-based search. Three annotators then labeled the tweets. As baselines, we used a Convolutional Neural Network (CNN) and two widely used machine learning models, Support Vector Machine (SVM) and Logistic Regression (LR). To improve classification performance, we applied the hard-voting ensemble learning technique. Experiments on Pars-HAO showed that hard voting gave the best result, with a macro F1-score of 68.76%.
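The two evaluation ideas in the abstract, hard voting and the macro F1-score, can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes each base model (for example the CNN, SVM, and LR) has already produced one label per tweet, encodes the three classes as the hypothetical integers 0, 1, and 2, and breaks ties in favour of the earliest model's label. That tie rule is an assumption, since the abstract does not state one. The toy predictions below are invented for illustration and are not Pars-HAO results.

```python
from collections import Counter
from typing import Sequence


def hard_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Combine per-model label predictions by majority (hard) voting.

    predictions[m][i] is the label that model m assigned to sample i.
    Assumption: on a tie, the tied label predicted by the earliest model wins.
    """
    n_samples = len(predictions[0])
    combined = []
    for i in range(n_samples):
        votes = [model_preds[i] for model_preds in predictions]
        counts = Counter(votes)
        top = max(counts.values())
        # First label, in model order, that reaches the highest vote count.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], labels: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 scores (macro averaging)."""
    f1_scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    # Every class counts equally, regardless of how many samples it has.
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    # Toy predictions from three hypothetical base models on seven samples.
    cnn = [0, 1, 2, 1, 0, 2, 2]
    svm = [0, 1, 1, 1, 2, 2, 0]
    lr = [1, 1, 2, 0, 0, 1, 1]
    gold = [0, 1, 2, 1, 0, 0, 2]

    ensemble = hard_vote([cnn, svm, lr])
    print(ensemble)  # the last sample is a three-way tie, resolved to the CNN's label
    print(round(macro_f1(gold, ensemble, labels=[0, 1, 2]), 4))
```

Macro averaging is a natural fit for a 3-class task like this because it weights each class's F1 equally, so a rare class cannot be hidden by strong performance on a frequent one. If scikit-learn is available, `VotingClassifier(voting="hard")` and `f1_score(..., average="macro")` provide equivalent functionality; note that scikit-learn resolves voting ties by ascending class order rather than by model order, so tied samples may be labeled differently than in this sketch.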