Abstract
Automatic detection of fake content in social media such as Twitter is
an enduring challenge. Technically, determining fake news on social
media platforms is a straightforward binary classification problem.
However, manually fact-checking even a small fraction of tweets is
infeasible given the sheer volume posted daily. To
address this challenge, we crawled and crowdsourced one of the most
extensive ground-truth datasets, containing more than 180,000 labels for
tweets posted between 2009 and 2022 under both 5-label and 3-label
classification schemes, using
Amazon Mechanical Turk. We utilized multiple levels of validation to
ensure an accurate ground-truth benchmark dataset. We then implemented
a range of machine learning and deep learning models, including several
variations of BERT-based architectures, to evaluate real/fake tweet
detection accuracy under both labeling schemes and to identify which
models achieved the highest metrics. We further analyzed the dataset by
combining the DBSCAN text clustering algorithm with the YAKE keyword
extraction algorithm to uncover topic clusters and their relationships.
Finally, we analyzed
each user in the dataset, computing a Bot Score, Credibility Score, and
Influence Score to better characterize the types of users who post, the
influence each of their tweets carries, and whether any underlying
patterns link these scores to tweet truthfulness. The experimental
results
demonstrated substantial improvements for models handling short-length
text on a real-world problem: automatically detecting fake content in
social media.