Multimodal Forgery Detection Using Ensemble Learning
  • Ammarah Hashmi,
  • Sahibzada Adil Shahzad,
  • Wasim Ahmad,
  • Chia Wen Lin,
  • Yu Tsao,
  • Hsin-Min Wang
Ammarah Hashmi
Institute of Information Systems and Applications, National Tsing Hua University, International Graduate Program, Institute of Information Science, Social Networks and Human Centered Computing Program, Academia Sinica

Corresponding Author: [email protected]

Sahibzada Adil Shahzad
Department of Computer Science, National Chengchi University, International Graduate Program, Institute of Information Science, Social Networks and Human Centered Computing Program, Academia Sinica
Wasim Ahmad
Department of Computer Science, National Chengchi University, International Graduate Program, Institute of Information Science, Social Networks and Human Centered Computing Program, Academia Sinica
Chia Wen Lin
Department of Electrical Engineering, National Tsing Hua University
Yu Tsao
Research Center for Information Technology Innovation, Academia Sinica
Hsin-Min Wang
International Graduate Program, Institute of Information Science, Social Networks and Human Centered Computing Program, Academia Sinica

Abstract

Rapid recent advances in Artificial Intelligence (AI) technology have enabled the creation of hyper-realistic deepfakes, and detecting deepfake videos (also known as AI-synthesized videos) has become a critical task. Existing systems generally do not fully consider the unified processing of audio and video data, so there is still room for improvement. In this paper, we focus on the multimodal forgery detection task and propose a deepfake detection method based on audiovisual ensemble learning. The proposed method consists of four parts: a Video Network, an Audio Network, an Audiovisual Network, and a Voting Module. Given a video, the proposed multimodal ensemble learning system identifies whether it is fake or real. Experimental results on the recently released multimodal FakeAVCeleb dataset show that the proposed method achieves 89% accuracy, significantly outperforming existing models.
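The abstract does not state which voting rule the Voting Module applies, so the following is only a minimal sketch, assuming hard majority voting over the three network outputs. The function names `majority_vote` and `classify_video` and the string labels "fake" and "real" are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label chosen by the most voters (hard voting).

    Assumption: the paper's Voting Module may use a different rule
    (for example, averaging probabilities); this is only an illustration.
    """
    counts = Counter(labels)
    label, _ = counts.most_common(1)[0]
    return label


def classify_video(video_pred, audio_pred, audiovisual_pred):
    """Combine the per-network decisions into one final decision.

    Each argument is the hypothetical label ("fake" or "real") produced by
    the Video Network, Audio Network, and Audiovisual Network respectively.
    """
    return majority_vote([video_pred, audio_pred, audiovisual_pred])


if __name__ == "__main__":
    # Two of three networks say "fake", so the ensemble decision is "fake".
    print(classify_video("fake", "real", "fake"))
```

With three voters and two possible labels, a tie cannot occur, which is one reason an odd number of networks is convenient for this kind of ensemble.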
14 May 2024: Submitted to TechRxiv
20 May 2024: Published in TechRxiv