

Red Teaming for Multimodal Large Language Models: A Survey
  • Moushumi Mahato,
  • Avinash Kumar,
  • Kartikey Singh,
  • Bhavesh Kukreja,
  • Javaid Nabi
Moushumi Mahato
Voice Intelligence R&D, Samsung R&D Institute

Corresponding Author: [email protected]

Avinash Kumar
Voice Intelligence R&D, Samsung R&D Institute
Kartikey Singh
Voice Intelligence R&D, Samsung R&D Institute
Bhavesh Kukreja
Voice Intelligence R&D, Samsung R&D Institute
Javaid Nabi
Voice Intelligence R&D, Samsung R&D Institute

Abstract

As Generative AI becomes more prevalent, its vulnerability to security threats grows. This study conducts a thorough exploration of red teaming methods within the domain of Multimodal Large Language Models (MLLMs). Like adversarial attacks, red teaming involves tricking a model into generating unexpected outputs, revealing weaknesses that can then be addressed through additional training for improved robustness. Through an extensive review of the existing literature, this research categorizes and analyzes adversarial attacks, providing insights into their methodologies, targets and potential consequences. It further explores the evolving tactics employed to exploit vulnerabilities in various models, encompassing both traditional and deep learning architectures. The study also investigates the current state of defense mechanisms, examining countermeasures designed to thwart adversarial attacks. In addition, the research conducts a meticulous analysis of red teaming methods with a specific focus on image-related vulnerabilities. By synthesizing insights from various studies and experiments, this survey aims to offer a comprehensive understanding of the multifaceted challenges posed by adversarial attacks on MLLMs. The outcomes of this research serve as a valuable resource for practitioners, researchers and policymakers seeking to fortify Generative AI systems against emerging security threats.
18 Jan 2024: Submitted to TechRxiv
26 Jan 2024: Published in TechRxiv
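
To make the image-focused attack surface described in the abstract concrete, the following is a minimal sketch of an FGSM-style image perturbation in PyTorch. The stand-in classifier, epsilon value, and random input are illustrative assumptions for this sketch only; the survey's own experiments target multimodal models rather than this placeholder.

import torch
import torch.nn.functional as F
from torchvision.models import resnet18

# Stand-in vision model; a real red-teaming setup would target the
# vision encoder of a multimodal LLM instead (assumption for illustration).
model = resnet18(weights=None).eval()

image = torch.rand(1, 3, 224, 224, requires_grad=True)  # placeholder input image
target = torch.tensor([0])                              # label the attacker wants the model to move away from

# Forward pass and loss with respect to the current target label.
logits = model(image)
loss = F.cross_entropy(logits, target)
loss.backward()

# FGSM: step in the direction of the sign of the loss gradient, bounded by epsilon.
epsilon = 8.0 / 255.0
adv_image = (image + epsilon * image.grad.sign()).clamp(0.0, 1.0).detach()

# The perturbed image looks nearly identical to the original but can change the
# model's output, illustrating the image-level vulnerabilities the survey categorizes.
print(logits.argmax(dim=1), model(adv_image).argmax(dim=1))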