
Evaluating the Cybersecurity Robustness of Commercial LLMs against Adversarial Prompts: A PromptBench Analysis
  • Takeshi Goto,
  • Kensuke Ono,
  • Akira Morita
Corresponding author: Takeshi Goto ([email protected])

This study presents a comprehensive evaluation of the cybersecurity robustness of five leading Large Language Models (LLMs), namely ChatGPT-4, Google Gemini, Anthropic Claude, Meta Llama, and Mistral 8x7B, against adversarial prompts using the PromptBench benchmark. Through a dual quantitative and qualitative analysis, the research examines each model's performance, resilience, and vulnerabilities. Quantitative metrics such as accuracy, precision, recall, and F1 score provide a statistical comparison across models, while qualitative insights reveal distinct response patterns and varying susceptibility to different adversarial strategies. The findings highlight significant variation in model robustness, underscoring the need for a multifaceted approach to enhancing LLM security. This study not only sheds light on current limitations but also emphasizes the need to advance evaluation methodologies and model development practices in order to mitigate potential threats and ensure the safe deployment of LLMs in sensitive and critical applications.
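The quantitative metrics named in the abstract are the standard binary-classification measures derived from a confusion matrix. As a minimal sketch (the counts below are hypothetical, not results from the paper), they can be computed as follows:

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from a binary
    confusion matrix: true/false positives and negatives."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)          # of flagged items, how many were correct
    recall = tp / (tp + fn)             # of true items, how many were caught
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

# Illustrative counts for one model's responses to adversarial prompts
acc, p, r, f1 = classification_metrics(tp=80, fp=10, fn=20, tn=90)
print(round(acc, 3), round(p, 3), round(r, 3), round(f1, 3))
# → 0.85 0.889 0.8 0.842
```

Because F1 is the harmonic mean of precision and recall, it penalizes models that trade one heavily for the other, which is why it complements raw accuracy in robustness comparisons.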
05 Mar 2024: Submitted to TechRxiv
06 Mar 2024: Published in TechRxiv