
Benchmarking Large Language Models for Safe Software Development Advice
  • Robert Sloane-Anderson,
  • Harvey Price
Robert Sloane-Anderson

Corresponding Author: [email protected]



Secure software development is essential for safeguarding systems against a growing range of cyber threats. Evaluating how well advanced language models provide secure coding advice is crucial for understanding their potential as reliable tools for developers. This study benchmarks ChatGPT and Google Gemini on metrics such as accuracy, relevance, comprehensiveness, and adherence to best practices, offering a detailed comparative analysis of their performance. Google Gemini demonstrated superior accuracy and adherence to secure coding standards, while ChatGPT excelled in generating contextually relevant and comprehensive responses. The findings highlight the strengths and weaknesses of each model and provide insight into their application in secure software development. The results can inform developers on how to use these models to improve their coding practices, contributing to more secure and resilient software systems.
30 May 2024: Submitted to TechRxiv
07 Jun 2024: Published in TechRxiv