The GenAI Code Security Report has been updated with results from October 2025 testing.
Using our established 80-task benchmark, we evaluated the latest LLM releases for secure coding performance. The findings are direct:
- OpenAI Pulls Ahead: GPT-5 Mini delivered a 72% security pass rate, while GPT-5 followed at 70%. These are the highest scores recorded to date.
- Others Stagnate: Models from Anthropic, Google, Qwen, and xAI remain in the 50–59% range, with several showing slight declines from previous versions.
- Reasoning Matters: Models engineered for internal reasoning, such as the GPT-5 variants, consistently outperformed standard models, indicating that structured reasoning provides a tangible security advantage.
Even with these advances, the top-scoring model still produced vulnerable code in 28% of cases, a reminder that AI-generated code requires validation and layered security controls.
Read the full report for data-driven guidance on choosing safer AI code assistants and mitigating risk.