BridgeBench

AI Coding Benchmark

Model Rankings

17 AI coding models tested across 130+ real-world tasks spanning algorithms, debugging, refactoring, generation, UI, and security — the categories that matter for vibe coding and agentic workflows.

Updated 2026-03-21

Rank  Model                      Overall
1     GPT-5.4                    95.5
2     GPT-5.4 Mini               94.8
3     GPT-5.4 Nano               92.9
4     GPT-4.1                    91.8
5     Qwen 3.5 35B-A3B           91.7
6     Claude Sonnet 4.5          90.7
7     Qwen 3.5 122B-A10B         90.0
8     o3-mini                    89.6
9     Qwen 3.5 27B               89.5
10    Gemini 2.5 Pro             88.9
11    Qwen 3.5 Flash (02-23)     86.9
12    Grok 4                     86.2
13    DeepSeek R1                84.5
14    Qwen3 Coder 480B           82.7
15    Llama 4 Maverick           80.4
16    Qwen3.5 397B A17B          60.1
17    Qwen3.5 Plus 2026-02-15    59.2

17 models · Ranked by overall score