BridgeBench Speed: Model Analysis

Grok 4.20 (Non-Reasoning)

x-ai/grok-4.20

Median throughput: 243.3 tok/s
Median TTFT: 1999 ms
Success rate: 100.0%
Throughput runs: 9
TTFT runs: 6
Avg TTFT: 1830 ms
Avg throughput: 221.1 tok/s
Total cost: $0.1248

Commentary

by openai/gpt-5.4-mini

Grok 4.20 (Non-Reasoning) is reliable on BridgeBench speed, with a 100.0% success rate and no prompt-level failures, but startup latency is fairly high: time to first token (TTFT) averages 1830 ms, with a median of 1999 ms. Sustained decode performance is strong overall at 221.1 tok/s average and 243.3 tok/s median, and total cost stays low at $0.124758. Throughput drops materially on essay-style outputs, even though those outputs (~1903 tokens) are the shortest of the three throughput prompts.
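The headline figures can be reproduced from the per-run values listed under All Runs. The following is a minimal sketch, not the dashboard's actual code: the variable names are illustrative, and it assumes that TTFT statistics use only the six TTFT-type runs and that values are rounded to one decimal (tok/s) or whole milliseconds (TTFT). Under those assumptions the results match the reported 243.3 / 221.1 tok/s and 1999 / 1830 ms.

```python
from statistics import mean, median

# Per-run values copied from the All Runs table.
throughput_tok_s = [285.7, 260.9, 247.6, 239.7, 243.3, 244.4, 155.7, 163.5, 148.9]
ttft_ms = [1501, 2150, 1932, 2268, 1061, 2066]  # TTFT-type runs only (assumption)

summary = {
    "median_tok_s": round(median(throughput_tok_s), 1),
    "avg_tok_s": round(mean(throughput_tok_s), 1),
    "median_ttft_ms": round(median(ttft_ms)),
    "avg_ttft_ms": round(mean(ttft_ms)),
}
print(summary)
```

With only nine and six samples respectively, the gap between mean and median is driven almost entirely by the three slow essay runs and the single fast 1061 ms factual run.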

Api Design (throughput)

This is the strongest throughput case, with a 260.9 tok/s median on ~2549 output tokens and no issues. The model sustains high decode speed on long, structured technical output without instability.

Data Structures (throughput)

Performance is solid: the 243.3 tok/s median on ~2079 output tokens equals the overall median. This suggests stable sustained generation under moderate-length technical prompts.

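Essay (throughput)

This is the main throughput weakness: the median falls to 155.7 tok/s on ~1903 output tokens, about 36% below the 243.3 tok/s overall median. The model slows significantly on essay-style generation. These runs do not establish the cause; output length is unlikely to be the explanation, since essay outputs are shorter than the other throughput prompts, so less structured prose is one possible factor.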
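The size of the essay slowdown can be made concrete as a relative drop against two baselines taken from this report: the overall median and the best per-prompt median (Api Design). This is a small illustrative calculation; the helper name `relative_drop` is hypothetical.

```python
def relative_drop(baseline: float, observed: float) -> float:
    """Fractional decrease of `observed` relative to `baseline`."""
    return (baseline - observed) / baseline

essay_median = 155.7       # tok/s, Essay prompt
overall_median = 243.3     # tok/s, all nine throughput runs
api_design_median = 260.9  # tok/s, Api Design prompt

drop_vs_overall = relative_drop(overall_median, essay_median)
drop_vs_best = relative_drop(api_design_median, essay_median)

print(f"{drop_vs_overall:.1%} below the overall median")
print(f"{drop_vs_best:.1%} below the Api Design median")
```

The drop is roughly 36% against the overall median and roughly 40% against the Api Design median.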

Definition (TTFT)

TTFT is slightly better than the 1999 ms overall median here, at 1932 ms median, but still near 2 seconds even for a short 74-token response. Startup latency remains the main bottleneck rather than decode speed.

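Factual (TTFT)

This is the slowest startup case at 2066 ms median TTFT despite only 15 output tokens. The short completion length makes the latency overhead especially visible and suggests weak first-token responsiveness. The three factual runs range from 1061 ms to 2268 ms, however, so the 134 ms gap to the Definition median is well within run-to-run variation.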
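To see why TTFT dominates short responses, end-to-end latency can be approximated as TTFT plus decode time. The sketch below rests on a loud assumption: the TTFT runs report no throughput (n/a in the table), so the overall median of 243.3 tok/s is borrowed as the decode rate. The function name `ttft_share` is hypothetical, and the model ignores network and queueing effects.

```python
def ttft_share(ttft_ms: float, output_tokens: int, decode_tok_s: float) -> float:
    """Fraction of estimated end-to-end latency spent waiting for the first token."""
    decode_ms = output_tokens / decode_tok_s * 1000
    return ttft_ms / (ttft_ms + decode_ms)

ASSUMED_DECODE_TOK_S = 243.3  # assumption: overall median, not measured on TTFT runs

factual_share = ttft_share(2066, 15, ASSUMED_DECODE_TOK_S)
definition_share = ttft_share(1932, 74, ASSUMED_DECODE_TOK_S)

print(f"Factual: {factual_share:.0%} of latency is TTFT")
print(f"Definition: {definition_share:.0%} of latency is TTFT")
```

Under this assumption, first-token wait accounts for roughly 97% of the factual response time and roughly 86% of the definition response time, so faster decoding would barely change either.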

Notable Prompts

Api Design (throughput)

Highest sustained throughput at 260.9 tok/s with no issues, indicating strong long-form decode capacity.

Essay (throughput)

Throughput drops to 155.7 tok/s, the largest degradation across prompt types.

Factual (TTFT)

2066 ms TTFT on a 15-token output is a poor startup profile and dominates end-to-end latency.

Data Structures (throughput)

Near-median throughput with no issues suggests good consistency on typical technical workloads.

All Runs

Prompt               Prompt ID                    Type        Tok/s   TTFT (ms)
1. Api Design        throughput-api-design        throughput  285.7   15272
2. Api Design        throughput-api-design        throughput  260.9   12835
3. Api Design        throughput-api-design        throughput  247.6   12544
1. Data Structures   throughput-data-structures   throughput  239.7   12199
2. Data Structures   throughput-data-structures   throughput  243.3   14971
3. Data Structures   throughput-data-structures   throughput  244.4   14677
1. Essay             throughput-essay             throughput  155.7   20704
2. Essay             throughput-essay             throughput  163.5   28055
3. Essay             throughput-essay             throughput  148.9   17930
1. Definition        ttft-definition              ttft        n/a     1501
2. Definition        ttft-definition              ttft        n/a     2150
3. Definition        ttft-definition              ttft        n/a     1932
1. Factual           ttft-factual                 ttft        n/a     2268
2. Factual           ttft-factual                 ttft        n/a     1061
3. Factual           ttft-factual                 ttft        n/a     2066

15 runs · Throughput rows require valid long-output runs · TTFT shown for all successful runs
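The per-prompt medians quoted in the commentary can be recovered by grouping the runs by prompt. This is a minimal sketch with values copied from the table above; the tuple layout and the metric labels `tok_s` and `ttft_ms` are illustrative choices, not part of BridgeBench.

```python
from collections import defaultdict
from statistics import median

# (prompt, metric, value): tok/s for throughput prompts, TTFT in ms for TTFT prompts.
runs = [
    ("Api Design", "tok_s", 285.7), ("Api Design", "tok_s", 260.9), ("Api Design", "tok_s", 247.6),
    ("Data Structures", "tok_s", 239.7), ("Data Structures", "tok_s", 243.3), ("Data Structures", "tok_s", 244.4),
    ("Essay", "tok_s", 155.7), ("Essay", "tok_s", 163.5), ("Essay", "tok_s", 148.9),
    ("Definition", "ttft_ms", 1501), ("Definition", "ttft_ms", 2150), ("Definition", "ttft_ms", 1932),
    ("Factual", "ttft_ms", 2268), ("Factual", "ttft_ms", 1061), ("Factual", "ttft_ms", 2066),
]

# Group values by (prompt, metric), then take the median of each group.
grouped = defaultdict(list)
for prompt, metric, value in runs:
    grouped[(prompt, metric)].append(value)

medians = {key: median(values) for key, values in grouped.items()}
for (prompt, metric), value in medians.items():
    print(f"{prompt:16s} {metric:8s} {value}")
```

Each group holds only three runs, so the median is simply the middle value and a single outlier run can shift the ranking between prompts.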