Model Analysis

Grok 4.20 (Non-Reasoning)

x-ai/grok-4.20

Median throughput: 243.3 tok/s

Median TTFT: 1999 ms

Success rate: 100.0%

Throughput runs: 9

TTFT runs: 6

Avg TTFT: 1830 ms

Avg throughput: 221.1 tok/s

Total cost: $0.1248

Commentary

by openai/gpt-5.4-mini

Grok 4.20 (Non-Reasoning) is reliable on BridgeBench speed, with a 100.0% success rate and no prompt-level failures, but startup latency is fairly high: TTFT averages 1830 ms, with a median of 1999 ms. Sustained decode performance is strong overall at 221.1 tok/s average and 243.3 tok/s median, and total cost stays low at $0.124758. Throughput does, however, drop materially on essay-style prompts.
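The headline figures can be recomputed from the per-run values in the "All Runs" table. The sketch below is a check, not the benchmark's own code; it assumes, consistent with the reported numbers, that throughput stats use only the nine throughput rows and TTFT stats only the six TTFT rows.

```python
from statistics import mean, median

# Per-run values copied from the "All Runs" table.
# tok/s exists only for the nine throughput rows; TTFT stats use the six TTFT rows.
throughput_tok_s = [285.7, 260.9, 247.6, 239.7, 243.3, 244.4, 155.7, 163.5, 148.9]
ttft_runs_ms = [1501, 2150, 1932, 2268, 1061, 2066]
costs_usd = [0.0164, 0.0159, 0.0148, 0.0141, 0.0120, 0.0127, 0.0117, 0.0117,
             0.0121, 0.0007, 0.0007, 0.0007, 0.0004, 0.0003, 0.0004]

median_tok_s = median(throughput_tok_s)
avg_tok_s = mean(throughput_tok_s)
median_ttft = median(ttft_runs_ms)  # even count: mean of the two middle values
avg_ttft = mean(ttft_runs_ms)
cost_sum = sum(costs_usd)  # per-row costs are already rounded to 4 decimals

print(f"median throughput: {median_tok_s:.1f} tok/s")
print(f"avg throughput:    {avg_tok_s:.1f} tok/s")
print(f"median TTFT:       {median_ttft:.0f} ms")
print(f"avg TTFT:          {avg_ttft:.0f} ms")
print(f"sum of row costs:  ${cost_sum:.4f}")
```

This prints 243.3, 221.1, 1999, 1830 and $0.1246. The cost sum differs from the reported $0.1248 total only because each row's cost is rounded.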

API Design (throughput)

This is the strongest throughput case, with a 260.9 tok/s median on ~2549 output tokens and no issues. The model sustains high decode speed on long, structured technical output without instability.

Data Structures (throughput)

Performance is solid and matches the overall median at 243.3 tok/s on ~2079 output tokens, suggesting stable sustained generation on moderate-length technical prompts.

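Essay (throughput)

This is the main throughput weakness: the median falls to 155.7 tok/s on ~1903 output tokens, about 36% below the overall median of 243.3 tok/s. Output length does not explain the drop, since essay runs produce fewer tokens than the API design runs (~2549). The prose-heavy, less structured content may play a role, but this benchmark does not isolate the cause.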
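The per-prompt throughput figures, the average output lengths, and the size of the essay slowdown can be reproduced by grouping the throughput rows of the "All Runs" table. This is a minimal sketch; the slugs are shortened from the table's prompt IDs, and "percentage difference from the overall median" is this sketch's choice of comparison.

```python
from statistics import mean, median

# (prompt slug, tok/s, output tokens) for the nine throughput rows in "All Runs".
runs = [
    ("api-design", 285.7, 2666), ("api-design", 260.9, 2584), ("api-design", 247.6, 2396),
    ("data-structures", 239.7, 2278), ("data-structures", 243.3, 1928),
    ("data-structures", 244.4, 2031),
    ("essay", 155.7, 1883), ("essay", 163.5, 1875), ("essay", 148.9, 1951),
]

overall_median = median([tok_s for _, tok_s, _ in runs])

# Group runs by prompt, preserving first-seen order.
by_prompt: dict[str, list[tuple[float, int]]] = {}
for prompt, tok_s, tokens in runs:
    by_prompt.setdefault(prompt, []).append((tok_s, tokens))

summary: dict[str, tuple[float, float, float]] = {}
for prompt, rows in by_prompt.items():
    med = median([tok_s for tok_s, _ in rows])
    avg_tokens = mean([tokens for _, tokens in rows])
    # Percentage difference of this prompt's median from the overall median.
    delta_pct = 100 * (med - overall_median) / overall_median
    summary[prompt] = (med, avg_tokens, delta_pct)
    print(f"{prompt:<16} {med:6.1f} tok/s  ~{avg_tokens:.0f} tokens  {delta_pct:+.1f}%")
```

The essay row comes out at about -36% with the shortest average output of the three, which is why output length alone cannot account for the slowdown.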

Definition (TTFT)

TTFT is slightly better here at a 1932 ms median, versus 1999 ms overall, but it is still close to 2 seconds even for a ~74-token response. Startup latency, not decode speed, remains the main bottleneck.

Factual (TTFT)

This is the slowest startup case at a 2066 ms median TTFT despite only ~15 output tokens. The short completion makes the latency overhead especially visible and suggests weak first-token responsiveness. Latency is also inconsistent: individual runs range from 1061 ms to 2268 ms.
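The TTFT comparison between the two short prompts, including the run-to-run spread, can be read off the TTFT rows of the "All Runs" table. This small sketch groups those rows by prompt and reports the median and range for each.

```python
from statistics import median

# TTFT values (ms) for the six TTFT rows in "All Runs", grouped by prompt.
ttft_runs_ms = {
    "definition": [1501, 2150, 1932],
    "factual": [2268, 1061, 2066],
}

ttft_stats: dict[str, tuple[int, int, int]] = {}
for prompt, values in ttft_runs_ms.items():
    # (median, fastest run, slowest run) for this prompt.
    ttft_stats[prompt] = (median(values), min(values), max(values))
    med, fastest, slowest = ttft_stats[prompt]
    print(f"{prompt:<10} median {med} ms, range {fastest}-{slowest} ms")
```

The factual prompt has the higher median but also the single fastest run (1061 ms), so its three samples are the least consistent.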

Notable Prompts

API Design (throughput)

Highest median throughput at 260.9 tok/s (peak run 285.7 tok/s) with no issues, indicating strong long-form decode capacity.

Essay (throughput)

Median throughput drops to 155.7 tok/s, about 36% below the overall median and the largest degradation across prompt types.

Factual (TTFT)

A 2066 ms median TTFT on a ~15-token output is a poor startup profile and dominates end-to-end latency.

Data Structures (throughput)

Median throughput equal to the overall median (243.3 tok/s) with no issues suggests good consistency on typical technical workloads.
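The Factual entry's claim that TTFT dominates end-to-end latency can be sanity-checked with rough arithmetic. The benchmark reports no tok/s for TTFT runs, so this sketch assumes the overall median throughput (243.3 tok/s) as the decode rate. That rate is an illustrative stand-in, not a measured value for this prompt.

```python
# Rough end-to-end latency split for the Factual (TTFT) prompt.
# ASSUMPTION: no tok/s is reported for TTFT runs, so the overall median
# throughput (243.3 tok/s) stands in as the decode rate, and all 15 output
# tokens are charged at that rate (a slight overestimate of decode time).
ttft_ms = 2066
output_tokens = 15
assumed_tok_s = 243.3

decode_ms = output_tokens / assumed_tok_s * 1000
total_ms = ttft_ms + decode_ms
ttft_share = ttft_ms / total_ms
print(f"decode ~{decode_ms:.0f} ms, total ~{total_ms:.0f} ms, TTFT share {ttft_share:.1%}")
```

Under that assumption, decoding takes about 62 ms and TTFT accounts for roughly 97% of the time to a complete answer. Even a much slower decode rate would leave TTFT as the dominant term for an output this short.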

All Runs

Prompt	Run	Type	Tok/s	TTFT (ms)	Output tokens	Cost (USD)
API Design (throughput-api-design)	1	throughput	285.7	15272	2666	$0.0164
API Design (throughput-api-design)	2	throughput	260.9	12835	2584	$0.0159
API Design (throughput-api-design)	3	throughput	247.6	12544	2396	$0.0148
Data Structures (throughput-data-structures)	1	throughput	239.7	12199	2278	$0.0141
Data Structures (throughput-data-structures)	2	throughput	243.3	14971	1928	$0.0120
Data Structures (throughput-data-structures)	3	throughput	244.4	14677	2031	$0.0127
Essay (throughput-essay)	1	throughput	155.7	20704	1883	$0.0117
Essay (throughput-essay)	2	throughput	163.5	28055	1875	$0.0117
Essay (throughput-essay)	3	throughput	148.9	17930	1951	$0.0121
Definition (ttft-definition)	1	ttft	n/a	1501	68	$0.0007
Definition (ttft-definition)	2	ttft	n/a	2150	77	$0.0007
Definition (ttft-definition)	3	ttft	n/a	1932	77	$0.0007
Factual (ttft-factual)	1	ttft	n/a	2268	17	$0.0004
Factual (ttft-factual)	2	ttft	n/a	1061	11	$0.0003
Factual (ttft-factual)	3	ttft	n/a	2066	17	$0.0004

15 runs · Throughput rows require valid long-output runs · TTFT shown for all successful runs