BridgeBench
Speed
Model Analysis

Grok 4

x-ai/grok-4

61.4 median tok/s · 3684ms TTFT · 100.0% success

Throughput Runs: 9
TTFT Runs: 6
Avg TTFT: 3698ms
Avg Throughput: 57.4 tok/s
Total Cost: $0.0000

Commentary

by openai/gpt-5.4-mini

Grok 4 is reliable on the BridgeBench speed test, with a 100.0% success rate and zero reported cost, but its startup latency is high: median TTFT is 3684 ms and average TTFT is 3698 ms. Sustained decode performance is solid at 61.4 tok/s median throughput and 57.4 tok/s average throughput. Individual runs range from 41.2 to 69.2 tok/s, and most of that spread comes from a noticeable slowdown on essay-style outputs.
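The headline figures can be cross-checked against the per-run values listed under All Runs. The following is a minimal sketch, assuming the headline numbers are a plain median and arithmetic mean over the nine throughput runs and the six TTFT-prompt runs; the page does not state its aggregation method, and the variable names are illustrative.

```python
from statistics import mean, median

# Per-run values copied from the All Runs table on this page.
throughput_tok_s = [61.3, 65.9, 61.4, 69.2, 65.7, 62.0, 44.6, 45.3, 41.2]
# TTFT values from the six TTFT-prompt runs only (definition and factual).
ttft_ms = [4028, 4300, 4134, 3286, 3339, 3103]

median_tok_s = median(throughput_tok_s)
avg_tok_s = mean(throughput_tok_s)
median_ttft_ms = median(ttft_ms)
avg_ttft_ms = mean(ttft_ms)

print(f"median throughput: {median_tok_s:.1f} tok/s")
print(f"avg throughput:    {avg_tok_s:.1f} tok/s")
print(f"median TTFT:       {median_ttft_ms:.1f} ms")
print(f"avg TTFT:          {avg_ttft_ms:.1f} ms")
```

Under these assumptions the results agree with the reported 61.4 tok/s, 57.4 tok/s, 3684 ms, and 3698 ms after rounding, which suggests the TTFT headline excludes the much larger TTFT values recorded on the throughput prompts.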

API Design · throughput

This prompt sits at the model's overall median of 61.4 tok/s with 2354 average output tokens, indicating stable sustained generation on long-form technical content. No issues were reported, and the three runs fall between 61.3 and 65.9 tok/s.

Data Structures · throughput

This is the fastest throughput case at 65.7 tok/s, suggesting the model handles structured technical exposition efficiently. The 2316-token average output is similar to API Design, so the higher rate is likely workload-dependent rather than due to shorter generations.

Essay · throughput

This is the weakest throughput prompt at 44.6 tok/s, a clear drop versus the other throughput tasks. Its average output is shorter at 1924 tokens, so output length does not explain the gap; this looks like a content-dependent slowdown on essay-style generation.
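A rough way to see why output length does not account for the essay gap is to convert each prompt's figures into an implied decode time. This sketch assumes tok/s means output tokens divided by decode time, which the page does not define, and it mixes average token counts with median throughput, so the results are approximate. The function name is hypothetical.

```python
# (average output tokens, median tok/s) per prompt, from the commentary above.
prompts = {
    "api-design": (2354, 61.4),
    "data-structures": (2316, 65.7),
    "essay": (1924, 44.6),
}

def implied_decode_seconds(tokens: float, tok_per_s: float) -> float:
    """Approximate decode time, assuming tok/s = output tokens / decode seconds."""
    return tokens / tok_per_s

decode_s = {
    name: implied_decode_seconds(tokens, rate)
    for name, (tokens, rate) in prompts.items()
}

for name, seconds in decode_s.items():
    print(f"{name}: ~{seconds:.1f} s")
```

Under this assumption the essay prompt produces the fewest tokens yet takes the longest to decode (about 43 s versus about 38 s and 35 s), which is consistent with a content-dependent slowdown rather than a length effect.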

Definition · ttft

TTFT is 4134 ms here, the worst startup latency among the TTFT prompts. With an average output of only 67 tokens, initial response latency is the main cost, not decode length.

Factual · ttft

This is the faster of the two TTFT cases at 3286 ms, but startup is still multi-second and not especially responsive. With only 20 average output tokens, the latency profile is dominated by prefill/startup rather than generation.
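The claim that startup dominates short answers can be made concrete with a simple latency model. This is a sketch under two stated assumptions, neither of which comes from the page: end-to-end latency is approximated as TTFT plus output tokens divided by decode rate, and the overall median of 61.4 tok/s is applied to short outputs (the TTFT prompts report no tok/s of their own). The function name is hypothetical.

```python
# Assumption: short outputs decode at the overall median throughput.
MEDIAN_TOK_S = 61.4

def ttft_share(ttft_ms: float, output_tokens: float,
               tok_per_s: float = MEDIAN_TOK_S) -> float:
    """Fraction of estimated end-to-end latency spent before the first token."""
    decode_ms = output_tokens / tok_per_s * 1000.0
    return ttft_ms / (ttft_ms + decode_ms)

# Median TTFT and average output tokens from the commentary above.
definition_share = ttft_share(4134, 67)
factual_share = ttft_share(3286, 20)

print(f"definition: {definition_share:.0%} of latency is TTFT")
print(f"factual:    {factual_share:.0%} of latency is TTFT")
```

With these assumptions, roughly 79% of the definition prompt's latency and 91% of the factual prompt's latency is spent waiting for the first token.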

Notable Prompts

Data Structures · throughput

Highest sustained throughput at 65.7 tok/s, indicating strong decode efficiency on structured technical content.

Essay · throughput

Lowest throughput at 44.6 tok/s, showing the model slows materially on essay-style long-form output.

Definition · ttft

Worst startup latency at 4134 ms, so short-answer responsiveness is a clear weakness.

Factual · ttft

Fastest TTFT at 3286 ms, the lowest latency of the startup tests, though it remains slow in absolute terms.

All Runs

| Prompt | ID | Type | Tok/s | TTFT |
|---|---|---|---|---|
| 1. API Design | throughput-api-design | throughput | 61.3 | 10959ms |
| 2. API Design | throughput-api-design | throughput | 65.9 | 7525ms |
| 3. API Design | throughput-api-design | throughput | 61.4 | 10622ms |
| 1. Data Structures | throughput-data-structures | throughput | 69.2 | 9444ms |
| 2. Data Structures | throughput-data-structures | throughput | 65.7 | 10234ms |
| 3. Data Structures | throughput-data-structures | throughput | 62.0 | 9380ms |
| 1. Essay | throughput-essay | throughput | 44.6 | 9237ms |
| 2. Essay | throughput-essay | throughput | 45.3 | 7485ms |
| 3. Essay | throughput-essay | throughput | 41.2 | 8723ms |
| 1. Definition | ttft-definition | ttft | n/a | 4028ms |
| 2. Definition | ttft-definition | ttft | n/a | 4300ms |
| 3. Definition | ttft-definition | ttft | n/a | 4134ms |
| 1. Factual | ttft-factual | ttft | n/a | 3286ms |
| 2. Factual | ttft-factual | ttft | n/a | 3339ms |
| 3. Factual | ttft-factual | ttft | n/a | 3103ms |

15 runs · Throughput rows require valid long-output runs · TTFT shown for all successful runs
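The per-prompt figures quoted in the commentary can be reproduced by grouping the runs by prompt ID. This sketch assumes each per-prompt figure is the median of that prompt's three runs, which the page does not state explicitly; the tuple layout and variable names are illustrative.

```python
from collections import defaultdict
from statistics import median

# (prompt ID, tok/s or None when reported as n/a, TTFT in ms), from All Runs.
runs = [
    ("throughput-api-design", 61.3, 10959),
    ("throughput-api-design", 65.9, 7525),
    ("throughput-api-design", 61.4, 10622),
    ("throughput-data-structures", 69.2, 9444),
    ("throughput-data-structures", 65.7, 10234),
    ("throughput-data-structures", 62.0, 9380),
    ("throughput-essay", 44.6, 9237),
    ("throughput-essay", 45.3, 7485),
    ("throughput-essay", 41.2, 8723),
    ("ttft-definition", None, 4028),
    ("ttft-definition", None, 4300),
    ("ttft-definition", None, 4134),
    ("ttft-factual", None, 3286),
    ("ttft-factual", None, 3339),
    ("ttft-factual", None, 3103),
]

tok_s_by_prompt = defaultdict(list)
ttft_by_prompt = defaultdict(list)
for prompt_id, tok_s, ttft_ms in runs:
    ttft_by_prompt[prompt_id].append(ttft_ms)
    if tok_s is not None:  # TTFT prompts report no throughput
        tok_s_by_prompt[prompt_id].append(tok_s)

median_tok_s = {p: median(v) for p, v in tok_s_by_prompt.items()}
median_ttft_ms = {p: median(v) for p, v in ttft_by_prompt.items()}

for prompt_id, value in median_tok_s.items():
    print(f"{prompt_id}: {value} tok/s")
for prompt_id in ("ttft-definition", "ttft-factual"):
    print(f"{prompt_id}: {median_ttft_ms[prompt_id]} ms")
```

Taking the median of three runs yields 61.4, 65.7, and 44.6 tok/s for the throughput prompts and 4134 ms and 3286 ms for the TTFT prompts, matching the values cited in the commentary.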