BridgeBench
Speed
Model Analysis

GPT-5.4

openai/gpt-5.4

88.0 median tok/s · 397ms median TTFT · 100.0% success

Throughput Runs: 9
TTFT Runs: 6
Avg TTFT: 743ms
Avg Throughput: 77.1 tok/s
Total Cost: $0.3004

Commentary

by openai/gpt-5.4-mini

GPT-5.4 is reliable on BridgeBench speed, with a 100.0% success rate and no prompt-specific failures. Startup latency is solid at 397 ms median TTFT. The 743 ms average is inflated by a single 2,421 ms Factual run, which points to an isolated cold-start or queueing delay rather than a consistently slow tail. Sustained decode is strong at 88.0 tok/s median throughput, though the average drops to 77.1 tok/s because the Essay workload runs at 54.7 to 59.9 tok/s and one Data Structures run fell to 68.4 tok/s. Total cost was $0.300380.
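The gap between median and average TTFT can be reproduced from the six TTFT-type runs listed under All Runs. The following is a minimal sketch, assuming the summary figures are a plain median and arithmetic mean over those six runs; the variable names and the 1,000 ms cutoff are illustrative choices, not part of BridgeBench.

```python
from statistics import mean, median

# TTFT in ms for the six ttft-type runs (Definition x3, Factual x3),
# copied from the All Runs table.
ttft_ms = [358, 406, 507, 2421, 387, 380]

med = median(ttft_ms)  # barely affected by the single slow run
avg = mean(ttft_ms)    # pulled upward by the 2421 ms run

# Hypothetical cutoff, used only to show how much one run moves the mean.
fast_runs = [t for t in ttft_ms if t < 1000]
avg_without_outlier = mean(fast_runs)

print(f"median: {med:.1f} ms")
print(f"mean: {avg:.1f} ms")
print(f"mean excluding runs over 1 s: {avg_without_outlier:.1f} ms")
```

Under these assumptions the median and mean land on the reported 397 ms and 743 ms after rounding, and dropping the one slow run brings the mean back close to the median.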

API Design (throughput)

This is the strongest throughput case at 92.7 tok/s median, with all three runs between 89.3 and 93.2 tok/s, indicating good sustained generation on structured outputs. The 3,410-token average output did not materially degrade speed.

Data Structures (throughput)

Median throughput is 88.0 tok/s, matching the overall median. Two of the three runs came in at 88.0 and 90.5 tok/s, but the first dropped to 68.4 tok/s, so decode was not fully consistent on this prompt. Output length is slightly shorter than API Design.

Essay (throughput)

This is the main throughput weakness at 57.0 tok/s median, about 36 tok/s below API Design. The 3,571-token average output is close to API Design's 3,410 tokens, so output length alone does not explain the gap; long-form prose appears to decode more slowly for this model, and it is the main driver of the lower overall average throughput.
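The effect of the Essay prompt on the overall average can be checked against the nine throughput runs in the All Runs table. This is a rough sketch, assuming the reported figures are unweighted statistics over individual runs; the dictionary keys and variable names are illustrative.

```python
from statistics import mean, median

# Tok/s for each throughput run, copied from the All Runs table.
runs = {
    "api-design": [92.7, 89.3, 93.2],
    "data-structures": [68.4, 90.5, 88.0],
    "essay": [54.7, 57.0, 59.9],
}

# Median per prompt, then overall statistics across all nine runs.
per_prompt_median = {name: median(vals) for name, vals in runs.items()}
all_runs = [v for vals in runs.values() for v in vals]
overall_median = median(all_runs)
overall_mean = mean(all_runs)

# Same mean with the Essay runs left out, to isolate their effect.
non_essay = [v for name, vals in runs.items() if name != "essay" for v in vals]
mean_without_essay = mean(non_essay)

for name, value in per_prompt_median.items():
    print(f"{name}: {value:.1f} tok/s median")
print(f"overall: {overall_median:.1f} median, {overall_mean:.1f} mean")
print(f"mean without essay: {mean_without_essay:.1f} tok/s")
```

With the Essay runs excluded, the mean of the remaining six runs sits near the overall median, which is consistent with Essay accounting for most of the gap between the two.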

Definition (ttft)

TTFT is slightly slower here at 406 ms median, compared with 387 ms for Factual, and runs ranged from 358 to 507 ms. That is still in a good range for interactive use. The very small 58-token outputs keep the latency profile focused on startup rather than decode.

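Factual (ttft)

This is the fastest startup case at 387 ms median TTFT, indicating low first-token latency on short factual responses. The first run took 2,421 ms, the only large outlier in the benchmark, while the other two came in at 387 and 380 ms. The 12-token average output is tiny, so apart from that outlier this prompt is a clean read on initiation speed.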

Notable Prompts

Essay (throughput)

It is the clear outlier: its median throughput is far below the other throughput prompts, and it drags down the average.

API Design (throughput)

It has the best sustained decode rate and remains stable on long structured outputs.

Factual (ttft)

It has the lowest median TTFT, indicating strong startup responsiveness on short requests.

Definition (ttft)

It is the slowest TTFT prompt by median, though only marginally, which suggests modest variance rather than a major latency issue.

All Runs

| Prompt | ID | Type | Tok/s | TTFT |
| --- | --- | --- | --- | --- |
| 1. API Design | throughput-api-design | throughput | 92.7 | 392ms |
| 2. API Design | throughput-api-design | throughput | 89.3 | 395ms |
| 3. API Design | throughput-api-design | throughput | 93.2 | 379ms |
| 1. Data Structures | throughput-data-structures | throughput | 68.4 | 407ms |
| 2. Data Structures | throughput-data-structures | throughput | 90.5 | 377ms |
| 3. Data Structures | throughput-data-structures | throughput | 88.0 | 380ms |
| 1. Essay | throughput-essay | throughput | 54.7 | 479ms |
| 2. Essay | throughput-essay | throughput | 57.0 | 345ms |
| 3. Essay | throughput-essay | throughput | 59.9 | 451ms |
| 1. Definition | ttft-definition | ttft | n/a | 358ms |
| 2. Definition | ttft-definition | ttft | n/a | 406ms |
| 3. Definition | ttft-definition | ttft | n/a | 507ms |
| 1. Factual | ttft-factual | ttft | n/a | 2421ms |
| 2. Factual | ttft-factual | ttft | n/a | 387ms |
| 3. Factual | ttft-factual | ttft | n/a | 380ms |

15 runs · Throughput rows require valid long-output runs · TTFT shown for all successful runs