BridgeBench
Speed
Model Analysis

Claude Sonnet 4.6

anthropic/claude-sonnet-4-6

95.3

median tok/s

1207ms TTFT
100.0% success

Throughput Runs

8

TTFT Runs

6

Avg TTFT

1239ms

Avg Throughput

86.1 tok/s

Total Cost

$0.4993

Commentary

by openai/gpt-5.4-mini

Claude Sonnet 4.6 is reliable on BridgeBench, with a 100.0% success rate and no run failures. Startup latency is moderate: median TTFT is 1207 ms and average TTFT is 1239 ms. Sustained decode performance is strong at 95.3 tok/s median throughput, but the average drops to 86.1 tok/s because the two essay runs (47.7 and 49.8 tok/s) fall well below the other six (93.0 to 108.9 tok/s). Cost is moderate at $0.499320 for the benchmark set.
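The gap between the median and average figures can be checked directly from the per-run numbers listed under All Runs. The snippet below is a minimal sketch of that check, not BridgeBench's own aggregation code; the variable and function names are illustrative, and it assumes the headline TTFT figures are computed over the six TTFT-type runs only, which is consistent with the reported values.

```python
from statistics import mean, median

# Per-run figures copied from the "All Runs" table.
# Throughput in tok/s for the 8 throughput runs (last two are the essay runs).
throughput_tok_s = [97.2, 101.4, 108.9, 96.4, 94.3, 93.0, 47.7, 49.8]
# TTFT in ms for the 6 TTFT-type runs (definition and factual prompts).
ttft_ms = [747, 1730, 986, 1613, 930, 1428]


def summarize(values):
    """Return the median and arithmetic mean of a list of measurements."""
    return {"median": median(values), "mean": mean(values)}


throughput_summary = summarize(throughput_tok_s)
ttft_summary = summarize(ttft_ms)

# Mean throughput with the two essay runs excluded, to show how much
# those two runs pull the overall average down.
mean_without_essay = mean(throughput_tok_s[:6])

print("throughput:", throughput_summary)
print("ttft:", ttft_summary)
print("mean throughput without essay runs:", round(mean_without_essay, 1))
```

The median is barely affected by the two slow essay runs, while the mean is pulled down by roughly 12 tok/s, which is why the two headline throughput figures differ.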

API Design · throughput

This is the strongest throughput case at 101.4 tok/s median with 3 successful runs and no issues. Output length is large at ~4095 tokens, so the model maintains high decode speed on long-form technical generation.

Data Structures · throughput

Throughput is slightly lower than API design at 94.3 tok/s median, but still solid and stable across 3 successful runs (93.0 to 96.4 tok/s). The near-identical 4096-token outputs suggest consistent long-output decoding without degradation.

Essay · throughput

This is the main throughput weakness, with median speed dropping to 48.8 tok/s across 2 runs. The long technical essay workload appears to stress sustained generation more than the structured technical prompts, pulling down the overall average throughput.

Definition · TTFT

Short-definition prompts have the best startup latency at 986 ms median TTFT, the lowest median in the set, although individual runs ranged from 747 ms to 1730 ms. With only ~80 output tokens, this suggests the model can begin responding relatively quickly when the completion is short.

Factual · TTFT

Factual prompts are slower to start at 1428 ms median TTFT, the slower of the two TTFT prompt types. The low output length (~27 tokens) means end-to-end latency is dominated by prefill/startup rather than decode, so first-token responsiveness is the main weakness here.
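To make the "dominated by prefill/startup" point concrete, end-to-end latency can be approximated as TTFT plus decode time. The sketch below is a rough estimate only: the benchmark reports no throughput for the TTFT prompts, so it assumes they decode at the overall median of 95.3 tok/s, and the helper name `ttft_share` is hypothetical.

```python
# Assumption: TTFT prompts decode at the overall median throughput.
# This rate was not measured for the definition or factual prompts.
ASSUMED_DECODE_TOK_S = 95.3


def ttft_share(ttft_ms: float, output_tokens: int, decode_tok_s: float) -> float:
    """Fraction of estimated end-to-end latency spent waiting for the first token."""
    decode_ms = output_tokens / decode_tok_s * 1000.0
    return ttft_ms / (ttft_ms + decode_ms)


# Median TTFT and approximate output length as reported for each prompt type.
definition_share = ttft_share(986, 80, ASSUMED_DECODE_TOK_S)
factual_share = ttft_share(1428, 27, ASSUMED_DECODE_TOK_S)

print(f"definition: {definition_share:.0%} of latency is TTFT")
print(f"factual:    {factual_share:.0%} of latency is TTFT")
```

Under that assumption, time to first token accounts for roughly 54% of the definition prompt's latency and roughly 83% of the factual prompt's, so for very short answers decode speed matters far less than startup time.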

Notable Prompts

API Design · throughput

Highest sustained throughput at 101.4 tok/s with no issues, making it the best long-form technical decode result.

Essay · throughput

Throughput falls to 48.8 tok/s, far below the other long-output prompts, indicating sensitivity to essay-style generation.

Factual · TTFT

At 1428 ms median TTFT, this is the slower of the two TTFT prompt types and the clearest first-token latency weakness among them.

Definition · TTFT

Fastest median TTFT at 986 ms, indicating better responsiveness on short, direct prompts.

All Runs

| Prompt | Type | Tok/s | TTFT |
| --- | --- | --- | --- |
| 1. API Design (throughput-api-design) | throughput | 97.2 | 1339 ms |
| 2. API Design (throughput-api-design) | throughput | 101.4 | 871 ms |
| 3. API Design (throughput-api-design) | throughput | 108.9 | 1496 ms |
| 1. Data Structures (throughput-data-structures) | throughput | 96.4 | 832 ms |
| 2. Data Structures (throughput-data-structures) | throughput | 94.3 | 1699 ms |
| 3. Data Structures (throughput-data-structures) | throughput | 93.0 | 1872 ms |
| 1. Essay (throughput-essay) | throughput | 47.7 | 2736 ms |
| 3. Essay (throughput-essay) | throughput | 49.8 | 838 ms |
| 1. Definition (ttft-definition) | ttft | n/a | 747 ms |
| 2. Definition (ttft-definition) | ttft | n/a | 1730 ms |
| 3. Definition (ttft-definition) | ttft | n/a | 986 ms |
| 1. Factual (ttft-factual) | ttft | n/a | 1613 ms |
| 2. Factual (ttft-factual) | ttft | n/a | 930 ms |
| 3. Factual (ttft-factual) | ttft | n/a | 1428 ms |

14 runs · Throughput rows require valid long-output runs · TTFT shown for all successful runs
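The per-prompt medians quoted in the commentary can be reproduced by grouping the 14 runs by prompt. This is an illustrative sketch rather than the benchmark's own tooling: the tuple layout and the `per_prompt_medians` helper are assumptions, and `None` stands in for the "n/a" throughput entries on TTFT runs.

```python
from collections import defaultdict
from statistics import median

# (prompt, tok/s or None when reported as n/a, TTFT in ms), copied from the table above.
runs = [
    ("api-design", 97.2, 1339), ("api-design", 101.4, 871), ("api-design", 108.9, 1496),
    ("data-structures", 96.4, 832), ("data-structures", 94.3, 1699),
    ("data-structures", 93.0, 1872),
    ("essay", 47.7, 2736), ("essay", 49.8, 838),
    ("definition", None, 747), ("definition", None, 1730), ("definition", None, 986),
    ("factual", None, 1613), ("factual", None, 930), ("factual", None, 1428),
]


def per_prompt_medians(rows):
    """Group runs by prompt and return median tok/s and median TTFT for each."""
    tok_s = defaultdict(list)
    ttft = defaultdict(list)
    for prompt, rate, first_token_ms in rows:
        ttft[prompt].append(first_token_ms)
        if rate is not None:  # TTFT runs carry no throughput figure
            tok_s[prompt].append(rate)
    return {
        prompt: {
            "tok_s": median(tok_s[prompt]) if prompt in tok_s else None,
            "ttft_ms": median(ttft[prompt]),
        }
        for prompt in ttft
    }


medians = per_prompt_medians(runs)
for prompt, stats in medians.items():
    print(prompt, stats)
```

With only two or three runs per prompt, each median rests on very few samples, so small differences between prompts should be read with caution.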