Model Analysis

Claude Sonnet 4.6

anthropic/claude-sonnet-4-6

95.3 median tok/s

1207ms TTFT

100.0% success

Throughput Runs

TTFT Runs

Avg TTFT: 1239ms

Avg Throughput: 86.1 tok/s

Total Cost: $0.4993

Commentary

by openai/gpt-5.4-mini

Claude Sonnet 4.6 is reliable on BridgeBench, with a 100.0% success rate and no run failures. Startup latency is moderate: across the six ttft runs, median TTFT is 1207 ms and average TTFT is 1239 ms. Sustained decode performance is strong at 95.3 tok/s median throughput across the eight throughput runs. The average is lower at 86.1 tok/s because the two essay runs (47.7 and 49.8 tok/s) sit far below the other six, which all fall between 93.0 and 108.9 tok/s. Cost is moderate at $0.499320 for the benchmark set.
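The headline figures can be reproduced from the per-run values in the All Runs table. The following is a minimal sketch using Python's standard `statistics` module. It assumes the summary aggregates the eight throughput rows and the six ttft rows separately, which the page does not state explicitly; the variable names are illustrative.

```python
from statistics import mean, median

# Tok/s from the eight throughput rows of the All Runs table.
throughput_tok_s = [97.2, 101.4, 108.9, 96.4, 94.3, 93.0, 47.7, 49.8]
# TTFT in ms from the six ttft rows (definition and factual prompts).
ttft_ms = [747, 1730, 986, 1613, 930, 1428]

median_tput = median(throughput_tok_s)  # midpoint of 94.3 and 96.4
mean_tput = mean(throughput_tok_s)
median_ttft = median(ttft_ms)           # midpoint of 986 and 1428
mean_ttft = mean(ttft_ms)

print(f"throughput: median {median_tput:.2f}, mean {mean_tput:.2f} tok/s")
print(f"TTFT: median {median_ttft:.0f} ms, mean {mean_ttft:.0f} ms")
```

Under that assumption the median throughput is about 95.35 tok/s (shown on the page as 95.3) and the mean is about 86.09 tok/s, while the TTFT median and mean come out at 1207 ms and 1239 ms.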

Api Design · throughput

This is the strongest throughput case at 101.4 tok/s median (runs range from 97.2 to 108.9 tok/s), with 3 successful runs and no issues. Outputs are long at 4095 to 4096 tokens, so the model maintains high decode speed on long-form technical generation.
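For a sense of scale, a tok/s figure can be converted into approximate wall-clock time. The sketch below assumes the reported tok/s covers the decode phase only and that TTFT is added on top; the page does not say how BridgeBench defines either metric, and `estimated_latency_s` is a hypothetical helper, not part of any benchmark tooling.

```python
def estimated_latency_s(output_tokens: int, tok_per_s: float, ttft_ms: float = 0.0) -> float:
    """Approximate wall-clock seconds: TTFT plus decode time at a constant rate."""
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    return ttft_ms / 1000.0 + output_tokens / tok_per_s

# Api Design run 2 from the All Runs table: 4095 tokens at 101.4 tok/s, 871 ms TTFT.
api_design_s = estimated_latency_s(4095, 101.4, ttft_ms=871)
# Essay run 3: 4096 tokens at 49.8 tok/s, 838 ms TTFT.
essay_s = estimated_latency_s(4096, 49.8, ttft_ms=838)

print(f"api design: {api_design_s:.1f} s, essay: {essay_s:.1f} s")
```

With these inputs the Api Design run works out to roughly 41 seconds and the essay run to roughly 83 seconds, so at this output length decode time dominates and TTFT contributes about a second or less.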

Data Structures · throughput

Throughput is slightly lower than Api Design at 94.3 tok/s median, but still solid and stable across 3 successful runs (93.0 to 96.4 tok/s). The near-identical 4095- to 4096-token outputs suggest consistent long-output decoding without degradation.

Essay · throughput

This is the main throughput weakness, with median speed dropping to 48.8 tok/s across the 2 listed runs (47.7 and 49.8 tok/s; the table shows runs 1 and 3 only). The long technical essay workload appears to stress sustained generation more than the structured technical prompts, and it pulls down the overall average throughput.
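The effect of the essay runs on the average can be checked by grouping the throughput rows by prompt. This is an illustrative sketch over values copied from the All Runs table; the short prompt keys are shorthand chosen here, not identifiers from the benchmark.

```python
from collections import defaultdict
from statistics import mean, median

# (prompt, tok/s) pairs copied from the throughput rows of the All Runs table.
runs = [
    ("api-design", 97.2), ("api-design", 101.4), ("api-design", 108.9),
    ("data-structures", 96.4), ("data-structures", 94.3), ("data-structures", 93.0),
    ("essay", 47.7), ("essay", 49.8),
]

by_prompt = defaultdict(list)
for prompt, tok_s in runs:
    by_prompt[prompt].append(tok_s)

prompt_medians = {prompt: median(values) for prompt, values in by_prompt.items()}
mean_all = mean([tok_s for _, tok_s in runs])
mean_without_essay = mean([tok_s for prompt, tok_s in runs if prompt != "essay"])

print(prompt_medians)
print(f"mean, all runs: {mean_all:.1f}; excluding essay: {mean_without_essay:.1f}")
```

Excluding the two essay runs raises the mean from about 86.1 to about 98.5 tok/s, which is why the median is the more representative headline figure for this set.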

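Definition · ttft

Short-definition prompts have the best startup latency at 986 ms median TTFT, the lower of the two ttft prompt medians. Individual runs range from 747 ms to 1730 ms, so with only three runs the median is a rough indicator. Outputs are short (~80 tokens), though TTFT is measured before any output is produced and so reflects request and prompt-processing time rather than completion length.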

Factual · ttft

Factual prompts are slower to start at 1428 ms median TTFT, the higher of the two ttft prompt medians. Outputs are only ~27 tokens, so total response time for these prompts is dominated by prefill/startup rather than decode, which makes this the clearest first-token responsiveness weakness.
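With three runs per prompt, it helps to look at the spread alongside the median before reading much into the gap between the two ttft prompts. A small sketch over the TTFT values from the All Runs table (the dictionary layout is illustrative):

```python
from statistics import median

# TTFT in ms for the two ttft prompts, copied from the All Runs table.
ttft_ms = {
    "definition": [747, 1730, 986],
    "factual": [1613, 930, 1428],
}

summary = {
    name: {"median": median(values), "min": min(values), "max": max(values)}
    for name, values in ttft_ms.items()
}

# Two ranges overlap when each one starts before the other ends.
d, f = summary["definition"], summary["factual"]
ranges_overlap = d["min"] <= f["max"] and f["min"] <= d["max"]

for name, stats in summary.items():
    print(name, stats)
print("ranges overlap:", ranges_overlap)
```

The medians differ (986 ms versus 1428 ms), but the per-run ranges overlap almost entirely (747 to 1730 ms versus 930 to 1613 ms), so the ranking of the two prompts rests on a small sample.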

Notable Prompts

Api Design · throughput

Highest sustained throughput at 101.4 tok/s median with no issues, making it the best long-form technical decode result.

Essay · throughput

Median throughput falls to 48.8 tok/s, roughly half that of the other long-output prompts, indicating sensitivity to essay-style generation.

Factual · ttft

At 1428 ms median TTFT, this is the slower of the two ttft prompts and the clearest first-token latency weakness.

Definition · ttft

Fastest median TTFT at 986 ms, indicating better responsiveness on short, direct prompts.

All Runs

Prompt	Type	Tok/s	TTFT	Tokens	Cost
1. Api Design throughput-api-design	throughput	97.2	1339ms	4096	$0.0617
2. Api Design throughput-api-design	throughput	101.4	871ms	4095	$0.0617
3. Api Design throughput-api-design	throughput	108.9	1496ms	4095	$0.0617
1. Data Structures throughput-data-structures	throughput	96.4	832ms	4095	$0.0618
2. Data Structures throughput-data-structures	throughput	94.3	1699ms	4096	$0.0618
3. Data Structures throughput-data-structures	throughput	93.0	1872ms	4096	$0.0618
1. Essay throughput-essay	throughput	47.7	2736ms	4096	$0.0617
3. Essay throughput-essay	throughput	49.8	838ms	4096	$0.0617
1. Definition ttft-definition	ttft	n/a	747ms	88	$0.0014
2. Definition ttft-definition	ttft	n/a	1730ms	71	$0.0011
3. Definition ttft-definition	ttft	n/a	986ms	82	$0.0013
1. Factual ttft-factual	ttft	n/a	1613ms	27	$0.0005
2. Factual ttft-factual	ttft	n/a	930ms	27	$0.0005
3. Factual ttft-factual	ttft	n/a	1428ms	27	$0.0005

14 runs · Throughput rows require valid long-output runs · TTFT shown for all successful runs
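The table rows are tab-separated, so they can be parsed into typed records for further analysis. The sketch below is one possible way to do that and to cross-check the total cost; `parse_row` and the field names are hypothetical, and `str.removesuffix` requires Python 3.9 or later.

```python
def parse_row(line: str) -> dict:
    """Parse one tab-separated row of the All Runs table into typed fields."""
    prompt, run_type, tok_s, ttft, tokens, cost = line.split("\t")
    return {
        "prompt": prompt,
        "type": run_type,
        "tok_s": None if tok_s == "n/a" else float(tok_s),  # ttft rows have no tok/s
        "ttft_ms": int(ttft.removesuffix("ms")),
        "tokens": int(tokens),
        "cost_usd": float(cost.lstrip("$")),
    }

# Rows copied from the All Runs table, with tabs written as explicit escapes.
TABLE = [
    "1. Api Design throughput-api-design\tthroughput\t97.2\t1339ms\t4096\t$0.0617",
    "2. Api Design throughput-api-design\tthroughput\t101.4\t871ms\t4095\t$0.0617",
    "3. Api Design throughput-api-design\tthroughput\t108.9\t1496ms\t4095\t$0.0617",
    "1. Data Structures throughput-data-structures\tthroughput\t96.4\t832ms\t4095\t$0.0618",
    "2. Data Structures throughput-data-structures\tthroughput\t94.3\t1699ms\t4096\t$0.0618",
    "3. Data Structures throughput-data-structures\tthroughput\t93.0\t1872ms\t4096\t$0.0618",
    "1. Essay throughput-essay\tthroughput\t47.7\t2736ms\t4096\t$0.0617",
    "3. Essay throughput-essay\tthroughput\t49.8\t838ms\t4096\t$0.0617",
    "1. Definition ttft-definition\tttft\tn/a\t747ms\t88\t$0.0014",
    "2. Definition ttft-definition\tttft\tn/a\t1730ms\t71\t$0.0011",
    "3. Definition ttft-definition\tttft\tn/a\t986ms\t82\t$0.0013",
    "1. Factual ttft-factual\tttft\tn/a\t1613ms\t27\t$0.0005",
    "2. Factual ttft-factual\tttft\tn/a\t930ms\t27\t$0.0005",
    "3. Factual ttft-factual\tttft\tn/a\t1428ms\t27\t$0.0005",
]

rows = [parse_row(line) for line in TABLE]
throughput_rows = [r for r in rows if r["tok_s"] is not None]
total_cost = sum(r["cost_usd"] for r in rows)

print(f"{len(rows)} rows, {len(throughput_rows)} with tok/s, total ${total_cost:.4f}")
```

Summing the per-row costs gives $0.4992, slightly below the reported total of $0.4993; the difference is consistent with each row's cost being rounded to four decimal places.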