Claude Sonnet 4.6
anthropic/claude-sonnet-4-6
Median throughput: 95.3 tok/s
Throughput runs: 8
TTFT runs: 6
Avg TTFT: 1239 ms
Avg throughput: 86.1 tok/s
Total cost: $0.4993
Commentary
by openai/gpt-5.4-mini
Claude Sonnet 4.6 is reliable on BridgeBench with a 100.0% success rate and no run failures, but startup latency is moderate: median TTFT is 1207 ms and average TTFT is 1239 ms. Sustained decode performance is strong overall at 95.3 tok/s median throughput. The average drops to 86.1 tok/s because the two essay runs decode at roughly half the speed of the other throughput runs. Cost is moderate at $0.499320 for the benchmark set.
Api Design: This is the strongest throughput case at 101.4 tok/s median with 3 successful runs and no issues. Outputs run to ~4095 tokens, showing that the model maintains high decode speed on long-form technical generation.
Data Structures: At 94.3 tok/s median, throughput is slightly lower than for API design, but still solid and stable across 3 successful runs. The near-identical 4096-token outputs suggest consistent long-output decoding without degradation.
Essay: This is the main throughput weakness, with median speed dropping to 48.8 tok/s across 2 runs. The long technical essay workload appears to stress sustained generation more than the structured technical prompts, pulling down the overall average throughput.
Definition: Short-definition prompts have the best startup latency at 986 ms median TTFT, the fastest in the set. With only ~80 output tokens, this suggests the model can begin responding relatively quickly when the completion is short.
Factual: Factual prompts are slower to start at 1428 ms median TTFT, the worst in the set. With only ~27 output tokens, end-to-end latency on these prompts is dominated by prefill and startup rather than decode, so this is a clear first-token responsiveness weakness.
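To put the TTFT figures in context, the sketch below splits an estimated end-to-end latency into startup and decode time. It assumes the 95.3 tok/s median from the long-output runs also holds for short completions, which the benchmark does not measure (Tok/s is n/a for TTFT runs). The function name `estimated_latency_ms` and the additive model (total = TTFT + tokens / decode rate) are illustrative assumptions, not BridgeBench's own method.

```python
def estimated_latency_ms(ttft_ms, output_tokens, decode_tok_s):
    """Estimate decode time, total latency, and the TTFT share of the total.

    Assumes a simple additive model: total = TTFT + tokens / decode rate.
    """
    if decode_tok_s <= 0:
        raise ValueError("decode_tok_s must be positive")
    decode_ms = output_tokens / decode_tok_s * 1000.0
    total_ms = ttft_ms + decode_ms
    return decode_ms, total_ms, ttft_ms / total_ms


if __name__ == "__main__":
    # Median TTFT and approximate output length per prompt type, from the commentary.
    # The 95.3 tok/s decode rate is borrowed from the long-output runs (assumption).
    for name, ttft_ms, tokens in [("factual", 1428, 27), ("definition", 986, 80)]:
        decode_ms, total_ms, share = estimated_latency_ms(ttft_ms, tokens, 95.3)
        print(f"{name}: decode ~{decode_ms:.0f} ms, total ~{total_ms:.0f} ms, "
              f"TTFT share {share:.0%}")
```

Under these assumptions, TTFT accounts for roughly 83% of the estimated latency on factual prompts but only a little over half on definition prompts, since ~80 tokens take noticeably longer to decode than ~27.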
Notable Prompts
Api Design: Highest sustained throughput at 101.4 tok/s with no issues, making it the best long-form technical decode result.
Essay: Throughput falls to 48.8 tok/s, far below the other long-output prompts, indicating sensitivity to essay-style generation.
Factual: At 1428 ms median TTFT, this is the slowest startup path and the clearest first-token latency outlier.
Definition: Fastest TTFT at 986 ms, indicating better responsiveness on short, direct prompts.
All Runs
| Run | Prompt | ID | Type | Tok/s | TTFT | Tokens | Cost |
|---|---|---|---|---|---|---|---|
| 1 | Api Design | throughput-api-design | throughput | 97.2 | 1339 ms | 4096 | $0.0617 |
| 2 | Api Design | throughput-api-design | throughput | 101.4 | 871 ms | 4095 | $0.0617 |
| 3 | Api Design | throughput-api-design | throughput | 108.9 | 1496 ms | 4095 | $0.0617 |
| 1 | Data Structures | throughput-data-structures | throughput | 96.4 | 832 ms | 4095 | $0.0618 |
| 2 | Data Structures | throughput-data-structures | throughput | 94.3 | 1699 ms | 4096 | $0.0618 |
| 3 | Data Structures | throughput-data-structures | throughput | 93.0 | 1872 ms | 4096 | $0.0618 |
| 1 | Essay | throughput-essay | throughput | 47.7 | 2736 ms | 4096 | $0.0617 |
| 3 | Essay | throughput-essay | throughput | 49.8 | 838 ms | 4096 | $0.0617 |
| 1 | Definition | ttft-definition | ttft | n/a | 747 ms | 88 | $0.0014 |
| 2 | Definition | ttft-definition | ttft | n/a | 1730 ms | 71 | $0.0011 |
| 3 | Definition | ttft-definition | ttft | n/a | 986 ms | 82 | $0.0013 |
| 1 | Factual | ttft-factual | ttft | n/a | 1613 ms | 27 | $0.0005 |
| 2 | Factual | ttft-factual | ttft | n/a | 930 ms | 27 | $0.0005 |
| 3 | Factual | ttft-factual | ttft | n/a | 1428 ms | 27 | $0.0005 |
14 runs · Throughput rows require valid long-output runs · TTFT shown for all successful runs
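The headline figures can be cross-checked against the run table with a few lines of Python. This is a minimal sketch, not BridgeBench's own aggregation code: the tuple layout and the helper names `summarize` and `per_prompt_median` are hypothetical, and it assumes throughput statistics use only the throughput-type runs and TTFT statistics use only the ttft-type runs.

```python
from collections import defaultdict
from statistics import mean, median

# (prompt id, run type, tok/s or None, TTFT in ms, cost in USD), copied from the run table.
RUNS = [
    ("throughput-api-design", "throughput", 97.2, 1339, 0.0617),
    ("throughput-api-design", "throughput", 101.4, 871, 0.0617),
    ("throughput-api-design", "throughput", 108.9, 1496, 0.0617),
    ("throughput-data-structures", "throughput", 96.4, 832, 0.0618),
    ("throughput-data-structures", "throughput", 94.3, 1699, 0.0618),
    ("throughput-data-structures", "throughput", 93.0, 1872, 0.0618),
    ("throughput-essay", "throughput", 47.7, 2736, 0.0617),
    ("throughput-essay", "throughput", 49.8, 838, 0.0617),
    ("ttft-definition", "ttft", None, 747, 0.0014),
    ("ttft-definition", "ttft", None, 1730, 0.0011),
    ("ttft-definition", "ttft", None, 986, 0.0013),
    ("ttft-factual", "ttft", None, 1613, 0.0005),
    ("ttft-factual", "ttft", None, 930, 0.0005),
    ("ttft-factual", "ttft", None, 1428, 0.0005),
]


def summarize(runs):
    """Overall aggregates, split by run type (assumed aggregation rule)."""
    speeds = [tok_s for _, kind, tok_s, _, _ in runs if kind == "throughput"]
    ttfts = [ttft for _, kind, _, ttft, _ in runs if kind == "ttft"]
    return {
        "median_tok_s": median(speeds),
        "mean_tok_s": mean(speeds),
        "median_ttft_ms": median(ttfts),
        "mean_ttft_ms": mean(ttfts),
        "total_cost_usd": sum(cost for *_, cost in runs),
    }


def per_prompt_median(runs):
    """Median tok/s for throughput prompts, median TTFT (ms) for ttft prompts."""
    by_prompt = defaultdict(list)
    for prompt, kind, tok_s, ttft_ms, _cost in runs:
        by_prompt[prompt].append(tok_s if kind == "throughput" else ttft_ms)
    return {prompt: median(values) for prompt, values in by_prompt.items()}


if __name__ == "__main__":
    for name, value in summarize(RUNS).items():
        print(f"{name}: {value:.4f}")
    for prompt, value in per_prompt_median(RUNS).items():
        print(f"{prompt}: {value}")
```

With that split, the sketch matches the reported 1207 ms median and 1239 ms average TTFT, which come from the six ttft-type runs; averaging TTFT over all 14 runs would give a higher figure. Summing the rounded per-run costs gives about $0.4992, slightly below the reported $0.499320, presumably because the table rounds each run to four decimals.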