GLM-5
openrouter/z-ai/glm-5
Median throughput: 73.3 tok/s
Throughput runs: 9
TTFT runs: 6
Avg TTFT: 1089 ms
Avg throughput: 76.1 tok/s
Total cost: $0.0728
Commentary
by openai/gpt-5.4-mini
GLM-5 is reliable on BridgeBench, with a 100.0% success rate and a low total cost of $0.072828. Sustained decode performance is solid at 73.3 tok/s median throughput (76.1 tok/s average). Startup latency is moderate at 962 ms median TTFT and 1089 ms average TTFT; the ttft-factual workload starts more slowly than ttft-definition (1501 ms vs. 711 ms median TTFT).
throughput-api-design: This is the weakest sustained-throughput case at 67.2 tok/s median, likely reflecting the longest average output (2952 tokens) and a more complex instruction-following load. All three runs completed without failures, but decode speed falls below the model's overall median of 73.3 tok/s.
throughput-data-structures: Throughput matches the overall median at 73.3 tok/s, indicating consistent decode performance on medium-length technical output. No issues were recorded, so this prompt looks representative of the model's baseline speed.
throughput-essay: This is the fastest sustained-throughput prompt at 77.0 tok/s median, and it has the shortest average output length among the throughput tasks. Its three runs vary widely (57.6 to 135.4 tok/s), so the result suggests, but does not establish, that the model maintains or slightly improves decode rate on shorter generations.
ttft-definition: Startup latency is strong here at 711 ms median TTFT, the better of the two TTFT workloads. Output length (54 tokens on average) does not contribute to TTFT, which is measured up to the first token, so the fast start more likely reflects low prompt-processing cost.
ttft-factual: This is the slowest startup case at 1501 ms median TTFT despite averaging only 12 output tokens, which points to prompt-side prefill or retrieval-like overhead rather than decode cost. The gap versus ttft-definition is large enough to suggest that TTFT is sensitive to prompt content, not just output length.
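The per-prompt figures quoted in this commentary can be reproduced from the values listed under All Runs. The following is a minimal sketch, not BridgeBench's own code: the `RUNS` layout and the `summarize` helper are hypothetical names introduced for illustration, and it assumes the per-prompt numbers are plain medians and rounded means over the three runs of each prompt.

```python
from statistics import mean, median

# (tok/s, TTFT in ms, output tokens) per run, copied from the All Runs table.
# tok/s is None for ttft-type runs, which report no throughput.
RUNS = {
    "throughput-api-design": [(82.1, 206, 3001), (67.2, 2706, 3155), (45.5, 532, 2701)],
    "throughput-data-structures": [(73.3, 1426, 2589), (77.0, 1408, 2685), (70.1, 7581, 2848)],
    "throughput-essay": [(57.6, 6991, 2022), (77.0, 838, 2026), (135.4, 571, 1858)],
    "ttft-definition": [(None, 1037, 48), (None, 711, 55), (None, 648, 58)],
    "ttft-factual": [(None, 1501, 11), (None, 886, 12), (None, 1752, 12)],
}


def summarize(runs):
    """Return median tok/s (if reported), median TTFT, and mean tokens per prompt."""
    summary = {}
    for prompt, rows in runs.items():
        rates = [tok_s for tok_s, _, _ in rows if tok_s is not None]
        summary[prompt] = {
            "median_tok_s": median(rates) if rates else None,
            "median_ttft_ms": median(ttft for _, ttft, _ in rows),
            "mean_tokens": round(mean(tokens for _, _, tokens in rows)),
        }
    return summary


if __name__ == "__main__":
    for prompt, stats in summarize(RUNS).items():
        print(prompt, stats)
```

Under those assumptions the sketch recovers the 67.2, 73.3, and 77.0 tok/s medians, the 711 ms and 1501 ms TTFT medians, and the 2952-, 54-, and 12-token averages cited above.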
Notable Prompts
throughput-api-design: Lowest median sustained throughput in the benchmark at 67.2 tok/s, making it the clearest decode-speed low point.
throughput-essay: Highest median sustained throughput at 77.0 tok/s, indicating the model can keep decode speed above its overall median on shorter generations.
ttft-definition: Best median TTFT among the TTFT workloads at 711 ms, so the model can start responding quickly when the prompt is compact.
ttft-factual: Worst median TTFT at 1501 ms, which is a notable startup penalty given the tiny output size.
All Runs
| Run | Prompt | Prompt ID | Type | Tok/s | TTFT | Tokens | Cost |
|---|---|---|---|---|---|---|---|
| 1 | API Design | throughput-api-design | throughput | 82.1 | 206 ms | 3001 | $0.0097 |
| 2 | API Design | throughput-api-design | throughput | 67.2 | 2706 ms | 3155 | $0.0102 |
| 3 | API Design | throughput-api-design | throughput | 45.5 | 532 ms | 2701 | $0.0070 |
| 1 | Data Structures | throughput-data-structures | throughput | 73.3 | 1426 ms | 2589 | $0.0084 |
| 2 | Data Structures | throughput-data-structures | throughput | 77.0 | 1408 ms | 2685 | $0.0086 |
| 3 | Data Structures | throughput-data-structures | throughput | 70.1 | 7581 ms | 2848 | $0.0092 |
| 1 | Essay | throughput-essay | throughput | 57.6 | 6991 ms | 2022 | $0.0066 |
| 2 | Essay | throughput-essay | throughput | 77.0 | 838 ms | 2026 | $0.0066 |
| 3 | Essay | throughput-essay | throughput | 135.4 | 571 ms | 1858 | $0.0060 |
| 1 | Definition | ttft-definition | ttft | n/a | 1037 ms | 48 | $0.0001 |
| 2 | Definition | ttft-definition | ttft | n/a | 711 ms | 55 | $0.0002 |
| 3 | Definition | ttft-definition | ttft | n/a | 648 ms | 58 | $0.0002 |
| 1 | Factual | ttft-factual | ttft | n/a | 1501 ms | 11 | $0.0000 |
| 2 | Factual | ttft-factual | ttft | n/a | 886 ms | 12 | $0.0001 |
| 3 | Factual | ttft-factual | ttft | n/a | 1752 ms | 12 | $0.0001 |
15 runs · Throughput rows require valid long-output runs · TTFT shown for all successful runs
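The headline figures can be checked against the runs listed above. This is a minimal sketch under two assumptions that the page does not state outright: throughput statistics pool all nine throughput runs, and the headline TTFT statistics use only the six ttft-type runs (consistent with the "TTFT runs: 6" count). The `headline_stats` helper is a hypothetical name introduced for illustration.

```python
from statistics import mean, median

# Values copied from the All Runs table.
THROUGHPUT_TOK_S = [82.1, 67.2, 45.5, 73.3, 77.0, 70.1, 57.6, 77.0, 135.4]
TTFT_MS = [1037, 711, 648, 1501, 886, 1752]  # ttft-type runs only (assumption)


def headline_stats(tok_s, ttft_ms):
    """Compute the summary numbers shown at the top of the report."""
    return {
        "median_tok_s": median(tok_s),
        "avg_tok_s": round(mean(tok_s), 1),
        "median_ttft_ms": median(ttft_ms),  # midpoint of the two middle values
        "avg_ttft_ms": round(mean(ttft_ms)),
    }


if __name__ == "__main__":
    print(headline_stats(THROUGHPUT_TOK_S, TTFT_MS))
```

With these inputs the median throughput is 73.3 tok/s, the average is 76.1 tok/s, and the average TTFT is 1089 ms. The median TTFT comes out at 961.5 ms, which matches the reported 962 ms after rounding.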