Gemini 3.1 Pro
google/gemini-3.1-pro-preview
Median throughput: 122.2 tok/s
Throughput runs: 9
TTFT runs: 6
Avg TTFT: 7265 ms
Avg throughput: 117.0 tok/s
Total cost: $0.1101
Commentary
by openai/gpt-5.4-mini
Gemini 3.1 Pro delivered strong sustained decode performance, with a median throughput of 122.2 tok/s and an average of 117.0 tok/s across 9 throughput runs, and a 100% success rate. Startup latency is the main weakness: across the 6 TTFT runs, median TTFT is 7608 ms and average TTFT is 7265 ms, which is high for a speed benchmark. Cost remains modest at $0.110104 in total.
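The headline figures can be recomputed from the per-run values in the All Runs table. The sketch below is a sanity check, not the benchmark's own aggregation code. It assumes the headline TTFT figures use only the six rows of type `ttft`, an assumption that matches the "TTFT Runs" count of 6. The helper name `summarize` is made up for illustration.

```python
from statistics import mean, median

# Per-run values copied from the "All Runs" table.
throughput_tok_s = [122.2, 131.5, 127.1, 124.6, 123.6, 121.2, 105.3, 96.8, 100.4]
# ASSUMPTION: only rows of type "ttft" feed the headline TTFT numbers.
ttft_ms = [7691, 8681, 7525, 5167, 5739, 8789]

def summarize(values):
    """Return (median, mean) for a list of per-run measurements."""
    return median(values), mean(values)

tp_median, tp_mean = summarize(throughput_tok_s)
ttft_median, ttft_mean = summarize(ttft_ms)

print(f"throughput: median {tp_median:.1f} tok/s, mean {tp_mean:.1f} tok/s")
print(f"TTFT: median {ttft_median:.0f} ms, mean {ttft_mean:.0f} ms")
```

Under that assumption the results agree with the reported 122.2 and 117.0 tok/s and with the 7608 ms and 7265 ms TTFT figures.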
Api Design: This is the fastest throughput case at 127.1 tok/s median, indicating the model can sustain high decode speed on long-form technical output. With 2563 average output tokens and no issues, performance here is stable and representative of the model's best sustained rate.
Data Structures: Throughput stays strong at 123.6 tok/s median, only slightly below the Api Design prompt, so decode speed is consistent across technical prompts. The 2252-token average output length did not introduce any reliability or slowdown issues.
Essay: This prompt is the main throughput weakness, dropping to 100.4 tok/s median, about 20% below the other two throughput prompts. The essay style may increase generation complexity or variance, but these runs do not isolate a cause; at 2351 average output tokens, the output is no longer than that of the other prompts. All runs still completed successfully.
Definition: TTFT is slow at 7691 ms median even though the output is tiny at 63 tokens, so startup overhead dominates this case. This suggests the model's first-token latency is not well optimized for short responses.
Factual: This is the best startup case at 5739 ms median TTFT, but that is still high for a short factual response. With only 31 output tokens on average, decode time is small, which indicates the latency is mostly prefill/startup cost.
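Two of the claims above are simple arithmetic and can be checked directly: the size of the essay throughput gap, and how little decode time a 31- or 63-token answer needs compared with its TTFT. The sketch below is a rough estimate only. It assumes short answers decode at the 117.0 tok/s average measured on long outputs, which the benchmark does not measure directly. The helper name `percent_below` is illustrative.

```python
# Per-prompt median throughput quoted in the commentary.
median_tok_s = {"api-design": 127.1, "data-structures": 123.6, "essay": 100.4}

def percent_below(value, reference):
    """How far `value` sits below `reference`, as a percentage of `reference`."""
    return (reference - value) / reference * 100

for name in ("api-design", "data-structures"):
    gap = percent_below(median_tok_s["essay"], median_tok_s[name])
    print(f"essay vs {name}: {gap:.1f}% lower")

# ASSUMPTION: short answers decode at the long-output average of 117.0 tok/s.
AVG_TOK_S = 117.0
for name, tokens, ttft_ms in (("definition", 63, 7691), ("factual", 31, 5739)):
    decode_ms = tokens / AVG_TOK_S * 1000
    print(f"{name}: ~{decode_ms:.0f} ms decode vs {ttft_ms} ms median TTFT")
```

The essay gap comes out at roughly 21% and 19% against the two technical prompts, consistent with "about 20%". The estimated decode time is about half a second or less in both short-answer cases, an order of magnitude below the measured TTFT.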
Notable Prompts
Api Design: Highest sustained throughput at 127.1 tok/s median with no issues, making it the clearest positive signal for long-form generation speed.
Essay: Lowest throughput at 100.4 tok/s median, indicating the model slows down on essay-style generation relative to the technical prompts.
Definition: Median TTFT of 7691 ms is very high for a 63-token response, so short-answer latency is the biggest user-visible bottleneck.
Factual: Best median TTFT at 5739 ms, but still slow enough that startup latency looks systemic rather than an isolated outlier.
All Runs
| Run | Prompt | Type | Tok/s | TTFT (ms) | Tokens | Cost |
|---|---|---|---|---|---|---|
| 1 | Api Design (throughput-api-design) | throughput | 122.2 | 19420 | 2563 | $0.0129 |
| 2 | Api Design (throughput-api-design) | throughput | 131.5 | 22340 | 2563 | $0.0129 |
| 3 | Api Design (throughput-api-design) | throughput | 127.1 | 16002 | 2563 | $0.0129 |
| 1 | Data Structures (throughput-data-structures) | throughput | 124.6 | 18575 | 2252 | $0.0114 |
| 2 | Data Structures (throughput-data-structures) | throughput | 123.6 | 18562 | 2252 | $0.0114 |
| 3 | Data Structures (throughput-data-structures) | throughput | 121.2 | 30087 | 2252 | $0.0114 |
| 1 | Essay (throughput-essay) | throughput | 105.3 | 21466 | 2351 | $0.0119 |
| 2 | Essay (throughput-essay) | throughput | 96.8 | 19340 | 2351 | $0.0119 |
| 3 | Essay (throughput-essay) | throughput | 100.4 | 18341 | 2351 | $0.0119 |
| 1 | Definition (ttft-definition) | ttft | n/a | 7691 | 63 | $0.0003 |
| 2 | Definition (ttft-definition) | ttft | n/a | 8681 | 63 | $0.0003 |
| 3 | Definition (ttft-definition) | ttft | n/a | 7525 | 63 | $0.0003 |
| 1 | Factual (ttft-factual) | ttft | n/a | 5167 | 31 | $0.0002 |
| 2 | Factual (ttft-factual) | ttft | n/a | 5739 | 31 | $0.0002 |
| 3 | Factual (ttft-factual) | ttft | n/a | 8789 | 31 | $0.0002 |
15 runs · Throughput rows require valid long-output runs · TTFT shown for all successful runs
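The per-prompt medians quoted in the commentary and the total cost can be rebuilt from the rows above. The sketch below groups runs by prompt id and takes the median of tok/s for throughput prompts and of TTFT for ttft prompts. It is an illustration of one plausible aggregation, not the site's actual code, and `per_prompt_medians` is a made-up name. Costs in the table are rounded to four decimals, so the sum can only be expected to match the reported total to four places.

```python
from collections import defaultdict
from statistics import median

# (prompt id, type, tok/s or None, TTFT in ms, cost in USD), copied from the table.
RUNS = [
    ("throughput-api-design", "throughput", 122.2, 19420, 0.0129),
    ("throughput-api-design", "throughput", 131.5, 22340, 0.0129),
    ("throughput-api-design", "throughput", 127.1, 16002, 0.0129),
    ("throughput-data-structures", "throughput", 124.6, 18575, 0.0114),
    ("throughput-data-structures", "throughput", 123.6, 18562, 0.0114),
    ("throughput-data-structures", "throughput", 121.2, 30087, 0.0114),
    ("throughput-essay", "throughput", 105.3, 21466, 0.0119),
    ("throughput-essay", "throughput", 96.8, 19340, 0.0119),
    ("throughput-essay", "throughput", 100.4, 18341, 0.0119),
    ("ttft-definition", "ttft", None, 7691, 0.0003),
    ("ttft-definition", "ttft", None, 8681, 0.0003),
    ("ttft-definition", "ttft", None, 7525, 0.0003),
    ("ttft-factual", "ttft", None, 5167, 0.0002),
    ("ttft-factual", "ttft", None, 5739, 0.0002),
    ("ttft-factual", "ttft", None, 8789, 0.0002),
]

def per_prompt_medians(runs):
    """Median tok/s for throughput prompts, median TTFT (ms) for ttft prompts."""
    grouped = defaultdict(list)
    for prompt, kind, tok_s, ttft_ms, _cost in runs:
        grouped[prompt].append(tok_s if kind == "throughput" else ttft_ms)
    return {prompt: median(values) for prompt, values in grouped.items()}

medians = per_prompt_medians(RUNS)
total_cost = round(sum(run[4] for run in RUNS), 4)

for prompt, value in medians.items():
    print(f"{prompt}: {value}")
print(f"total cost: ${total_cost:.4f}")
```

The grouped medians match the figures cited per prompt (127.1, 123.6, and 100.4 tok/s; 7691 and 5739 ms), and the rounded costs sum to $0.1101.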