Model Analysis

Gemini 3.1 Pro

google/gemini-3.1-pro-preview

Median throughput: 122.2 tok/s
Median TTFT: 7608 ms
Success rate: 100.0%
Throughput runs: 9
TTFT runs: 6
Avg TTFT: 7265 ms
Avg throughput: 117.0 tok/s
Total cost: $0.1101

Commentary

by openai/gpt-5.4-mini

Gemini 3.1 Pro delivered strong sustained decode performance: median throughput was 122.2 tok/s and average throughput 117.0 tok/s across the 9 throughput runs, and all 15 runs succeeded. Startup latency is the main weakness. Across the 6 TTFT runs, median TTFT was 7608 ms and average TTFT 7265 ms, which is high for a speed benchmark. Total cost was modest at $0.110104.
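The headline figures can be re-derived from the per-run values in the All Runs table. The following is a minimal Python check, assuming the throughput statistics are taken over the nine throughput runs and the TTFT statistics over the six TTFT-type runs only. That split is an assumption, though it is the one that reproduces the reported numbers. The variable names are illustrative.

```python
from statistics import mean, median

# Tok/s for the nine throughput runs, copied from the All Runs table.
throughput_tok_s = [122.2, 131.5, 127.1, 124.6, 123.6, 121.2, 105.3, 96.8, 100.4]

# TTFT in ms for the six TTFT-type runs (Definition and Factual prompts).
ttft_ms = [7691, 8681, 7525, 5167, 5739, 8789]

median_tok_s = median(throughput_tok_s)
mean_tok_s = round(mean(throughput_tok_s), 1)
median_ttft_ms = median(ttft_ms)        # even count: mean of the two middle values
mean_ttft_ms = round(mean(ttft_ms))

print(median_tok_s, mean_tok_s, median_ttft_ms, mean_ttft_ms)
```

The TTFT values recorded on the throughput rows are left out of this calculation; including them would give much higher TTFT figures than the ones reported.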

Api Design (throughput)

This is the fastest throughput case at 127.1 tok/s median, which shows the model can sustain high decode speed on long-form technical output. The three runs ranged from 122.2 to 131.5 tok/s with 2563 output tokens on average and no issues, so performance here is stable and representative of the model's best sustained rate.

Data Structures (throughput)

Throughput stays strong at 123.6 tok/s median, only slightly below the API design prompt, so decode speed is consistent across the two technical prompts. The 2252-token average output length did not introduce any reliability issues or slowdowns.

Essay (throughput)

This prompt is the main throughput weakness, dropping to 100.4 tok/s median, roughly 19 to 21% below the other two throughput tasks. Output length does not explain the drop: at 2351 tokens on average, the essay output sits between the data structures (2252) and API design (2563) outputs. The cause is not evident from these runs, and all three completed successfully.
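The size of the gap can be checked against each of the other two medians. The sketch below uses the per-prompt median tok/s values quoted in this commentary; the helper name `percent_below` is illustrative.

```python
# Median tok/s per throughput prompt, as reported in the commentary.
median_tok_s = {"api_design": 127.1, "data_structures": 123.6, "essay": 100.4}

def percent_below(value: float, reference: float) -> float:
    """Return how far `value` sits below `reference`, in percent."""
    return (reference - value) / reference * 100

for name in ("api_design", "data_structures"):
    drop = percent_below(median_tok_s["essay"], median_tok_s[name])
    print(f"essay vs {name}: {drop:.1f}% lower")
```

The essay median comes out about 21% below the API design median and about 19% below the data structures median.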

Definition (TTFT)

TTFT is slow at 7691 ms median even though the output is tiny at 63 tokens, so startup overhead dominates this case. This suggests the model's first-token latency is not well optimized for short responses.

Factual (TTFT)

This is the best startup case at 5739 ms median TTFT, but it is still materially high for a short factual response. The 31-token average output indicates the latency is mostly prefill/startup cost rather than decode time.
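A rough estimate shows how much of each short response is spent waiting for the first token. The table reports n/a for tok/s on TTFT runs, so the sketch below assumes short answers decode at roughly the 117.0 tok/s average measured on the long-output runs. That rate, the constant `ASSUMED_DECODE_TOK_S`, and the helper `ttft_share` are assumptions for illustration, not measurements.

```python
# Assumption: short answers decode at roughly the average rate measured on the
# long-output runs (117.0 tok/s). Tok/s was not measured on the TTFT runs.
ASSUMED_DECODE_TOK_S = 117.0

def ttft_share(ttft_ms: float, output_tokens: int,
               decode_tok_s: float = ASSUMED_DECODE_TOK_S) -> float:
    """Estimate the fraction of total response time spent before the first token."""
    decode_ms = output_tokens / decode_tok_s * 1000
    return ttft_ms / (ttft_ms + decode_ms)

# (median TTFT in ms, average output tokens) for the two TTFT prompts.
cases = {"definition": (7691, 63), "factual": (5739, 31)}
for name, (ttft_ms, tokens) in cases.items():
    print(f"{name}: ~{ttft_share(ttft_ms, tokens):.0%} of response time is TTFT")
```

Under that assumption, more than 90% of the response time for both prompts elapses before the first token arrives.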

Notable Prompts

Api Design (throughput)

Highest sustained throughput at 127.1 tok/s with no issues, making it the clearest positive signal for long-form generation speed.

Essay (throughput)

Lowest throughput at 100.4 tok/s, indicating the model slows down on essay-style generation relative to the two technical prompts.

Definition (TTFT)

Median TTFT of 7691 ms is very high for a 63-token response, so short-answer latency is the biggest user-visible bottleneck.

Factual (TTFT)

Best TTFT at 5739 ms, but still slow enough that startup latency remains a systemic issue rather than an isolated outlier.

All Runs

Run	Prompt	Prompt ID	Type	Tok/s	TTFT	Tokens	Cost
1	Api Design	throughput-api-design	throughput	122.2	19420ms	2563	$0.0129
2	Api Design	throughput-api-design	throughput	131.5	22340ms	2563	$0.0129
3	Api Design	throughput-api-design	throughput	127.1	16002ms	2563	$0.0129
1	Data Structures	throughput-data-structures	throughput	124.6	18575ms	2252	$0.0114
2	Data Structures	throughput-data-structures	throughput	123.6	18562ms	2252	$0.0114
3	Data Structures	throughput-data-structures	throughput	121.2	30087ms	2252	$0.0114
1	Essay	throughput-essay	throughput	105.3	21466ms	2351	$0.0119
2	Essay	throughput-essay	throughput	96.8	19340ms	2351	$0.0119
3	Essay	throughput-essay	throughput	100.4	18341ms	2351	$0.0119
1	Definition	ttft-definition	ttft	n/a	7691ms	63	$0.0003
2	Definition	ttft-definition	ttft	n/a	8681ms	63	$0.0003
3	Definition	ttft-definition	ttft	n/a	7525ms	63	$0.0003
1	Factual	ttft-factual	ttft	n/a	5167ms	31	$0.0002
2	Factual	ttft-factual	ttft	n/a	5739ms	31	$0.0002
3	Factual	ttft-factual	ttft	n/a	8789ms	31	$0.0002

15 runs · Throughput rows require valid long-output runs · TTFT shown for all successful runs
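The per-prompt medians quoted in the commentary and the total cost can be reproduced from the run table. The sketch below is one possible way to do that in Python; the tuple layout, the shortened prompt keys, and the `summary` structure are illustrative choices, and the per-run costs are the rounded values shown in the table.

```python
from collections import defaultdict
from statistics import median

# (prompt, type, tok_s, ttft_ms, tokens, cost_usd); tok_s is None where the table shows n/a.
runs = [
    ("api-design", "throughput", 122.2, 19420, 2563, 0.0129),
    ("api-design", "throughput", 131.5, 22340, 2563, 0.0129),
    ("api-design", "throughput", 127.1, 16002, 2563, 0.0129),
    ("data-structures", "throughput", 124.6, 18575, 2252, 0.0114),
    ("data-structures", "throughput", 123.6, 18562, 2252, 0.0114),
    ("data-structures", "throughput", 121.2, 30087, 2252, 0.0114),
    ("essay", "throughput", 105.3, 21466, 2351, 0.0119),
    ("essay", "throughput", 96.8, 19340, 2351, 0.0119),
    ("essay", "throughput", 100.4, 18341, 2351, 0.0119),
    ("definition", "ttft", None, 7691, 63, 0.0003),
    ("definition", "ttft", None, 8681, 63, 0.0003),
    ("definition", "ttft", None, 7525, 63, 0.0003),
    ("factual", "ttft", None, 5167, 31, 0.0002),
    ("factual", "ttft", None, 5739, 31, 0.0002),
    ("factual", "ttft", None, 8789, 31, 0.0002),
]

# Group the runs by prompt so each prompt's three runs can be summarized together.
by_prompt = defaultdict(list)
for run in runs:
    by_prompt[run[0]].append(run)

summary = {}
for prompt, rows in by_prompt.items():
    rates = [r[2] for r in rows if r[2] is not None]
    summary[prompt] = {
        "median_tok_s": median(rates) if rates else None,
        "median_ttft_ms": median(r[3] for r in rows),
    }

total_cost = sum(r[5] for r in runs)

for prompt, stats in summary.items():
    print(prompt, stats)
print(f"total cost: ${total_cost:.4f}")
```

Summing the rounded per-run costs gives $0.1101, which agrees with the reported total of $0.110104 to four decimal places.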