BridgeBench
Speed
Model Analysis

Gemini 3.1 Pro

google/gemini-3.1-pro-preview

Median Throughput: 122.2 tok/s
Median TTFT: 7608 ms
Success: 100.0%

Throughput Runs: 9
TTFT Runs: 6
Avg TTFT: 7265 ms
Avg Throughput: 117.0 tok/s
Total Cost: $0.1101

Commentary

by openai/gpt-5.4-mini

Gemini 3.1 Pro delivered strong sustained decode performance, with a median throughput of 122.2 tok/s and an average of 117.0 tok/s across the 9 throughput runs, and a 100% success rate. Startup latency is the main weakness: across the 6 TTFT runs, median TTFT is 7608 ms and average TTFT is 7265 ms, which is high for a speed benchmark. Cost remains modest at $0.110104 overall.
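The headline figures can be reproduced from the per-run values listed under All Runs. A minimal Python sketch, assuming the summary uses a plain median and arithmetic mean over the nine throughput runs and the six TTFT-type runs (the variable names are illustrative, not part of BridgeBench):

```python
from statistics import mean, median

# Tok/s for the nine throughput runs, copied from the All Runs listing.
throughput_tok_s = [122.2, 131.5, 127.1, 124.6, 123.6, 121.2, 105.3, 96.8, 100.4]
# TTFT in ms for the six ttft-type runs (definition first, then factual).
ttft_ms = [7691, 8681, 7525, 5167, 5739, 8789]

# ASSUMPTION: averages are rounded the same way the page displays them.
summary = {
    "median_tok_s": median(throughput_tok_s),
    "avg_tok_s": round(mean(throughput_tok_s), 1),
    "median_ttft_ms": median(ttft_ms),
    "avg_ttft_ms": round(mean(ttft_ms)),
}
print(summary)
```

Under these assumptions the four values match the median and average figures quoted in the commentary.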

API Design (throughput)

This is the fastest throughput case at 127.1 tok/s median, indicating the model can sustain high decode speed on long-form technical output. With 2563 average output tokens and no issues, performance here is stable and representative of the model's best sustained rate.

Data Structures (throughput)

Throughput stays strong at 123.6 tok/s median, only slightly below the API design prompt, so decode speed is consistent across technical prompts. The 2252-token average output length did not introduce any reliability or slowdown issues.

Essay (throughput)

This prompt is the main throughput weakness, dropping to 100.4 tok/s median, roughly 19% to 21% below the other two throughput tasks. The essay style may increase generation complexity or variance, though these runs do not isolate a cause; all three essay runs still completed successfully.
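The size of the essay gap can be checked against each of the other two per-prompt medians. A small Python sketch (the helper name `percent_below` is illustrative):

```python
# Per-prompt median throughput (tok/s) quoted in this commentary.
median_tok_s = {"api-design": 127.1, "data-structures": 123.6, "essay": 100.4}


def percent_below(value: float, reference: float) -> float:
    """Return how far `value` sits below `reference`, as a percentage."""
    return (reference - value) / reference * 100


# Compare the essay median against each of the other throughput prompts.
drops = {
    name: round(percent_below(median_tok_s["essay"], ref), 1)
    for name, ref in median_tok_s.items()
    if name != "essay"
}
print(drops)
```

The essay median comes out about 21% below API design and about 19% below data structures.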

Definition (TTFT)

TTFT is slow at 7691 ms median even though the output is tiny at 63 tokens, so startup overhead dominates this case. This suggests the model's first-token latency is not well optimized for short responses.
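A rough back-of-envelope estimate puts this in proportion. It assumes the short response decodes at about the overall median throughput of 122.2 tok/s, which is only an assumption, since throughput was measured on long-output runs. On that basis, 63 tokens take about half a second to generate, against a wait of roughly 7.7 s for the first token:

```python
ttft_ms = 7691         # median TTFT for the definition prompt
output_tokens = 63     # average output length for that prompt
assumed_tok_s = 122.2  # ASSUMPTION: overall median decode rate applies here

# Estimated time spent generating tokens once streaming has started.
decode_ms = output_tokens / assumed_tok_s * 1000
# Fraction of the estimated end-to-end latency spent waiting for the first token.
ttft_share = ttft_ms / (ttft_ms + decode_ms)

print(f"estimated decode time: {decode_ms:.0f} ms")
print(f"TTFT share of total latency: {ttft_share:.0%}")
```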

Factual (TTFT)

This is the best startup case at 5739 ms median TTFT, but it is still materially high for a short factual response. With only 31 output tokens on average, decode time is negligible, so nearly all of the response latency is prefill or startup cost.
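The split between startup cost and decode time can be made concrete with a small timing helper. This is a sketch only: how BridgeBench actually measures these values is not stated here, and `time_stream`, the one-token-per-chunk convention, and the choice to measure throughput from first token to last token are all assumptions for illustration.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def time_stream(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], Optional[float]]:
    """Return (ttft_ms, tok_per_s) for a stream of tokens.

    ASSUMPTION: each yielded chunk is one token, and throughput covers
    the decode phase only (first token to last token).
    """
    start = clock()
    first = last = None
    count = 0
    for _ in chunks:
        now = clock()
        if first is None:
            first = now  # startup/prefill ends when the first token arrives
        last = now
        count += 1
    if first is None:
        return None, None  # the stream produced nothing
    ttft_ms = (first - start) * 1000
    decode_s = last - first
    tok_per_s = (count - 1) / decode_s if decode_s > 0 else None
    return ttft_ms, tok_per_s


# Simulated clock: request at 0 s, first token at 5.0 s, then one token every 10 ms.
ticks = iter([0.0, 5.0, 5.01, 5.02, 5.03])
ttft_ms, tok_per_s = time_stream(["a", "b", "c", "d"], clock=lambda: next(ticks))
print(ttft_ms, tok_per_s)
```

In the simulated run the wait for the first token is 5 s while the remaining tokens arrive at about 100 tok/s, the same shape as the short-prompt results above: latency is dominated by the time before streaming starts.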

Notable Prompts

API Design (throughput)

Highest sustained throughput at 127.1 tok/s with no issues, making it the clearest positive signal for long-form generation speed.

Essay (throughput)

Lowest throughput at 100.4 tok/s, indicating the model slows down on essay-style generation relative to the two technical prompts.

Definition (TTFT)

Median TTFT of 7691 ms is very high for a 63-token response, so short-answer latency is the biggest user-visible bottleneck.

Factual (TTFT)

Best TTFT at 5739 ms, but still slow enough that startup latency remains a systemic issue rather than an isolated outlier.

All Runs

| Run | Prompt | Prompt ID | Type | Tok/s | TTFT (ms) |
|---|---|---|---|---|---|
| 1 | API Design | throughput-api-design | throughput | 122.2 | 19420 |
| 2 | API Design | throughput-api-design | throughput | 131.5 | 22340 |
| 3 | API Design | throughput-api-design | throughput | 127.1 | 16002 |
| 1 | Data Structures | throughput-data-structures | throughput | 124.6 | 18575 |
| 2 | Data Structures | throughput-data-structures | throughput | 123.6 | 18562 |
| 3 | Data Structures | throughput-data-structures | throughput | 121.2 | 30087 |
| 1 | Essay | throughput-essay | throughput | 105.3 | 21466 |
| 2 | Essay | throughput-essay | throughput | 96.8 | 19340 |
| 3 | Essay | throughput-essay | throughput | 100.4 | 18341 |
| 1 | Definition | ttft-definition | ttft | n/a | 7691 |
| 2 | Definition | ttft-definition | ttft | n/a | 8681 |
| 3 | Definition | ttft-definition | ttft | n/a | 7525 |
| 1 | Factual | ttft-factual | ttft | n/a | 5167 |
| 2 | Factual | ttft-factual | ttft | n/a | 5739 |
| 3 | Factual | ttft-factual | ttft | n/a | 8789 |

15 runs · Throughput rows require valid long-output runs · TTFT shown for all successful runs
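The per-prompt medians quoted in the commentary can be recovered by grouping the 15 runs by prompt ID. A minimal Python sketch, assuming each prompt's figure is the plain median of its three runs, with tok/s used for throughput prompts and TTFT for ttft prompts (the tuple layout is illustrative):

```python
from collections import defaultdict
from statistics import median

# (prompt ID, tok/s or None, TTFT in ms) for each run listed above.
runs = [
    ("throughput-api-design", 122.2, 19420),
    ("throughput-api-design", 131.5, 22340),
    ("throughput-api-design", 127.1, 16002),
    ("throughput-data-structures", 124.6, 18575),
    ("throughput-data-structures", 123.6, 18562),
    ("throughput-data-structures", 121.2, 30087),
    ("throughput-essay", 105.3, 21466),
    ("throughput-essay", 96.8, 19340),
    ("throughput-essay", 100.4, 18341),
    ("ttft-definition", None, 7691),
    ("ttft-definition", None, 8681),
    ("ttft-definition", None, 7525),
    ("ttft-factual", None, 5167),
    ("ttft-factual", None, 5739),
    ("ttft-factual", None, 8789),
]

tok_s_by_prompt = defaultdict(list)
ttft_by_prompt = defaultdict(list)
for prompt_id, tok_s, ttft_ms in runs:
    if tok_s is not None:
        tok_s_by_prompt[prompt_id].append(tok_s)   # throughput-type run
    else:
        ttft_by_prompt[prompt_id].append(ttft_ms)  # ttft-type run

median_tok_s = {p: median(v) for p, v in tok_s_by_prompt.items()}
median_ttft_ms = {p: median(v) for p, v in ttft_by_prompt.items()}
print(median_tok_s)
print(median_ttft_ms)
```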