BridgeBench

Mistral Small 4

Rank #3 · 23.6B · Q4_K_M

Summary

Pass Rate

65.8%

Tasks Passed

79/120

Model Size

23.6B

Quantization

Q4_K_M

Median Throughput

4.7 tok/s

Median TTFT

2910 ms

Inference Success

100.0%

Avg Latency

53050 ms

Hardware Profile

Device

DGX Spark

Chip

GB10 Grace Blackwell

Memory

128 GB Unified

Backend

ollama

Quantization

Q4_K_M

Peak GPU Mem

0.0 GB

Category Results

Speed · 4.7 tok/s · 2910 ms TTFT
Hallucination · 12/30 · 40.0%
Code Generation · 15/20 · 75.0%
Reasoning · 18/30 · 60.0%
Instruction Following · 15/20 · 75.0%
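
The headline pass rate can be cross-checked against the per-category counts. The sketch below assumes the overall rate is simply tasks passed divided by tasks attempted across all five categories; the page does not state its formula, so this is an assumption. The Speed count (19/20) is taken from the Task Results section, and the names `category_results` and `pass_rate` are illustrative only.

```python
# Per-category pass counts as reported on this page: (passed, total).
# The Speed entry comes from the "Task Results" section (19/20 passed).
category_results = {
    "Speed": (19, 20),
    "Hallucination": (12, 30),
    "Code Generation": (15, 20),
    "Reasoning": (18, 30),
    "Instruction Following": (15, 20),
}


def pass_rate(passed: int, total: int) -> float:
    """Return the pass rate as a percentage rounded to one decimal place."""
    return round(100.0 * passed / total, 1)


# Assumption: the overall rate is an unweighted count over all tasks.
total_passed = sum(passed for passed, _ in category_results.values())
total_tasks = sum(total for _, total in category_results.values())
overall = pass_rate(total_passed, total_tasks)

for name, (passed, total) in category_results.items():
    print(f"{name}: {passed}/{total} = {pass_rate(passed, total)}%")
print(f"Overall: {total_passed}/{total_tasks} = {overall}%")
```

Under that assumption the counts sum to 79/120, which agrees with the Tasks Passed and Pass Rate figures in the Summary.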

Task Results

Speed · 19/20 passed
Task               Difficulty  Result  Latency    Tokens
ttft-short-100     standard    Pass    10557 ms   8
ttft-short-200     standard    Pass    20366 ms   100
ttft-medium-500    standard    Pass    20355 ms   100
ttft-medium-1k     standard    Pass    24520 ms   100
ttft-long-2k       standard    Pass    39756 ms   100
ttft-chat-context  standard    Pass    19944 ms   100
ttft-json-output   standard    Pass    20829 ms   100
ttft-multilang     standard    Pass    21033 ms   100
ttft-reasoning     standard    Pass    22237 ms   100
ttft-creative      standard    Pass    20112 ms   100
tp-essay           standard    Pass    244669 ms  1269
tp-code-app        standard    Pass    415947 ms  2000
tp-tutorial        standard    Pass    295964 ms  1447
tp-analysis        standard    Pass    179352 ms  858
tp-debug           standard    Pass    146813 ms  606
tp-architecture    standard    Pass    370223 ms  1730
tp-comparison      standard    Pass    355420 ms  1611
tp-security        standard    Pass    451058 ms  2000
tp-algorithm       standard    Fail    449484 ms  0
tp-documentation   standard    Pass    397604 ms  1845
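
The Latency and Tokens columns of the throughput (tp-*) rows allow a rough end-to-end generation rate per task. The sketch below assumes Latency is total wall-clock time for the request, including time to first token, and it excludes the failed tp-algorithm row, which produced 0 tokens. The page does not say how Median Throughput is derived, so this is only a plausibility check; `tp_rows` and `tokens_per_second` are hypothetical names.

```python
from statistics import median

# (task, latency in ms, tokens generated) for the passing tp-* rows above.
tp_rows = [
    ("tp-essay", 244669, 1269),
    ("tp-code-app", 415947, 2000),
    ("tp-tutorial", 295964, 1447),
    ("tp-analysis", 179352, 858),
    ("tp-debug", 146813, 606),
    ("tp-architecture", 370223, 1730),
    ("tp-comparison", 355420, 1611),
    ("tp-security", 451058, 2000),
    ("tp-documentation", 397604, 1845),
]


def tokens_per_second(latency_ms: int, tokens: int) -> float:
    """End-to-end rate; assumes latency covers the whole request."""
    return tokens / (latency_ms / 1000.0)


rates = {task: tokens_per_second(ms, tok) for task, ms, tok in tp_rows}
median_rate = median(rates.values())

for task, rate in rates.items():
    print(f"{task}: {rate:.2f} tok/s")
print(f"Median over passing tp-* tasks: {median_rate:.1f} tok/s")
```

Computed this way, the median rounds to 4.7 tok/s, which is consistent with the Median Throughput shown in the Summary, though the benchmark may calculate it differently.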
Hallucination · 12/30 passed
Task               Difficulty  Result  Latency    Tokens
fact-01            easy        Pass    3839 ms    15
fact-02            medium      Pass    8639 ms    33
fact-03            medium      Fail    31486 ms   144
fact-04            hard        Pass    20196 ms   96
fact-05            hard        Fail    27458 ms   126
fact-06            easy        Fail    46562 ms   219
fact-07            medium      Pass    3772 ms    15
fact-08            hard        Pass    36736 ms   188
fact-09            medium      Pass    6564 ms    31
fact-10            hard        Fail    130101 ms  501
code-01            easy        Fail    44930 ms   249
code-02            medium      Pass    23112 ms   124
code-03            medium      Pass    96342 ms   537
code-04            hard        Pass    68379 ms   381
code-05            hard        Fail    108099 ms  610
code-06            easy        Fail    101569 ms  565
code-07            medium      Fail    98800 ms   545
code-08            hard        Fail    101897 ms  565
code-09            medium      Fail    13981 ms   73
code-10            hard        Pass    36302 ms   209
cal-01             medium      Fail    52245 ms   290
cal-02             hard        Pass    138801 ms  771
cal-03             medium      Fail    15307 ms   78
cal-04             easy        Fail    72681 ms   402
cal-05             hard        Fail    65144 ms   360
cal-06             medium      Fail    75477 ms   423
cal-07             hard        Pass    140233 ms  780
cal-08             medium      Fail    32715 ms   174
cal-09             easy        Fail    25025 ms   140
cal-10             hard        Fail    25086 ms   133
Code Generation · 15/20 passed
Task               Difficulty  Result  Latency    Tokens
fn-01              easy        Pass    14625 ms   56
fn-02              easy        Pass    11674 ms   54
fn-03              medium      Fail    10565 ms   47
fn-04              medium      Pass    27160 ms   140
fn-05              medium      Pass    13605 ms   67
fn-06              hard        Fail    25100 ms   124
fn-07              hard        Pass    32937 ms   169
fn-08              hard        Fail    19262 ms   93
bug-01             easy        Pass    11851 ms   45
bug-02             medium      Pass    20636 ms   81
bug-03             hard        Pass    22701 ms   91
bug-04             medium      Pass    21591 ms   89
algo-01            medium      Pass    33188 ms   180
algo-02            hard        Pass    31672 ms   171
algo-03            medium      Pass    14886 ms   71
algo-04            hard        Pass    17249 ms   86
multi-01           hard        Fail    23319 ms   120
multi-02           hard        Pass    43554 ms   231
multi-03           hard        Fail    50209 ms   263
multi-04           hard        Pass    27710 ms   144
Reasoning · 18/30 passed
Task               Difficulty  Result  Latency    Tokens
arith-01           hard        Fail    4035 ms    7
arith-02           hard        Fail    3404 ms    2
arith-03           expert      Pass    20056 ms   95
arith-04           expert      Pass    26545 ms   137
arith-05           expert      Fail    2662 ms    4
arith-06           hard        Pass    7387 ms    26
spatial-01         hard        Pass    5715 ms    13
spatial-02         expert      Fail    3952 ms    2
spatial-03         expert      Pass    26784 ms   133
spatial-04         hard        Pass    43035 ms   233
spatial-05         expert      Fail    4616 ms    4
spatial-06         hard        Fail    6559 ms    14
cstr-01            hard        Pass    6795 ms    19
cstr-02            expert      Pass    6145 ms    14
cstr-03            expert      Pass    7906 ms    21
cstr-04            hard        Pass    4776 ms    12
cstr-05            expert      Fail    6020 ms    15
cstr-06            hard        Pass    12823 ms   47
adv-01             hard        Pass    3165 ms    9
adv-02             expert      Fail    2272 ms    3
adv-03             expert      Fail    4365 ms    13
adv-04             hard        Fail    4645 ms    20
adv-05             expert      Pass    22060 ms   116
adv-06             expert      Fail    5402 ms    19
cf-01              hard        Pass    3528 ms    3
cf-02              expert      Fail    7602 ms    22
cf-03              expert      Pass    20123 ms   95
cf-04              hard        Pass    5048 ms    13
cf-05              expert      Pass    31040 ms   158
cf-06              expert      Pass    4406 ms    2
Instruction Following · 15/20 passed
Task               Difficulty  Result  Latency    Tokens
fmt-01             easy        Pass    3861 ms    11
fmt-02             easy        Pass    12667 ms   65
fmt-03             medium      Pass    12801 ms   66
fmt-04             medium      Fail    8922 ms    39
fmt-05             hard        Fail    11681 ms   47
fmt-06             hard        Pass    5055 ms    19
con-01             easy        Pass    18699 ms   96
con-02             easy        Pass    5872 ms    28
con-03             medium      Pass    7825 ms    42
con-04             medium      Pass    3453 ms    15
con-05             hard        Fail    21889 ms   117
con-06             hard        Pass    21151 ms   99
role-01            medium      Pass    17255 ms   83
role-02            medium      Pass    27336 ms   144
role-03            hard        Pass    62801 ms   339
role-04            hard        Pass    113116 ms  614
mc-01              hard        Fail    14122 ms   67
mc-02              hard        Fail    12626 ms   55
mc-03              hard        Pass    12326 ms   55
mc-04              hard        Pass    8509 ms    30