Mistral Small 4
Rank #3 · 23.6B · Q4_K_M
Summary
Pass Rate
65.8%
Tasks Passed
79/120
Model Size
23.6B
Quantization
Q4_K_M
Median Throughput
4.7 tok/s
Median TTFT
2910 ms
Inference Success
100.0%
Avg Latency
53050 ms
Hardware Profile
Device
DGX Spark
Chip
GB10 Grace Blackwell
Memory
128 GB Unified
Backend
ollama
Quantization
Q4_K_M
Peak GPU Mem
0.0 GB
Category Results
Speed
4.7 tok/s · 2910 ms TTFT
Hallucination
12/30 · 40.0%
Code Generation
15/20 · 75.0%
Reasoning
18/30 · 60.0%
Instruction Following
15/20 · 75.0%
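The headline pass rate can be cross-checked against the per-category counts. The sketch below is illustrative only: the dictionary and the `pass_rate` helper are hypothetical names, the counts are copied from this report, and the Speed count (19/20) comes from the Task Results section. It assumes the overall rate is simply total passed over total tasks, which the report does not state explicitly.

```python
# Passed/total counts per category, copied from this report.
# CATEGORY_COUNTS and pass_rate are illustrative names, not part of any tool.
CATEGORY_COUNTS = {
    "Speed": (19, 20),
    "Hallucination": (12, 30),
    "Code Generation": (15, 20),
    "Reasoning": (18, 30),
    "Instruction Following": (15, 20),
}


def pass_rate(passed: int, total: int) -> float:
    """Return the pass rate as a percentage."""
    return 100.0 * passed / total


# Assumption: the overall rate is unweighted, i.e. all tasks count equally.
total_passed = sum(p for p, _ in CATEGORY_COUNTS.values())
total_tasks = sum(t for _, t in CATEGORY_COUNTS.values())
overall = pass_rate(total_passed, total_tasks)

for name, (p, t) in CATEGORY_COUNTS.items():
    print(f"{name}: {p}/{t} = {pass_rate(p, t):.1f}%")
print(f"Overall: {total_passed}/{total_tasks} = {overall:.1f}%")
```

Under that assumption the totals come to 79/120, consistent with the Summary figures.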
Task Results
Speed · 19/20 passed
| Task | Difficulty | Result | Latency | Tokens |
|---|---|---|---|---|
| ttft-short-100 | standard | Pass | 10557ms | 8 |
| ttft-short-200 | standard | Pass | 20366ms | 100 |
| ttft-medium-500 | standard | Pass | 20355ms | 100 |
| ttft-medium-1k | standard | Pass | 24520ms | 100 |
| ttft-long-2k | standard | Pass | 39756ms | 100 |
| ttft-chat-context | standard | Pass | 19944ms | 100 |
| ttft-json-output | standard | Pass | 20829ms | 100 |
| ttft-multilang | standard | Pass | 21033ms | 100 |
| ttft-reasoning | standard | Pass | 22237ms | 100 |
| ttft-creative | standard | Pass | 20112ms | 100 |
| tp-essay | standard | Pass | 244669ms | 1269 |
| tp-code-app | standard | Pass | 415947ms | 2000 |
| tp-tutorial | standard | Pass | 295964ms | 1447 |
| tp-analysis | standard | Pass | 179352ms | 858 |
| tp-debug | standard | Pass | 146813ms | 606 |
| tp-architecture | standard | Pass | 370223ms | 1730 |
| tp-comparison | standard | Pass | 355420ms | 1611 |
| tp-security | standard | Pass | 451058ms | 2000 |
| tp-algorithm | standard | Fail | 449484ms | 0 |
| tp-documentation | standard | Pass | 397604ms | 1845 |
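As a rough sanity check on the reported median throughput, end-to-end tokens per second can be derived from the Latency and Tokens columns of the `tp-*` rows. This is a sketch under stated assumptions: the benchmark's own method is not described in this report (it may exclude time to first token or use backend-reported timings), the failed `tp-algorithm` row is skipped here by choice, and `TP_ROWS` and `tokens_per_second` are hypothetical names.

```python
from statistics import median

# (task, latency in ms, tokens) for the passing tp-* rows in the table above.
# tp-algorithm is omitted because it failed with 0 tokens.
TP_ROWS = [
    ("tp-essay", 244669, 1269),
    ("tp-code-app", 415947, 2000),
    ("tp-tutorial", 295964, 1447),
    ("tp-analysis", 179352, 858),
    ("tp-debug", 146813, 606),
    ("tp-architecture", 370223, 1730),
    ("tp-comparison", 355420, 1611),
    ("tp-security", 451058, 2000),
    ("tp-documentation", 397604, 1845),
]


def tokens_per_second(latency_ms: int, tokens: int) -> float:
    """End-to-end throughput: generated tokens divided by wall-clock seconds."""
    return tokens / (latency_ms / 1000.0)


rates = [tokens_per_second(ms, tok) for _, ms, tok in TP_ROWS]
median_rate = median(rates)
print(f"Median end-to-end throughput: {median_rate:.1f} tok/s")
```

This end-to-end figure lands close to the 4.7 tok/s shown in the Summary, though agreement here does not confirm how the dashboard computes its value.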
Hallucination · 12/30 passed
| Task | Difficulty | Result | Latency | Tokens |
|---|---|---|---|---|
| fact-01 | easy | Pass | 3839ms | 15 |
| fact-02 | medium | Pass | 8639ms | 33 |
| fact-03 | medium | Fail | 31486ms | 144 |
| fact-04 | hard | Pass | 20196ms | 96 |
| fact-05 | hard | Fail | 27458ms | 126 |
| fact-06 | easy | Fail | 46562ms | 219 |
| fact-07 | medium | Pass | 3772ms | 15 |
| fact-08 | hard | Pass | 36736ms | 188 |
| fact-09 | medium | Pass | 6564ms | 31 |
| fact-10 | hard | Fail | 130101ms | 501 |
| code-01 | easy | Fail | 44930ms | 249 |
| code-02 | medium | Pass | 23112ms | 124 |
| code-03 | medium | Pass | 96342ms | 537 |
| code-04 | hard | Pass | 68379ms | 381 |
| code-05 | hard | Fail | 108099ms | 610 |
| code-06 | easy | Fail | 101569ms | 565 |
| code-07 | medium | Fail | 98800ms | 545 |
| code-08 | hard | Fail | 101897ms | 565 |
| code-09 | medium | Fail | 13981ms | 73 |
| code-10 | hard | Pass | 36302ms | 209 |
| cal-01 | medium | Fail | 52245ms | 290 |
| cal-02 | hard | Pass | 138801ms | 771 |
| cal-03 | medium | Fail | 15307ms | 78 |
| cal-04 | easy | Fail | 72681ms | 402 |
| cal-05 | hard | Fail | 65144ms | 360 |
| cal-06 | medium | Fail | 75477ms | 423 |
| cal-07 | hard | Pass | 140233ms | 780 |
| cal-08 | medium | Fail | 32715ms | 174 |
| cal-09 | easy | Fail | 25025ms | 140 |
| cal-10 | hard | Fail | 25086ms | 133 |
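The 12/30 total hides a spread across the three task families in this table. A minimal sketch of the per-family tally follows. The pass/fail strings are transcribed from the rows above in task order (01 to 10). The grouping by ID prefix is an assumption about how the tasks are organised, since the report does not define `fact`, `code`, or `cal`. `FAMILY_RESULTS` and `tally` are hypothetical names.

```python
# One character per task, in order 01..10: "P" = Pass, "F" = Fail.
# Transcribed from the Hallucination table; grouping by prefix is an assumption.
FAMILY_RESULTS = {
    "fact": "PPFPFFPPPF",
    "code": "FPPPFFFFFP",
    "cal": "FPFFFFPFFF",
}


def tally(results: str) -> tuple[int, int]:
    """Return (passed, total) for a string of P/F outcomes."""
    return results.count("P"), len(results)


family_counts = {name: tally(r) for name, r in FAMILY_RESULTS.items()}
for name, (passed, total) in family_counts.items():
    print(f"{name}: {passed}/{total} ({100.0 * passed / total:.0f}%)")

overall_passed = sum(p for p, _ in family_counts.values())
print(f"Hallucination total: {overall_passed}/30")
```

Read this way, the `cal-*` tasks account for the largest share of failures in the category.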
Code Generation · 15/20 passed
| Task | Difficulty | Result | Latency | Tokens |
|---|---|---|---|---|
| fn-01 | easy | Pass | 14625ms | 56 |
| fn-02 | easy | Pass | 11674ms | 54 |
| fn-03 | medium | Fail | 10565ms | 47 |
| fn-04 | medium | Pass | 27160ms | 140 |
| fn-05 | medium | Pass | 13605ms | 67 |
| fn-06 | hard | Fail | 25100ms | 124 |
| fn-07 | hard | Pass | 32937ms | 169 |
| fn-08 | hard | Fail | 19262ms | 93 |
| bug-01 | easy | Pass | 11851ms | 45 |
| bug-02 | medium | Pass | 20636ms | 81 |
| bug-03 | hard | Pass | 22701ms | 91 |
| bug-04 | medium | Pass | 21591ms | 89 |
| algo-01 | medium | Pass | 33188ms | 180 |
| algo-02 | hard | Pass | 31672ms | 171 |
| algo-03 | medium | Pass | 14886ms | 71 |
| algo-04 | hard | Pass | 17249ms | 86 |
| multi-01 | hard | Fail | 23319ms | 120 |
| multi-02 | hard | Pass | 43554ms | 231 |
| multi-03 | hard | Fail | 50209ms | 263 |
| multi-04 | hard | Pass | 27710ms | 144 |
Reasoning · 18/30 passed
| Task | Difficulty | Result | Latency | Tokens |
|---|---|---|---|---|
| arith-01 | hard | Fail | 4035ms | 7 |
| arith-02 | hard | Fail | 3404ms | 2 |
| arith-03 | expert | Pass | 20056ms | 95 |
| arith-04 | expert | Pass | 26545ms | 137 |
| arith-05 | expert | Fail | 2662ms | 4 |
| arith-06 | hard | Pass | 7387ms | 26 |
| spatial-01 | hard | Pass | 5715ms | 13 |
| spatial-02 | expert | Fail | 3952ms | 2 |
| spatial-03 | expert | Pass | 26784ms | 133 |
| spatial-04 | hard | Pass | 43035ms | 233 |
| spatial-05 | expert | Fail | 4616ms | 4 |
| spatial-06 | hard | Fail | 6559ms | 14 |
| cstr-01 | hard | Pass | 6795ms | 19 |
| cstr-02 | expert | Pass | 6145ms | 14 |
| cstr-03 | expert | Pass | 7906ms | 21 |
| cstr-04 | hard | Pass | 4776ms | 12 |
| cstr-05 | expert | Fail | 6020ms | 15 |
| cstr-06 | hard | Pass | 12823ms | 47 |
| adv-01 | hard | Pass | 3165ms | 9 |
| adv-02 | expert | Fail | 2272ms | 3 |
| adv-03 | expert | Fail | 4365ms | 13 |
| adv-04 | hard | Fail | 4645ms | 20 |
| adv-05 | expert | Pass | 22060ms | 116 |
| adv-06 | expert | Fail | 5402ms | 19 |
| cf-01 | hard | Pass | 3528ms | 3 |
| cf-02 | expert | Fail | 7602ms | 22 |
| cf-03 | expert | Pass | 20123ms | 95 |
| cf-04 | hard | Pass | 5048ms | 13 |
| cf-05 | expert | Pass | 31040ms | 158 |
| cf-06 | expert | Pass | 4406ms | 2 |
Instruction Following · 15/20 passed
| Task | Difficulty | Result | Latency | Tokens |
|---|---|---|---|---|
| fmt-01 | easy | Pass | 3861ms | 11 |
| fmt-02 | easy | Pass | 12667ms | 65 |
| fmt-03 | medium | Pass | 12801ms | 66 |
| fmt-04 | medium | Fail | 8922ms | 39 |
| fmt-05 | hard | Fail | 11681ms | 47 |
| fmt-06 | hard | Pass | 5055ms | 19 |
| con-01 | easy | Pass | 18699ms | 96 |
| con-02 | easy | Pass | 5872ms | 28 |
| con-03 | medium | Pass | 7825ms | 42 |
| con-04 | medium | Pass | 3453ms | 15 |
| con-05 | hard | Fail | 21889ms | 117 |
| con-06 | hard | Pass | 21151ms | 99 |
| role-01 | medium | Pass | 17255ms | 83 |
| role-02 | medium | Pass | 27336ms | 144 |
| role-03 | hard | Pass | 62801ms | 339 |
| role-04 | hard | Pass | 113116ms | 614 |
| mc-01 | hard | Fail | 14122ms | 67 |
| mc-02 | hard | Fail | 12626ms | 55 |
| mc-03 | hard | Pass | 12326ms | 55 |
| mc-04 | hard | Pass | 8509ms | 30 |