GPT-OSS 120B
Rank #2 · 120B · FP8
Summary
| Metric | Value |
|---|---|
| Pass Rate | 66.7% |
| Tasks Passed | 90/135 |
| Model Size | 120B |
| Quantization | FP8 |
| Median Throughput | 41.9 tok/s |
| Median TTFT | 498 ms |
| Inference Success | 100.0% |
| Avg Latency | 11550 ms |
Hardware Profile
| Setting | Value |
|---|---|
| Device | DGX Spark |
| Chip | GB10 Grace Blackwell |
| Memory | 128 GB Unified |
| Backend | ollama |
| Quantization | FP8 |
| Peak GPU Mem | 0.0 GB |
Category Results
| Category | Result |
|---|---|
| Speed | 41.9 tok/s · 498 ms TTFT |
| Hallucination | 10/30 (33.3%) |
| Code Generation | 14/20 (70.0%) |
| Reasoning | 26/30 (86.7%) |
| Instruction Following | 16/20 (80.0%) |
Task Results
Speed: 20/20 passed
| Task | Difficulty | Result | Latency | Tokens |
|---|---|---|---|---|
| ttft-short-100 | standard | Pass | 2900ms | 39 |
| ttft-short-200 | standard | Pass | 2757ms | 100 |
| ttft-medium-500 | standard | Pass | 2710ms | 100 |
| ttft-medium-1k | standard | Pass | 2860ms | 100 |
| ttft-long-2k | standard | Pass | 2960ms | 100 |
| ttft-chat-context | standard | Pass | 2702ms | 100 |
| ttft-json-output | standard | Pass | 2781ms | 100 |
| ttft-multilang | standard | Pass | 2776ms | 100 |
| ttft-reasoning | standard | Pass | 2784ms | 100 |
| ttft-creative | standard | Pass | 2706ms | 100 |
| tp-essay | standard | Pass | 48156ms | 2000 |
| tp-code-app | standard | Pass | 48200ms | 2000 |
| tp-tutorial | standard | Pass | 36231ms | 1500 |
| tp-analysis | standard | Pass | 36314ms | 1500 |
| tp-debug | standard | Pass | 36094ms | 1500 |
| tp-architecture | standard | Pass | 87910ms | 2000 |
| tp-comparison | standard | Pass | 48214ms | 2000 |
| tp-security | standard | Pass | 48146ms | 2000 |
| tp-algorithm | standard | Pass | 48293ms | 2000 |
| tp-documentation | standard | Pass | 48185ms | 2000 |
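The Speed table lists end-to-end latency and token counts, so a decode rate can be re-derived from the tp-* rows. This is a minimal sketch that assumes the Tokens column counts generated tokens and that each Latency value includes the time to first token; it subtracts the summary's 498 ms median TTFT from every row and takes the median tokens per second. The dashboard does not state its own method, and the helper names (`decode_rate`, `MEDIAN_TTFT_MS`) are illustrative.

```python
from statistics import median

# tp-* rows from the Speed table: (task, latency_ms, tokens)
TP_ROWS = [
    ("tp-essay", 48156, 2000),
    ("tp-code-app", 48200, 2000),
    ("tp-tutorial", 36231, 1500),
    ("tp-analysis", 36314, 1500),
    ("tp-debug", 36094, 1500),
    ("tp-architecture", 87910, 2000),
    ("tp-comparison", 48214, 2000),
    ("tp-security", 48146, 2000),
    ("tp-algorithm", 48293, 2000),
    ("tp-documentation", 48185, 2000),
]

# Summary "Median TTFT"; assumed (not confirmed) to apply to every request.
MEDIAN_TTFT_MS = 498


def decode_rate(latency_ms: int, tokens: int, ttft_ms: int = MEDIAN_TTFT_MS) -> float:
    """Generated tokens per second once the time to first token is removed."""
    return tokens / ((latency_ms - ttft_ms) / 1000)


rates = {task: decode_rate(lat, tok) for task, lat, tok in TP_ROWS}
decode_median = median(rates.values())

print(f"median decode rate: {decode_median:.1f} tok/s")
for task, rate in rates.items():
    # Flag any run that decoded at under 75% of the median rate.
    if rate < 0.75 * decode_median:
        print(f"slow outlier: {task} at {rate:.1f} tok/s")
```

Run as a script, it prints a median of about 41.9 tok/s, in line with the reported Median Throughput, and flags tp-architecture (about 22.9 tok/s) as the one run that decoded markedly slower than the rest.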
Hallucination: 10/30 passed
| Task | Difficulty | Result | Latency | Tokens |
|---|---|---|---|---|
| fact-01 | easy | Pass | 14148ms | 93 |
| fact-02 | medium | Pass | 3073ms | 114 |
| fact-03 | medium | Fail | 13802ms | 487 |
| fact-04 | hard | Fail | 12168ms | 500 |
| fact-05 | hard | Fail | 6450ms | 261 |
| fact-06 | easy | Fail | 5749ms | 153 |
| fact-07 | medium | Pass | 4176ms | 91 |
| fact-08 | hard | Pass | 6853ms | 276 |
| fact-09 | medium | Pass | 2930ms | 111 |
| fact-10 | hard | Pass | 12170ms | 500 |
| code-01 | easy | Fail | 8561ms | 345 |
| code-02 | medium | Pass | 4035ms | 154 |
| code-03 | medium | Pass | 12212ms | 500 |
| code-04 | hard | Fail | 12242ms | 500 |
| code-05 | hard | Fail | 12220ms | 500 |
| code-06 | easy | Fail | 12147ms | 500 |
| code-07 | medium | Fail | 12169ms | 500 |
| code-08 | hard | Fail | 12185ms | 500 |
| code-09 | medium | Fail | 7418ms | 298 |
| code-10 | hard | Pass | 12242ms | 500 |
| cal-01 | medium | Fail | 12242ms | 500 |
| cal-02 | hard | Fail | 12187ms | 500 |
| cal-03 | medium | Fail | 12181ms | 500 |
| cal-04 | easy | Fail | 12164ms | 500 |
| cal-05 | hard | Fail | 12241ms | 500 |
| cal-06 | medium | Fail | 13947ms | 500 |
| cal-07 | hard | Fail | 12236ms | 500 |
| cal-08 | medium | Fail | 12251ms | 500 |
| cal-09 | easy | Pass | 12183ms | 500 |
| cal-10 | hard | Fail | 12255ms | 500 |
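The Hallucination task IDs fall into three families (fact-, code-, cal-). Grouping the Result column by that prefix shows where the 10 passes come from. The sketch below is a minimal tally transcribed from the table; grouping by prefix is an editorial choice, not something the benchmark itself reports.

```python
# Result column of the Hallucination table, tasks 01..10 per family ("P" = Pass, "F" = Fail)
RESULTS = {
    "fact": "P P F F F F P P P P".split(),
    "code": "F P P F F F F F F P".split(),
    "cal": "F F F F F F F F P F".split(),
}


def pass_rate(outcomes: list[str]) -> float:
    """Fraction of outcomes marked as a pass."""
    return outcomes.count("P") / len(outcomes)


by_family = {family: pass_rate(outcomes) for family, outcomes in RESULTS.items()}
total_passed = sum(o.count("P") for o in RESULTS.values())
total_tasks = sum(len(o) for o in RESULTS.values())

for family, rate in by_family.items():
    print(f"{family:>4}: {rate:.0%}")
print(f"overall: {total_passed}/{total_tasks}")
```

The split is uneven: fact- tasks pass 6 of 10, code- tasks 3 of 10, and cal- tasks only 1 of 10 (cal-09), which together reproduce the 10/30 category score.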
Code Generation: 14/20 passed
| Task | Difficulty | Result | Latency | Tokens |
|---|---|---|---|---|
| fn-01 | easy | Pass | 3316ms | 124 |
| fn-02 | easy | Pass | 4398ms | 172 |
| fn-03 | medium | Fail | 2799ms | 103 |
| fn-04 | medium | Pass | 5941ms | 236 |
| fn-05 | medium | Pass | 18241ms | 755 |
| fn-06 | hard | Fail | 10799ms | 438 |
| fn-07 | hard | Fail | 8232ms | 333 |
| fn-08 | hard | Fail | 9370ms | 377 |
| bug-01 | easy | Pass | 4174ms | 154 |
| bug-02 | medium | Pass | 4341ms | 155 |
| bug-03 | hard | Pass | 7321ms | 286 |
| bug-04 | medium | Pass | 5483ms | 210 |
| algo-01 | medium | Pass | 5495ms | 216 |
| algo-02 | hard | Pass | 6167ms | 245 |
| algo-03 | medium | Pass | 3489ms | 131 |
| algo-04 | hard | Pass | 4766ms | 185 |
| multi-01 | hard | Pass | 6912ms | 275 |
| multi-02 | hard | Fail | 11538ms | 468 |
| multi-03 | hard | Pass | 9331ms | 377 |
| multi-04 | hard | Fail | 8087ms | 325 |
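Breaking the Code Generation rows down by the Difficulty column makes the 14/20 score easier to read. This is a minimal sketch over values copied from the table; the `tally` structure is illustrative only.

```python
from collections import defaultdict

# (difficulty, result) for each Code Generation row, in table order
ROWS = [
    ("easy", "Pass"), ("easy", "Pass"), ("medium", "Fail"), ("medium", "Pass"),  # fn-01..04
    ("medium", "Pass"), ("hard", "Fail"), ("hard", "Fail"), ("hard", "Fail"),    # fn-05..08
    ("easy", "Pass"), ("medium", "Pass"), ("hard", "Pass"), ("medium", "Pass"),  # bug-01..04
    ("medium", "Pass"), ("hard", "Pass"), ("medium", "Pass"), ("hard", "Pass"),  # algo-01..04
    ("hard", "Pass"), ("hard", "Fail"), ("hard", "Pass"), ("hard", "Fail"),      # multi-01..04
]

# difficulty -> [passed, total]
tally = defaultdict(lambda: [0, 0])
for difficulty, result in ROWS:
    tally[difficulty][1] += 1
    if result == "Pass":
        tally[difficulty][0] += 1

for difficulty in ("easy", "medium", "hard"):
    passed, total = tally[difficulty]
    print(f"{difficulty:<6} {passed}/{total}")
```

Easy tasks pass 3/3, medium 6/7 and hard 5/10. All six failures fall in the fn- and multi- families; every bug- and algo- row passes.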
Reasoning: 26/30 passed
| Task | Difficulty | Result | Latency | Tokens |
|---|---|---|---|---|
| arith-01 | hard | Pass | 4461ms | 173 |
| arith-02 | hard | Pass | 2011ms | 71 |
| arith-03 | expert | Pass | 5310ms | 207 |
| arith-04 | expert | Pass | 7221ms | 289 |
| arith-05 | expert | Pass | 2182ms | 76 |
| arith-06 | hard | Pass | 8832ms | 357 |
| spatial-01 | hard | Pass | 7163ms | 285 |
| spatial-02 | expert | Pass | 4194ms | 159 |
| spatial-03 | expert | Pass | 15730ms | 646 |
| spatial-04 | hard | Pass | 10807ms | 438 |
| spatial-05 | expert | Pass | 4721ms | 181 |
| spatial-06 | hard | Fail | 24251ms | 1000 |
| cstr-01 | hard | Pass | 5154ms | 200 |
| cstr-02 | expert | Pass | 14177ms | 577 |
| cstr-03 | expert | Fail | 24363ms | 1000 |
| cstr-04 | hard | Pass | 24247ms | 1000 |
| cstr-05 | expert | Fail | 78398ms | 1000 |
| cstr-06 | hard | Pass | 18216ms | 743 |
| adv-01 | hard | Pass | 4081ms | 156 |
| adv-02 | expert | Pass | 4337ms | 167 |
| adv-03 | expert | Pass | 7043ms | 280 |
| adv-04 | hard | Pass | 11064ms | 371 |
| adv-05 | expert | Pass | 5857ms | 233 |
| adv-06 | expert | Pass | 7819ms | 315 |
| cf-01 | hard | Pass | 5658ms | 222 |
| cf-02 | expert | Fail | 9732ms | 390 |
| cf-03 | expert | Pass | 7095ms | 283 |
| cf-04 | hard | Pass | 10323ms | 417 |
| cf-05 | expert | Pass | 17652ms | 724 |
| cf-06 | expert | Pass | 11225ms | 454 |
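Four Reasoning rows report exactly 1000 tokens, a round number that suggests, but does not prove, an output-length limit. The sketch below cross-checks those rows against the Result column; `TOKEN_CAP` is an assumed value read off the table, not a documented benchmark setting.

```python
# (task, tokens, result) for each Reasoning row, transcribed from the table
ROWS = [
    ("arith-01", 173, "Pass"), ("arith-02", 71, "Pass"), ("arith-03", 207, "Pass"),
    ("arith-04", 289, "Pass"), ("arith-05", 76, "Pass"), ("arith-06", 357, "Pass"),
    ("spatial-01", 285, "Pass"), ("spatial-02", 159, "Pass"), ("spatial-03", 646, "Pass"),
    ("spatial-04", 438, "Pass"), ("spatial-05", 181, "Pass"), ("spatial-06", 1000, "Fail"),
    ("cstr-01", 200, "Pass"), ("cstr-02", 577, "Pass"), ("cstr-03", 1000, "Fail"),
    ("cstr-04", 1000, "Pass"), ("cstr-05", 1000, "Fail"), ("cstr-06", 743, "Pass"),
    ("adv-01", 156, "Pass"), ("adv-02", 167, "Pass"), ("adv-03", 280, "Pass"),
    ("adv-04", 371, "Pass"), ("adv-05", 233, "Pass"), ("adv-06", 315, "Pass"),
    ("cf-01", 222, "Pass"), ("cf-02", 390, "Fail"), ("cf-03", 283, "Pass"),
    ("cf-04", 417, "Pass"), ("cf-05", 724, "Pass"), ("cf-06", 454, "Pass"),
]

# Assumption: the recurring 1000-token count marks a generation limit.
TOKEN_CAP = 1000

at_cap = [task for task, tokens, _ in ROWS if tokens >= TOKEN_CAP]
failed = [task for task, _, result in ROWS if result == "Fail"]
failed_at_cap = sorted(set(at_cap) & set(failed))

print(f"runs at the cap: {', '.join(at_cap)}")
print(f"{len(failed_at_cap)} of {len(failed)} failures hit the cap: {', '.join(failed_at_cap)}")
```

It lists spatial-06, cstr-03, cstr-04 and cstr-05 as capped runs and shows that three of the four failures stopped at 1000 tokens; cf-02 is the only failure that ended on its own, and cstr-04 passed despite reaching the limit.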
Instruction Following: 16/20 passed
| Task | Difficulty | Result | Latency | Tokens |
|---|---|---|---|---|
| fmt-01 | easy | Pass | 1962ms | 67 |
| fmt-02 | easy | Pass | 3964ms | 154 |
| fmt-03 | medium | Pass | 3487ms | 135 |
| fmt-04 | medium | Pass | 2791ms | 103 |
| fmt-05 | hard | Pass | 3911ms | 150 |
| fmt-06 | hard | Pass | 6781ms | 272 |
| con-01 | easy | Pass | 11054ms | 453 |
| con-02 | easy | Fail | 2891ms | 108 |
| con-03 | medium | Pass | 3605ms | 69 |
| con-04 | medium | Pass | 2805ms | 106 |
| con-05 | hard | Pass | 8812ms | 357 |
| con-06 | hard | Fail | 12325ms | 500 |
| role-01 | medium | Pass | 4397ms | 165 |
| role-02 | medium | Pass | 12198ms | 500 |
| role-03 | hard | Pass | 12329ms | 500 |
| role-04 | hard | Fail | 12225ms | 500 |
| mc-01 | hard | Fail | 12237ms | 500 |
| mc-02 | hard | Pass | 8329ms | 331 |
| mc-03 | hard | Pass | 12128ms | 494 |
| mc-04 | hard | Pass | 10471ms | 424 |