Model Analysis
Qwen 3.6 Plus
openrouter/qwen/qwen3.6-plus
Overall score: 79.1
Accuracy: 74.7%
Fabrication: 27.0%
Tasks: 30
Passed: 30
Failed: 0
Avg latency: 61021 ms
Total cost: $0.1993
Cluster Performance
| Cluster | Tasks | Score |
|---|---|---|
| Behavioral Claims | 6 | 89.0 |
| Edge Case ID | 5 | 49.7 |
| API Knowledge | 5 | 65.4 |
| Complexity Analysis | 4 | 93.3 |
| Bug Detection | 5 | 89.3 |
| Doc Accuracy | 5 | 88.8 |
All Task Results
| Task | Cluster | Score | Accuracy (%) | Fabricated (count) | Latency | |
|---|---|---|---|---|---|---|
| halluc-edge-pagination | Edge Case ID | 20.1 | 0 | 6 | 85130ms | |
| halluc-edge-tree-traversal | Edge Case ID | 33.4 | 17 | 5 | 58518ms | |
| halluc-edge-rate-limiter | Edge Case ID | 35.2 | 17 | 5 | 65558ms | |
| halluc-api-map-set | API Knowledge | 37.7 | 17 | 5 | 63408ms | |
| halluc-api-regex-named-groups | API Knowledge | 56.0 | 50 | 3 | 64219ms | |
| halluc-api-promises | API Knowledge | 60.0 | 50 | 3 | 60484ms | |
| halluc-dedup-sort-claims | Behavioral Claims | 65.0 | 60 | 2 | 53613ms | |
| halluc-doc-middleware-chain | Doc Accuracy | 70.7 | 67 | 2 | 58042ms | |
| halluc-api-node-crypto | API Knowledge | 73.4 | 67 | 2 | 60081ms | |
| halluc-bug-closure-loop | Bug Detection | 73.4 | 67 | 2 | 44486ms | |
| halluc-edge-date-parser | Edge Case ID | 73.6 | 83 | 1 | 64332ms | |
| halluc-retry-logic-claims | Behavioral Claims | 82.3 | 83 | 1 | 67444ms | |
| halluc-edge-string-truncate | Edge Case ID | 86.0 | 80 | 1 | 65548ms | |
| halluc-bug-async-race | Bug Detection | 86.6 | 83 | 1 | 59107ms | |
| halluc-bug-null-coalesce | Bug Detection | 86.6 | 83 | 1 | 67889ms | |
| halluc-cache-eviction-claims | Behavioral Claims | 86.6 | 83 | 1 | 60869ms | |
| halluc-complexity-graph-bfs | Complexity Analysis | 86.6 | 83 | 1 | 66441ms | |
| halluc-complexity-sort-chain | Complexity Analysis | 86.6 | 83 | 1 | 47033ms | |
| halluc-doc-event-emitter | Doc Accuracy | 86.6 | 83 | 1 | 56890ms | |
| halluc-doc-query-builder | Doc Accuracy | 86.6 | 83 | 1 | 53240ms | |
| halluc-bug-off-by-one | Bug Detection | 99.9 | 100 | 0 | 52810ms | |
| halluc-nested-merge-claims | Behavioral Claims | 99.9 | 100 | 0 | 105221ms | |
| halluc-api-zod-schema | API Knowledge | 100.0 | 100 | 0 | 50126ms | |
| halluc-bug-type-coercion | Bug Detection | 100.0 | 100 | 0 | 69945ms | |
| halluc-complexity-nested-loops | Complexity Analysis | 100.0 | 100 | 0 | 53322ms | |
| halluc-complexity-recursive-memo | Complexity Analysis | 100.0 | 100 | 0 | 45213ms | |
| halluc-doc-http-handler | Doc Accuracy | 100.0 | 100 | 0 | 32694ms | |
| halluc-doc-validation-pipe | Doc Accuracy | 100.0 | 100 | 0 | 65079ms | |
| halluc-parser-output-claims | Behavioral Claims | 100.0 | 100 | 0 | 72331ms | |
| halluc-state-machine-claims | Behavioral Claims | 100.0 | 100 | 0 | 61546ms |
30 tasks · Sorted by score (lowest first) · Fabricated = high-confidence false claims
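
The aggregate figures on this page appear to be plain arithmetic means of the per-task scores: averaging the Score column within each cluster reproduces the Cluster Performance values, and averaging all 30 scores reproduces the overall score. This is inferred from the numbers shown here, not from any documented scoring rule, and the function names below are illustrative. A minimal sketch of that check:

```python
from collections import defaultdict
from statistics import mean

# (cluster, score) pairs copied from the "All Task Results" table, in table order.
ROWS = [
    ("Edge Case ID", 20.1), ("Edge Case ID", 33.4), ("Edge Case ID", 35.2),
    ("API Knowledge", 37.7), ("API Knowledge", 56.0), ("API Knowledge", 60.0),
    ("Behavioral Claims", 65.0), ("Doc Accuracy", 70.7), ("API Knowledge", 73.4),
    ("Bug Detection", 73.4), ("Edge Case ID", 73.6), ("Behavioral Claims", 82.3),
    ("Edge Case ID", 86.0), ("Bug Detection", 86.6), ("Bug Detection", 86.6),
    ("Behavioral Claims", 86.6), ("Complexity Analysis", 86.6),
    ("Complexity Analysis", 86.6), ("Doc Accuracy", 86.6), ("Doc Accuracy", 86.6),
    ("Bug Detection", 99.9), ("Behavioral Claims", 99.9), ("API Knowledge", 100.0),
    ("Bug Detection", 100.0), ("Complexity Analysis", 100.0),
    ("Complexity Analysis", 100.0), ("Doc Accuracy", 100.0), ("Doc Accuracy", 100.0),
    ("Behavioral Claims", 100.0), ("Behavioral Claims", 100.0),
]


def cluster_means(rows):
    """Group scores by cluster and return each cluster's mean, rounded to 1 decimal."""
    grouped = defaultdict(list)
    for cluster, score in rows:
        grouped[cluster].append(score)
    return {cluster: round(mean(scores), 1) for cluster, scores in grouped.items()}


def overall_mean(rows):
    """Mean of every task score, rounded to 1 decimal (assumed overall-score rule)."""
    return round(mean(score for _, score in rows), 1)


if __name__ == "__main__":
    for cluster, value in sorted(cluster_means(ROWS).items()):
        print(f"{cluster}: {value}")
    print(f"Overall: {overall_mean(ROWS)}")
```

Because the overall score is an unweighted mean over tasks, the three lowest Edge Case ID tasks pull it down more than any other cluster does.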