Model Analysis
Claude Opus 4.6 (April 14)
anthropic/claude-opus-4-6-apr14
76.9 overall score
72.2% accuracy
29.1% fabrication
Tasks: 30
Passed: 30
Failed: 0
Avg latency: 11310ms
Total cost: $1.7349
Cluster Performance
Behavioral Claims (6 tasks): 85.3
Edge Case ID (5 tasks): 43.9
API Knowledge (5 tasks): 65.4
Complexity Analysis (4 tasks): 90.8
Bug Detection (5 tasks): 85.4
Doc Accuracy (5 tasks): 91.9
All Task Results
| Task | Cluster | Score | Accuracy (%) | Fabricated | Latency |
|---|---|---|---|---|---|
| halluc-edge-pagination | Edge Case ID | 20.2 | 0 | 6 | 13042ms |
| halluc-edge-rate-limiter | Edge Case ID | 20.3 | 0 | 6 | 16814ms |
| halluc-edge-tree-traversal | Edge Case ID | 33.5 | 17 | 5 | 11746ms |
| halluc-api-map-set | API Knowledge | 37.7 | 17 | 5 | 19962ms |
| halluc-api-regex-named-groups | API Knowledge | 58.2 | 50 | 3 | 13783ms |
| halluc-api-promises | API Knowledge | 58.9 | 50 | 3 | 11896ms |
| halluc-cache-eviction-claims | Behavioral Claims | 60.0 | 50 | 3 | 14093ms |
| halluc-edge-date-parser | Edge Case ID | 60.2 | 67 | 2 | 7642ms |
| halluc-dedup-sort-claims | Behavioral Claims | 69.6 | 60 | 2 | 9119ms |
| halluc-bug-off-by-one | Bug Detection | 71.8 | 60 | 2 | 12275ms |
| halluc-api-node-crypto | API Knowledge | 72.3 | 67 | 2 | 16464ms |
| halluc-doc-middleware-chain | Doc Accuracy | 72.9 | 67 | 2 | 13361ms |
| halluc-bug-closure-loop | Bug Detection | 73.5 | 67 | 2 | 10067ms |
| halluc-complexity-sort-chain | Complexity Analysis | 81.5 | 83 | 1 | 8388ms |
| halluc-bug-async-race | Bug Detection | 81.9 | 83 | 1 | 10400ms |
| halluc-complexity-graph-bfs | Complexity Analysis | 82.0 | 83 | 1 | 9531ms |
| halluc-nested-merge-claims | Behavioral Claims | 82.2 | 83 | 1 | 17878ms |
| halluc-edge-string-truncate | Edge Case ID | 85.5 | 80 | 1 | 10046ms |
| halluc-doc-query-builder | Doc Accuracy | 86.6 | 83 | 1 | 8018ms |
| halluc-complexity-nested-loops | Complexity Analysis | 99.6 | 100 | 0 | 9206ms |
| halluc-bug-type-coercion | Bug Detection | 99.7 | 100 | 0 | 13476ms |
| halluc-api-zod-schema | API Knowledge | 99.8 | 100 | 0 | 10340ms |
| halluc-retry-logic-claims | Behavioral Claims | 99.8 | 100 | 0 | 12039ms |
| halluc-bug-null-coalesce | Bug Detection | 99.9 | 100 | 0 | 10159ms |
| halluc-complexity-recursive-memo | Complexity Analysis | 99.9 | 100 | 0 | 7730ms |
| halluc-parser-output-claims | Behavioral Claims | 99.9 | 100 | 0 | 11005ms |
| halluc-doc-event-emitter | Doc Accuracy | 100.0 | 100 | 0 | 7838ms |
| halluc-doc-http-handler | Doc Accuracy | 100.0 | 100 | 0 | 6882ms |
| halluc-doc-validation-pipe | Doc Accuracy | 100.0 | 100 | 0 | 8093ms |
| halluc-state-machine-claims | Behavioral Claims | 100.0 | 100 | 0 | 8019ms |
30 tasks · Sorted by score (lowest first) · Fabricated = high-confidence false claims
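
The page does not state how the headline and cluster figures are aggregated. They appear consistent with unweighted means of the per-task values in the table, and the sketch below checks that under this assumption. The names `TASKS`, `REPORTED`, and `cluster_means` are illustrative only and are not part of any benchmark tooling. Because the per-task values are already rounded, agreement is only checked to within 0.1. The 29.1% fabrication figure is not reproduced, since the table lists per-task fabricated counts but not the number of claims per task.

```python
from collections import defaultdict
from statistics import fmean

# (cluster, score, accuracy %) copied from the task table, in table order.
TASKS = [
    ("Edge Case ID", 20.2, 0),
    ("Edge Case ID", 20.3, 0),
    ("Edge Case ID", 33.5, 17),
    ("API Knowledge", 37.7, 17),
    ("API Knowledge", 58.2, 50),
    ("API Knowledge", 58.9, 50),
    ("Behavioral Claims", 60.0, 50),
    ("Edge Case ID", 60.2, 67),
    ("Behavioral Claims", 69.6, 60),
    ("Bug Detection", 71.8, 60),
    ("API Knowledge", 72.3, 67),
    ("Doc Accuracy", 72.9, 67),
    ("Bug Detection", 73.5, 67),
    ("Complexity Analysis", 81.5, 83),
    ("Bug Detection", 81.9, 83),
    ("Complexity Analysis", 82.0, 83),
    ("Behavioral Claims", 82.2, 83),
    ("Edge Case ID", 85.5, 80),
    ("Doc Accuracy", 86.6, 83),
    ("Complexity Analysis", 99.6, 100),
    ("Bug Detection", 99.7, 100),
    ("API Knowledge", 99.8, 100),
    ("Behavioral Claims", 99.8, 100),
    ("Bug Detection", 99.9, 100),
    ("Complexity Analysis", 99.9, 100),
    ("Behavioral Claims", 99.9, 100),
    ("Doc Accuracy", 100.0, 100),
    ("Doc Accuracy", 100.0, 100),
    ("Doc Accuracy", 100.0, 100),
    ("Behavioral Claims", 100.0, 100),
]

# Cluster scores as reported in the Cluster Performance section.
REPORTED = {
    "Behavioral Claims": 85.3,
    "Edge Case ID": 43.9,
    "API Knowledge": 65.4,
    "Complexity Analysis": 90.8,
    "Bug Detection": 85.4,
    "Doc Accuracy": 91.9,
}


def cluster_means(tasks):
    """Unweighted mean score per cluster (assumed aggregation, not confirmed)."""
    by_cluster = defaultdict(list)
    for cluster, score, _accuracy in tasks:
        by_cluster[cluster].append(score)
    return {cluster: fmean(scores) for cluster, scores in by_cluster.items()}


def overall_means(tasks):
    """Unweighted mean of the score and accuracy columns across all tasks."""
    return fmean(t[1] for t in tasks), fmean(t[2] for t in tasks)


if __name__ == "__main__":
    for cluster, mean in sorted(cluster_means(TASKS).items()):
        print(f"{cluster}: computed {mean:.2f}, reported {REPORTED[cluster]}")
    score, accuracy = overall_means(TASKS)
    print(f"overall score: computed {score:.2f}, reported 76.9")
    print(f"accuracy: computed {accuracy:.2f}%, reported 72.2%")
```

If the computed and reported values diverged by more than rounding error, that would suggest a weighted or otherwise different aggregation than the simple mean assumed here.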