Model Analysis
Claude Opus 4.7
openrouter/anthropic/claude-opus-4.7
| Metric | Value |
|---|---|
| Overall score | 77.1 |
| Accuracy | 74.1% |
| Fabrication | 27.5% |
| Tasks | 25 |
| Passed | 24 |
| Failed | 1 |
| Avg latency | 6967ms |
| Total cost | $0.4198 |
Cluster Performance
| Cluster | Tasks | Score |
|---|---|---|
| Behavioral Claims | 6 | 75.1 |
| Edge Case ID | 5 | 48.6 |
| Complexity Analysis | 4 | 92.3 |
| Bug Detection | 5 | 86.0 |
| Doc Accuracy | 5 | 87.2 |
All Task Results
| Task | Cluster | Score | Accuracy (%) | Fabricated | Latency |
|---|---|---|---|---|---|
| halluc-nested-merge-claims | Behavioral Claims | 0.0 | 0 | 0 | 7887ms |
| halluc-edge-pagination | Edge Case ID | 20.3 | 0 | 6 | 10893ms |
| halluc-edge-rate-limiter | Edge Case ID | 33.6 | 17 | 5 | 14658ms |
| halluc-edge-tree-traversal | Edge Case ID | 43.3 | 33 | 4 | 6712ms |
| halluc-edge-date-parser | Edge Case ID | 60.6 | 67 | 2 | 4814ms |
| halluc-doc-middleware-chain | Doc Accuracy | 69.2 | 67 | 2 | 7588ms |
| halluc-dedup-sort-claims | Behavioral Claims | 70.3 | 60 | 2 | 5989ms |
| halluc-bug-closure-loop | Bug Detection | 74.2 | 67 | 2 | 7000ms |
| halluc-cache-eviction-claims | Behavioral Claims | 81.1 | 83 | 1 | 8137ms |
| halluc-complexity-sort-chain | Complexity Analysis | 81.4 | 83 | 1 | 6009ms |
| halluc-bug-async-race | Bug Detection | 82.0 | 83 | 1 | 6399ms |
| halluc-doc-event-emitter | Doc Accuracy | 82.3 | 83 | 1 | 5643ms |
| halluc-doc-query-builder | Doc Accuracy | 84.9 | 83 | 1 | 7145ms |
| halluc-edge-string-truncate | Edge Case ID | 85.0 | 80 | 1 | 7868ms |
| halluc-bug-off-by-one | Bug Detection | 86.0 | 80 | 1 | 5390ms |
| halluc-bug-type-coercion | Bug Detection | 88.0 | 83 | 1 | 9145ms |
| halluc-complexity-graph-bfs | Complexity Analysis | 88.3 | 83 | 1 | 6474ms |
| halluc-parser-output-claims | Behavioral Claims | 99.5 | 100 | 0 | 5622ms |
| halluc-complexity-nested-loops | Complexity Analysis | 99.6 | 100 | 0 | 6059ms |
| halluc-retry-logic-claims | Behavioral Claims | 99.6 | 100 | 0 | 6484ms |
| halluc-complexity-recursive-memo | Complexity Analysis | 99.7 | 100 | 0 | 4991ms |
| halluc-doc-validation-pipe | Doc Accuracy | 99.7 | 100 | 0 | 5859ms |
| halluc-bug-null-coalesce | Bug Detection | 99.8 | 100 | 0 | 7240ms |
| halluc-doc-http-handler | Doc Accuracy | 99.8 | 100 | 0 | 4938ms |
| halluc-state-machine-claims | Behavioral Claims | 99.9 | 100 | 0 | 5225ms |
25 tasks · Sorted by score (lowest first) · Fabricated = high-confidence false claims
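
The cluster scores and the overall score are consistent with an unweighted mean of the per-task scores. The dashboard does not state this aggregation rule, so treat it as an inference from the numbers. The sketch below recomputes the means from the task table and compares them with the reported values. The names `TASK_SCORES`, `REPORTED`, and `cluster_means` are illustrative and do not come from the benchmark tooling.

```python
from collections import defaultdict
from statistics import mean

# (cluster, score) pairs copied from the All Task Results table.
TASK_SCORES = [
    ("Behavioral Claims", 0.0), ("Edge Case ID", 20.3), ("Edge Case ID", 33.6),
    ("Edge Case ID", 43.3), ("Edge Case ID", 60.6), ("Doc Accuracy", 69.2),
    ("Behavioral Claims", 70.3), ("Bug Detection", 74.2),
    ("Behavioral Claims", 81.1), ("Complexity Analysis", 81.4),
    ("Bug Detection", 82.0), ("Doc Accuracy", 82.3), ("Doc Accuracy", 84.9),
    ("Edge Case ID", 85.0), ("Bug Detection", 86.0), ("Bug Detection", 88.0),
    ("Complexity Analysis", 88.3), ("Behavioral Claims", 99.5),
    ("Complexity Analysis", 99.6), ("Behavioral Claims", 99.6),
    ("Complexity Analysis", 99.7), ("Doc Accuracy", 99.7),
    ("Bug Detection", 99.8), ("Doc Accuracy", 99.8),
    ("Behavioral Claims", 99.9),
]

# Values copied from the Cluster Performance section and the summary.
REPORTED = {
    "Behavioral Claims": 75.1,
    "Edge Case ID": 48.6,
    "Complexity Analysis": 92.3,
    "Bug Detection": 86.0,
    "Doc Accuracy": 87.2,
}
REPORTED_OVERALL = 77.1


def cluster_means(rows):
    """Group scores by cluster and return the unweighted mean of each group."""
    grouped = defaultdict(list)
    for cluster, score in rows:
        grouped[cluster].append(score)
    return {cluster: mean(scores) for cluster, scores in grouped.items()}


means = cluster_means(TASK_SCORES)
overall = mean([score for _, score in TASK_SCORES])

for cluster, reported in REPORTED.items():
    print(f"{cluster}: computed {means[cluster]:.2f}, reported {reported}")
print(f"Overall: computed {overall:.2f}, reported {REPORTED_OVERALL}")
```

Every computed mean lands within 0.1 of the reported figure, which is what rounding to one decimal place would produce. The same check does not extend to the 27.5% fabrication figure, because the table lists fabricated claims per task but not the total number of claims.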