Model Analysis
Claude Opus 4.7
openrouter/anthropic/claude-opus-4.7
Overall score: 86.2
Repro: 100.0%
Regress: 100.0%
Diagnose: 21.5%
Tasks: 10
Passed: 10
Failed: 0
Avg latency: 6004 ms
Total cost: $0.1636
Cluster Performance
| Cluster | Tasks | Score |
|---|---|---|
| Test Failure | 2 | 89.4 |
| Runtime Exception | 2 | 85.6 |
| Incorrect Output | 1 | 87.3 |
| Async Timing | 1 | 87.3 |
| State Mutation | 2 | 84.4 |
| Regression After Refactor | 2 | 84.6 |

All Task Results
| Task | Task ID | Cluster | Score | Repro | Hidden | Diagnose | Latency (ms) |
|---|---|---|---|---|---|---|---|
| Loop Closure Capture Bug | debug-closure-loop-v2 | Test Failure | 84.0 | 100 | 100 | 10 | 7482 |
| Deep Clone Shared Reference Bug | debug-deep-clone-v2 | State Mutation | 84.0 | 100 | 100 | 10 | 5649 |
| Cycle Detection False Positive Regression | debug-graph-cycle-v2 | Regression After Refactor | 84.1 | 100 | 100 | 18 | 5945 |
| Default Merge Mutation Bug | debug-object-mutation-v2 | State Mutation | 84.8 | 100 | 100 | 15 | 3352 |
| LRU Access Order Regression | debug-lru-eviction-v2 | Regression After Refactor | 85.0 | 100 | 100 | 10 | 5991 |
| Flatten Object Null Base Case Bug | debug-recursion-base-case-v2 | Runtime Exception | 85.1 | 100 | 100 | 18 | 5337 |
| JSON Parser Escape and Whitespace Bugs | debug-json-parser-v2 | Runtime Exception | 86.0 | 100 | 100 | 30 | 11418 |
| Broken Batch Promise Chain | debug-promise-chain-v2 | Async Timing | 87.3 | 100 | 100 | 15 | 5010 |
| Sliding Window Rate Limiter | debug-rate-limiter-v2 | Incorrect Output | 87.3 | 100 | 100 | 15 | 4976 |
| Safe Number Parsing Coercion Bug | debug-type-coercion-v2 | Test Failure | 94.8 | 100 | 100 | 75 | 4883 |
10 tasks · visible repro, hidden bug, and regression scoring
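
The page does not say how the aggregate figures are derived. The sketch below checks one plausible reading: that the overall score, the per-cluster scores, and the average latency are unweighted means of the per-task values, rounded half-up to one decimal place. The rounding mode and the helper names (`mean_1dp`, `cluster_means`) are assumptions made for illustration, not something the dashboard documents.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP

# (task id, cluster, score, latency in ms), copied from the task table.
# Scores are kept as strings so Decimal can parse them exactly.
RESULTS = [
    ("debug-closure-loop-v2", "Test Failure", "84.0", 7482),
    ("debug-deep-clone-v2", "State Mutation", "84.0", 5649),
    ("debug-graph-cycle-v2", "Regression After Refactor", "84.1", 5945),
    ("debug-object-mutation-v2", "State Mutation", "84.8", 3352),
    ("debug-lru-eviction-v2", "Regression After Refactor", "85.0", 5991),
    ("debug-recursion-base-case-v2", "Runtime Exception", "85.1", 5337),
    ("debug-json-parser-v2", "Runtime Exception", "86.0", 11418),
    ("debug-promise-chain-v2", "Async Timing", "87.3", 5010),
    ("debug-rate-limiter-v2", "Incorrect Output", "87.3", 4976),
    ("debug-type-coercion-v2", "Test Failure", "94.8", 4883),
]


def mean_1dp(values):
    """Unweighted mean, rounded half-up to one decimal place (assumed rule)."""
    decimals = [Decimal(v) for v in values]
    mean = sum(decimals) / len(decimals)
    return mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def cluster_means(results):
    """Group task scores by cluster and average each group."""
    by_cluster = defaultdict(list)
    for _task_id, cluster, score, _latency in results:
        by_cluster[cluster].append(score)
    return {cluster: mean_1dp(scores) for cluster, scores in by_cluster.items()}


overall = mean_1dp([row[2] for row in RESULTS])
avg_latency_ms = sum(row[3] for row in RESULTS) / len(RESULTS)

if __name__ == "__main__":
    print("overall:", overall)
    for cluster, score in cluster_means(RESULTS).items():
        print(f"{cluster}: {score}")
    print("avg latency (ms):", round(avg_latency_ms))
```

Under these assumptions the computed overall score, all six cluster scores, and the average latency match the figures shown on the page. The Diagnose column is the one exception: its displayed values average to 21.6 rather than the reported 21.5%, which may mean the table shows rounded per-task values.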