Model Analysis
Qwen 3.6 Plus
openrouter/qwen/qwen3.6-plus
Overall score: 85.1
Repro: 100.0% · Regress: 95.0% · Diagnose: 26.5%
Tasks: 10
Passed: 9
Failed: 1
Avg latency: 106382 ms
Total cost: $0.0000
Cluster Performance
| Cluster | Tasks | Score |
|---|---|---|
| Test Failure | 2 | 78.3 |
| Runtime Exception | 2 | 90.6 |
| Incorrect Output | 1 | 87.3 |
| Async Timing | 1 | 87.3 |
| State Mutation | 2 | 84.4 |
| Regression After Refactor | 2 | 85.2 |
All Task Results
| Task | ID | Cluster | Score | Repro | Hidden | Diagnose | Latency (ms) |
|---|---|---|---|---|---|---|---|
| Safe Number Parsing Coercion Bug | `debug-type-coercion-v2` | Test Failure | 69.0 | 100 | 50 | 60 | 133196 |
| Default Merge Mutation Bug | `debug-object-mutation-v2` | State Mutation | 84.0 | 100 | 100 | 10 | 78290 |
| Deep Clone Shared Reference Bug | `debug-deep-clone-v2` | State Mutation | 84.8 | 100 | 100 | 15 | 78821 |
| LRU Access Order Regression | `debug-lru-eviction-v2` | Regression After Refactor | 85.0 | 100 | 100 | 10 | 180595 |
| Cycle Detection False Positive Regression | `debug-graph-cycle-v2` | Regression After Refactor | 85.3 | 100 | 100 | 25 | 100086 |
| Broken Batch Promise Chain | `debug-promise-chain-v2` | Async Timing | 87.3 | 100 | 100 | 15 | 54665 |
| Sliding Window Rate Limiter | `debug-rate-limiter-v2` | Incorrect Output | 87.3 | 100 | 100 | 15 | 72786 |
| Loop Closure Capture Bug | `debug-closure-loop-v2` | Test Failure | 87.6 | 100 | 100 | 18 | 65528 |
| Flatten Object Null Base Case Bug | `debug-recursion-base-case-v2` | Runtime Exception | 87.6 | 100 | 100 | 18 | 94055 |
| JSON Parser Escape and Whitespace Bugs | `debug-json-parser-v2` | Runtime Exception | 93.5 | 100 | 100 | 80 | 205798 |
10 tasks · visible repro, hidden bug, and regression scoring
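
The page does not say how the cluster and overall scores are aggregated. As a sanity check only, the sketch below assumes each is a plain arithmetic mean of the per-task scores in the table above; under that assumption the results agree with the reported figures to within rounding. The names `results` and `cluster_means` are illustrative and do not come from the benchmark.

```python
from collections import defaultdict
from statistics import mean

# (cluster, score) pairs copied from the task results table.
results = [
    ("Test Failure", 69.0),
    ("State Mutation", 84.0),
    ("State Mutation", 84.8),
    ("Regression After Refactor", 85.0),
    ("Regression After Refactor", 85.3),
    ("Async Timing", 87.3),
    ("Incorrect Output", 87.3),
    ("Test Failure", 87.6),
    ("Runtime Exception", 87.6),
    ("Runtime Exception", 93.5),
]


def cluster_means(rows):
    """Group scores by cluster and return the unweighted mean per cluster.

    Assumption: the dashboard uses a simple mean; the source does not confirm this.
    """
    by_cluster = defaultdict(list)
    for cluster, score in rows:
        by_cluster[cluster].append(score)
    return {cluster: mean(scores) for cluster, scores in by_cluster.items()}


means = cluster_means(results)
overall = mean([score for _, score in results])

for cluster, value in sorted(means.items()):
    print(f"{cluster}: {value:.2f}")
print(f"Overall: {overall:.2f}")
```

If the benchmark weights tasks or clusters differently, these numbers would diverge from the reported ones, so a match here is suggestive rather than conclusive.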