BridgeBench
Debugging
Model Analysis

Qwen 3.6 Plus

openrouter/qwen/qwen3.6-plus

Overall score: 85.1

100.0% repro · 95.0% regress · 26.5% diagnose

Tasks: 10
Passed: 9
Failed: 1
Avg latency: 106,382 ms (≈106.4 s)
Total cost: $0.0000

Cluster Performance

Test Failure (2 tasks) · 78.3
Runtime Exception (2 tasks) · 90.6
Incorrect Output (1 task) · 87.3
Async Timing (1 task) · 87.3
State Mutation (2 tasks) · 84.4
Regression After Refactor (2 tasks) · 85.2

All Task Results

Task (ID) · Cluster · Score
Safe Number Parsing Coercion Bug (debug-type-coercion-v2) · Test Failure · 69.0
Default Merge Mutation Bug (debug-object-mutation-v2) · State Mutation · 84.0
Deep Clone Shared Reference Bug (debug-deep-clone-v2) · State Mutation · 84.8
LRU Access Order Regression (debug-lru-eviction-v2) · Regression After Refactor · 85.0
Cycle Detection False Positive Regression (debug-graph-cycle-v2) · Regression After Refactor · 85.3
Broken Batch Promise Chain (debug-promise-chain-v2) · Async Timing · 87.3
Sliding Window Rate Limiter (debug-rate-limiter-v2) · Incorrect Output · 87.3
Loop Closure Capture Bug (debug-closure-loop-v2) · Test Failure · 87.6
Flatten Object Null Base Case Bug (debug-recursion-base-case-v2) · Runtime Exception · 87.6
JSON Parser Escape and Whitespace Bugs (debug-json-parser-v2) · Runtime Exception · 93.5

10 tasks · visible repro, hidden bug, and regression scoring
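The overall score and the cluster scores look like unweighted means of the per-task scores. The ten task scores sum to 851.4, which gives 85.14, and each cluster figure matches the mean of its own tasks (for example, Runtime Exception: (87.6 + 93.5) / 2 = 90.55, shown as 90.6). The Python sketch below recomputes both under that assumption. The half-up rounding rule is also an assumption, chosen only because it reproduces the displayed values. The sketch does not model the repro/regress/diagnose percentages, because the page does not say how they are weighted.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP

# Per-task scores copied from the results table: (task id, cluster, score).
# Scores are kept as strings so Decimal represents them exactly.
TASKS = [
    ("debug-type-coercion-v2", "Test Failure", "69.0"),
    ("debug-object-mutation-v2", "State Mutation", "84.0"),
    ("debug-deep-clone-v2", "State Mutation", "84.8"),
    ("debug-lru-eviction-v2", "Regression After Refactor", "85.0"),
    ("debug-graph-cycle-v2", "Regression After Refactor", "85.3"),
    ("debug-promise-chain-v2", "Async Timing", "87.3"),
    ("debug-rate-limiter-v2", "Incorrect Output", "87.3"),
    ("debug-closure-loop-v2", "Test Failure", "87.6"),
    ("debug-recursion-base-case-v2", "Runtime Exception", "87.6"),
    ("debug-json-parser-v2", "Runtime Exception", "93.5"),
]


def round1(x: Decimal) -> Decimal:
    """Round to one decimal place, half up (an assumed rounding rule)."""
    return x.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def cluster_scores(tasks):
    """Unweighted mean score per cluster (assumed aggregation)."""
    groups = defaultdict(list)
    for _, cluster, score in tasks:
        groups[cluster].append(Decimal(score))
    return {c: round1(sum(s) / len(s)) for c, s in groups.items()}


def overall_score(tasks):
    """Unweighted mean over all tasks (assumed aggregation)."""
    scores = [Decimal(score) for _, _, score in tasks]
    return round1(sum(scores) / len(scores))


if __name__ == "__main__":
    print("overall:", overall_score(TASKS))
    for cluster, score in cluster_scores(TASKS).items():
        print(f"{cluster}: {score}")
```

Decimal arithmetic is used here because binary floats can turn a tie such as 90.55 into 90.549999…, which would round down and fail to match the displayed 90.6.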