BridgeBench
Hallucination: Model Analysis

Qwen 3.6 Plus

Model ID: openrouter/qwen/qwen3.6-plus

Overall score: 79.1

Accuracy: 74.7%
Fabrication rate: 27.0%

Tasks: 30
Passed: 30
Failed: 0
Average latency: 61,021 ms
Total cost: $0.1993

Cluster Performance

Cluster              Tasks  Score
Behavioral Claims        6   89.0
Edge Case ID             5   49.7
API Knowledge            5   65.4
Complexity Analysis      4   93.3
Bug Detection            5   89.3
Doc Accuracy             5   88.8

All Task Results

Task                              Cluster              Score
halluc-edge-pagination            Edge Case ID          20.1
halluc-edge-tree-traversal        Edge Case ID          33.4
halluc-edge-rate-limiter          Edge Case ID          35.2
halluc-api-map-set                API Knowledge         37.7
halluc-api-regex-named-groups     API Knowledge         56.0
halluc-api-promises               API Knowledge         60.0
halluc-dedup-sort-claims          Behavioral Claims     65.0
halluc-doc-middleware-chain       Doc Accuracy          70.7
halluc-api-node-crypto            API Knowledge         73.4
halluc-bug-closure-loop           Bug Detection         73.4
halluc-edge-date-parser           Edge Case ID          73.6
halluc-retry-logic-claims         Behavioral Claims     82.3
halluc-edge-string-truncate       Edge Case ID          86.0
halluc-bug-async-race             Bug Detection         86.6
halluc-bug-null-coalesce          Bug Detection         86.6
halluc-cache-eviction-claims      Behavioral Claims     86.6
halluc-complexity-graph-bfs       Complexity Analysis   86.6
halluc-complexity-sort-chain      Complexity Analysis   86.6
halluc-doc-event-emitter          Doc Accuracy          86.6
halluc-doc-query-builder          Doc Accuracy          86.6
halluc-bug-off-by-one             Bug Detection         99.9
halluc-nested-merge-claims        Behavioral Claims     99.9
halluc-api-zod-schema             API Knowledge        100.0
halluc-bug-type-coercion          Bug Detection        100.0
halluc-complexity-nested-loops    Complexity Analysis  100.0
halluc-complexity-recursive-memo  Complexity Analysis  100.0
halluc-doc-http-handler           Doc Accuracy         100.0
halluc-doc-validation-pipe        Doc Accuracy         100.0
halluc-parser-output-claims       Behavioral Claims    100.0
halluc-state-machine-claims       Behavioral Claims    100.0
30 tasks, sorted by score (lowest first). Fabricated = high-confidence false claims.
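The page does not say how the cluster and overall scores are aggregated from the per-task scores. The published figures are consistent with a plain, unweighted arithmetic mean, and the Python sketch below recomputes them under that assumption. The task data is copied from the table above. The names `TASK_SCORES`, `cluster_means`, and `overall_mean` are hypothetical and are not part of BridgeBench.

```python
from collections import defaultdict
from statistics import mean

# Per-task scores copied from the "All Task Results" table: (task, cluster, score).
TASK_SCORES = [
    ("halluc-edge-pagination", "Edge Case ID", 20.1),
    ("halluc-edge-tree-traversal", "Edge Case ID", 33.4),
    ("halluc-edge-rate-limiter", "Edge Case ID", 35.2),
    ("halluc-api-map-set", "API Knowledge", 37.7),
    ("halluc-api-regex-named-groups", "API Knowledge", 56.0),
    ("halluc-api-promises", "API Knowledge", 60.0),
    ("halluc-dedup-sort-claims", "Behavioral Claims", 65.0),
    ("halluc-doc-middleware-chain", "Doc Accuracy", 70.7),
    ("halluc-api-node-crypto", "API Knowledge", 73.4),
    ("halluc-bug-closure-loop", "Bug Detection", 73.4),
    ("halluc-edge-date-parser", "Edge Case ID", 73.6),
    ("halluc-retry-logic-claims", "Behavioral Claims", 82.3),
    ("halluc-edge-string-truncate", "Edge Case ID", 86.0),
    ("halluc-bug-async-race", "Bug Detection", 86.6),
    ("halluc-bug-null-coalesce", "Bug Detection", 86.6),
    ("halluc-cache-eviction-claims", "Behavioral Claims", 86.6),
    ("halluc-complexity-graph-bfs", "Complexity Analysis", 86.6),
    ("halluc-complexity-sort-chain", "Complexity Analysis", 86.6),
    ("halluc-doc-event-emitter", "Doc Accuracy", 86.6),
    ("halluc-doc-query-builder", "Doc Accuracy", 86.6),
    ("halluc-bug-off-by-one", "Bug Detection", 99.9),
    ("halluc-nested-merge-claims", "Behavioral Claims", 99.9),
    ("halluc-api-zod-schema", "API Knowledge", 100.0),
    ("halluc-bug-type-coercion", "Bug Detection", 100.0),
    ("halluc-complexity-nested-loops", "Complexity Analysis", 100.0),
    ("halluc-complexity-recursive-memo", "Complexity Analysis", 100.0),
    ("halluc-doc-http-handler", "Doc Accuracy", 100.0),
    ("halluc-doc-validation-pipe", "Doc Accuracy", 100.0),
    ("halluc-parser-output-claims", "Behavioral Claims", 100.0),
    ("halluc-state-machine-claims", "Behavioral Claims", 100.0),
]

# Figures as published on the page, for comparison.
PUBLISHED_CLUSTERS = {
    "Behavioral Claims": 89.0,
    "Edge Case ID": 49.7,
    "API Knowledge": 65.4,
    "Complexity Analysis": 93.3,
    "Bug Detection": 89.3,
    "Doc Accuracy": 88.8,
}
PUBLISHED_OVERALL = 79.1


def cluster_means(rows):
    """Unweighted mean of task scores per cluster (assumed aggregation rule)."""
    by_cluster = defaultdict(list)
    for _task, cluster, score in rows:
        by_cluster[cluster].append(score)
    return {cluster: mean(scores) for cluster, scores in by_cluster.items()}


def overall_mean(rows):
    """Unweighted mean over all tasks, each task counted once (also assumed)."""
    return mean(score for _task, _cluster, score in rows)


if __name__ == "__main__":
    for cluster, value in cluster_means(TASK_SCORES).items():
        print(f"{cluster:20} {value:5.1f}  (published {PUBLISHED_CLUSTERS[cluster]})")
    print(f"{'Overall':20} {overall_mean(TASK_SCORES):5.1f}  (published {PUBLISHED_OVERALL})")
```

Under this assumption every cluster figure and the 79.1 overall score reproduce to one decimal place. Averaging the six cluster scores instead would give about 79.2, so the overall score appears to weight each task equally rather than each cluster.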