BridgeBench
Hallucination
Model Analysis

Claude Opus 4.6 (April 14)

anthropic/claude-opus-4-6-apr14

Overall score: 76.9
Accuracy: 72.2%
Fabrication: 29.1%

Tasks: 30
Passed: 30
Failed: 0
Average latency: 11,310 ms
Total cost: $1.7349

Cluster Performance

Behavioral Claims (6 tasks): 85.3
Edge Case ID (5 tasks): 43.9
API Knowledge (5 tasks): 65.4
Complexity Analysis (4 tasks): 90.8
Bug Detection (5 tasks): 85.4
Doc Accuracy (5 tasks): 91.9

All Task Results

Task                              Cluster              Score
halluc-edge-pagination            Edge Case ID          20.2
halluc-edge-rate-limiter          Edge Case ID          20.3
halluc-edge-tree-traversal        Edge Case ID          33.5
halluc-api-map-set                API Knowledge         37.7
halluc-api-regex-named-groups     API Knowledge         58.2
halluc-api-promises               API Knowledge         58.9
halluc-cache-eviction-claims      Behavioral Claims     60.0
halluc-edge-date-parser           Edge Case ID          60.2
halluc-dedup-sort-claims          Behavioral Claims     69.6
halluc-bug-off-by-one             Bug Detection         71.8
halluc-api-node-crypto            API Knowledge         72.3
halluc-doc-middleware-chain       Doc Accuracy          72.9
halluc-bug-closure-loop           Bug Detection         73.5
halluc-complexity-sort-chain      Complexity Analysis   81.5
halluc-bug-async-race             Bug Detection         81.9
halluc-complexity-graph-bfs       Complexity Analysis   82.0
halluc-nested-merge-claims        Behavioral Claims     82.2
halluc-edge-string-truncate       Edge Case ID          85.5
halluc-doc-query-builder          Doc Accuracy          86.6
halluc-complexity-nested-loops    Complexity Analysis   99.6
halluc-bug-type-coercion          Bug Detection         99.7
halluc-api-zod-schema             API Knowledge         99.8
halluc-retry-logic-claims         Behavioral Claims     99.8
halluc-bug-null-coalesce          Bug Detection         99.9
halluc-complexity-recursive-memo  Complexity Analysis   99.9
halluc-parser-output-claims       Behavioral Claims     99.9
halluc-doc-event-emitter          Doc Accuracy         100.0
halluc-doc-http-handler           Doc Accuracy         100.0
halluc-doc-validation-pipe        Doc Accuracy         100.0
halluc-state-machine-claims       Behavioral Claims    100.0

30 tasks · Sorted by score (lowest first) · Fabricated = high-confidence false claims
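The page does not say how the cluster scores and the overall score are aggregated. The reported figures are consistent with unweighted means of the per-task scores, rounded to one decimal place. The Python sketch below checks that reading against the task table. Two points are assumptions and not BridgeBench's documented method: the mean-based aggregation and the round-half-up display rounding. The names `TASKS`, `mean` and `one_decimal` are illustrative only.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP

# Per-task scores copied from the results table: (task, cluster, score).
TASKS = [
    ("halluc-edge-pagination", "Edge Case ID", "20.2"),
    ("halluc-edge-rate-limiter", "Edge Case ID", "20.3"),
    ("halluc-edge-tree-traversal", "Edge Case ID", "33.5"),
    ("halluc-api-map-set", "API Knowledge", "37.7"),
    ("halluc-api-regex-named-groups", "API Knowledge", "58.2"),
    ("halluc-api-promises", "API Knowledge", "58.9"),
    ("halluc-cache-eviction-claims", "Behavioral Claims", "60.0"),
    ("halluc-edge-date-parser", "Edge Case ID", "60.2"),
    ("halluc-dedup-sort-claims", "Behavioral Claims", "69.6"),
    ("halluc-bug-off-by-one", "Bug Detection", "71.8"),
    ("halluc-api-node-crypto", "API Knowledge", "72.3"),
    ("halluc-doc-middleware-chain", "Doc Accuracy", "72.9"),
    ("halluc-bug-closure-loop", "Bug Detection", "73.5"),
    ("halluc-complexity-sort-chain", "Complexity Analysis", "81.5"),
    ("halluc-bug-async-race", "Bug Detection", "81.9"),
    ("halluc-complexity-graph-bfs", "Complexity Analysis", "82.0"),
    ("halluc-nested-merge-claims", "Behavioral Claims", "82.2"),
    ("halluc-edge-string-truncate", "Edge Case ID", "85.5"),
    ("halluc-doc-query-builder", "Doc Accuracy", "86.6"),
    ("halluc-complexity-nested-loops", "Complexity Analysis", "99.6"),
    ("halluc-bug-type-coercion", "Bug Detection", "99.7"),
    ("halluc-api-zod-schema", "API Knowledge", "99.8"),
    ("halluc-retry-logic-claims", "Behavioral Claims", "99.8"),
    ("halluc-bug-null-coalesce", "Bug Detection", "99.9"),
    ("halluc-complexity-recursive-memo", "Complexity Analysis", "99.9"),
    ("halluc-parser-output-claims", "Behavioral Claims", "99.9"),
    ("halluc-doc-event-emitter", "Doc Accuracy", "100.0"),
    ("halluc-doc-http-handler", "Doc Accuracy", "100.0"),
    ("halluc-doc-validation-pipe", "Doc Accuracy", "100.0"),
    ("halluc-state-machine-claims", "Behavioral Claims", "100.0"),
]


def mean(values):
    """Unweighted arithmetic mean, kept in Decimal to avoid float rounding surprises."""
    return sum(values, Decimal(0)) / len(values)


def one_decimal(x):
    """Round half up to one decimal place (assumed to match the dashboard display)."""
    return x.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Group the task scores by cluster, preserving table order.
by_cluster = defaultdict(list)
for _task, cluster, score in TASKS:
    by_cluster[cluster].append(Decimal(score))

# Assumed aggregation: each cluster score is the mean of its task scores.
cluster_means = {c: one_decimal(mean(s)) for c, s in by_cluster.items()}

# Overall score weighted per task (every task counts equally).
overall = one_decimal(mean([Decimal(s) for _, _, s in TASKS]))

# Alternative weighting for comparison: every cluster counts equally.
cluster_average = one_decimal(mean([mean(s) for s in by_cluster.values()]))

for cluster, value in cluster_means.items():
    print(f"{cluster} ({len(by_cluster[cluster])} tasks): {value}")
print(f"Overall (per-task mean): {overall}")
print(f"Mean of cluster means: {cluster_average}")
```

Under this reading, every per-task cluster mean matches the Cluster Performance figures, and the per-task mean reproduces the overall score of 76.9. Averaging the six cluster means instead gives 77.1, so the headline number appears to weight each task equally rather than each cluster.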