Benchmark Controls
Run the benchmark to compare task success, average step latency, tool-call success, and intervention count.
Workspace Snapshot
Profile Matrix
Winner Draft
No benchmark run yet.
`bench-agent-step-latency` compares deterministic browser agent profiles against one fixed local task deck, tool catalog, and scoring rule.
The benchmark reports task completion, tool-call success, step latency, and intervention count in a single result schema, ahead of integration with a real browser-controller runtime.
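As a rough illustration of how the four metrics above could be rolled up per profile, here is a minimal Python sketch. Every name in it (`StepRecord`, `TaskRun`, `summarize`, `pick_winner`, and the result keys) is hypothetical and not taken from `bench-agent-step-latency`. The tie-break order in `pick_winner` is likewise an assumption, since the source does not state the scoring rule.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class StepRecord:
    """One agent step. Field names are hypothetical."""
    latency_ms: float
    tool_call_ok: bool
    intervened: bool


@dataclass(frozen=True)
class TaskRun:
    """One task from the fixed deck, as executed by one profile."""
    task_id: str
    completed: bool
    steps: tuple[StepRecord, ...]


def summarize(profile: str, runs: list[TaskRun]) -> dict:
    """Aggregate one profile's runs into a single result row."""
    steps = [step for run in runs for step in run.steps]
    if not runs or not steps:
        raise ValueError("need at least one run with at least one step")
    return {
        "profile": profile,
        "task_success": sum(r.completed for r in runs) / len(runs),
        "avg_step_latency_ms": mean(s.latency_ms for s in steps),
        "tool_call_success": sum(s.tool_call_ok for s in steps) / len(steps),
        "intervention_count": sum(s.intervened for s in steps),
    }


def pick_winner(summaries: list[dict]) -> dict:
    """Assumed scoring rule: highest task success first, then fewest
    interventions, then lowest average step latency."""
    return min(
        summaries,
        key=lambda s: (
            -s["task_success"],
            s["intervention_count"],
            s["avg_step_latency_ms"],
        ),
    )
```

Because every profile runs the same task deck and the aggregation is a pure function of the recorded steps, two runs over identical fixtures should produce identical rows, which is what makes a profile-by-profile comparison meaningful.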
{
"status": "pending"
}