AI WebGPU Lab Benchmark

Agent Step Latency Benchmark

`bench-agent-step-latency` compares deterministic browser agent profiles against one fixed local task deck, tool catalog, and scoring rule.

The benchmark records task completion, tool-call success, step latency, and intervention count in a single schema-aligned result surface, keeping those fields stable before the real browser-controller runtime wiring lands.
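As a rough sketch of what "schema-aligned" could look like, the four metrics named above can be carried in one per-profile record. All field names here are illustrative assumptions, not the benchmark's actual schema; only the metric set and the `"pending"` status come from this document.

```typescript
// Hypothetical per-profile result record. Field names are assumptions;
// the metric set (task completion, tool-call success, step latency,
// intervention count) comes from the benchmark description above.
interface ProfileResult {
  profile: string;             // deterministic agent profile id
  taskSuccessRate: number;     // completed tasks / total tasks in the deck
  toolCallSuccessRate: number; // successful tool calls / attempted calls
  avgStepLatencyMs: number;    // mean wall-clock latency per agent step
  interventionCount: number;   // manual interventions during the run
  status: "pending" | "complete";
}

// Before any run, every profile's draft sits in the pending state,
// mirroring the "status": "pending" result draft shown later on the page.
const draft: ProfileResult = {
  profile: "baseline",
  taskSuccessRate: 0,
  toolCallSuccessRate: 0,
  avgStepLatencyMs: 0,
  interventionCount: 0,
  status: "pending",
};
```

Keeping one flat record per profile makes the later profile matrix a simple array of these objects.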

Benchmark Controls

Run the benchmark to compare task success, average step latency, tool-call success, and intervention count.
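Of the four compared metrics, average step latency is the only derived one; a minimal aggregation over per-step timings could look like the following (the function name and input shape are assumptions for illustration):

```typescript
// Illustrative helper: collapse per-step wall-clock timings into the
// single "average step latency" metric the benchmark reports.
// Name and signature are assumptions, not the benchmark's real API.
function avgStepLatencyMs(stepLatenciesMs: number[]): number {
  if (stepLatenciesMs.length === 0) return 0; // no steps recorded yet
  const total = stepLatenciesMs.reduce((sum, ms) => sum + ms, 0);
  return total / stepLatenciesMs.length;
}
```

For example, steps taking 100 ms, 200 ms, and 300 ms average to 200 ms.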

Workspace Snapshot

Profile Matrix

Winner Draft

No benchmark run yet.

Task Deck

Metrics

Environment

Fixture Metadata


Activity Log

    Schema-Aligned Result Draft

    {
      "status": "pending"
    }

    Benchmark Notes

    • All profiles share one task deck, local-only workspace, and tool catalog.
    • The winner is chosen by higher task success and tool-call success first, with lower step latency only as a tiebreaker.
    • Real browser controller and planner runtimes should keep the same metric fields and artifact names.
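The winner rule in the notes above amounts to a lexicographic sort: task success first, then tool-call success, and only then lower step latency. A minimal sketch, with illustrative type and field names that are assumptions rather than the benchmark's real schema:

```typescript
// Hypothetical scored-profile shape; field names are assumptions.
type Scored = {
  profile: string;
  taskSuccessRate: number;
  toolCallSuccessRate: number;
  avgStepLatencyMs: number;
};

// Rank profiles: higher task success wins, then higher tool-call
// success, and lower average step latency breaks remaining ties.
function pickWinner(results: Scored[]): Scored {
  return [...results].sort(
    (a, b) =>
      b.taskSuccessRate - a.taskSuccessRate ||
      b.toolCallSuccessRate - a.toolCallSuccessRate ||
      a.avgStepLatencyMs - b.avgStepLatencyMs,
  )[0];
}

const winner = pickWinner([
  { profile: "fast", taskSuccessRate: 0.8, toolCallSuccessRate: 0.9, avgStepLatencyMs: 120 },
  { profile: "careful", taskSuccessRate: 0.9, toolCallSuccessRate: 0.9, avgStepLatencyMs: 300 },
]);
// "careful" wins: its higher task success outweighs its higher latency.
```

Using JavaScript's short-circuiting `||` over comparator differences keeps the success-before-latency priority explicit in one expression.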