AI WebGPU Lab Benchmark

Agent Step Latency Benchmark

`bench-agent-step-latency` compares deterministic browser agent profiles against one fixed local task deck, tool catalog, and scoring rule.

The benchmark records task completion, tool-call success, step latency, and intervention count in a single schema-aligned result surface, keeping those fields stable before the real browser-controller runtime wiring lands.
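As a rough sketch of what "schema-aligned" could look like, the four metrics named above can be carried in one per-profile record. All field names here are illustrative assumptions, not the benchmark's actual schema; only the metric set and the `"pending"` status come from this document.

```typescript
// Hypothetical per-profile result record. Field names are assumptions;
// the metric set (task completion, tool-call success, step latency,
// intervention count) comes from the benchmark description above.
interface ProfileResult {
  profile: string;             // deterministic agent profile id
  taskSuccessRate: number;     // completed tasks / total tasks in the deck
  toolCallSuccessRate: number; // successful tool calls / attempted calls
  avgStepLatencyMs: number;    // mean wall-clock latency per agent step
  interventionCount: number;   // manual interventions during the run
  status: "pending" | "complete";
}

// Before any run, every profile's draft sits in the pending state,
// mirroring the "status": "pending" result draft shown later on the page.
const draft: ProfileResult = {
  profile: "baseline",
  taskSuccessRate: 0,
  toolCallSuccessRate: 0,
  avgStepLatencyMs: 0,
  interventionCount: 0,
  status: "pending",
};
```

Keeping one flat record per profile makes the later profile matrix a simple array of these objects.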

Benchmark Controls

Run the benchmark to compare task success, average step latency, tool-call success, and intervention count.
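Of the four compared metrics, average step latency is the only derived one; a minimal aggregation over per-step timings could look like the following (the function name and input shape are assumptions for illustration):

```typescript
// Illustrative helper: collapse per-step wall-clock timings into the
// single "average step latency" metric the benchmark reports.
// Name and signature are assumptions, not the benchmark's real API.
function avgStepLatencyMs(stepLatenciesMs: number[]): number {
  if (stepLatenciesMs.length === 0) return 0; // no steps recorded yet
  const total = stepLatenciesMs.reduce((sum, ms) => sum + ms, 0);
  return total / stepLatenciesMs.length;
}
```

For example, steps taking 100 ms, 200 ms, and 300 ms average to 200 ms.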

Workspace Snapshot

Profile Matrix

Winner Draft

No benchmark run yet.

Task Deck

Metrics

Environment

Fixture Metadata


Activity Log

    Schema-Aligned Result Draft

    {
      "status": "pending"
    }

    Benchmark Notes

    • All profiles share one task deck, local-only workspace, and tool catalog.
    • The winner is chosen by higher task success and tool-call success first, with lower step latency only as a tiebreaker.
    • Real browser controller and planner runtimes should keep the same metric fields and artifact names.
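The winner rule in the notes above amounts to a lexicographic sort: task success first, then tool-call success, and only then lower step latency. A minimal sketch, with illustrative type and field names that are assumptions rather than the benchmark's real schema:

```typescript
// Hypothetical scored-profile shape; field names are assumptions.
type Scored = {
  profile: string;
  taskSuccessRate: number;
  toolCallSuccessRate: number;
  avgStepLatencyMs: number;
};

// Rank profiles: higher task success wins, then higher tool-call
// success, and lower average step latency breaks remaining ties.
function pickWinner(results: Scored[]): Scored {
  return [...results].sort(
    (a, b) =>
      b.taskSuccessRate - a.taskSuccessRate ||
      b.toolCallSuccessRate - a.toolCallSuccessRate ||
      a.avgStepLatencyMs - b.avgStepLatencyMs,
  )[0];
}

const winner = pickWinner([
  { profile: "fast", taskSuccessRate: 0.8, toolCallSuccessRate: 0.9, avgStepLatencyMs: 120 },
  { profile: "careful", taskSuccessRate: 0.9, toolCallSuccessRate: 0.9, avgStepLatencyMs: 300 },
]);
// "careful" wins: its higher task success outweighs its higher latency.
```

Using JavaScript's short-circuiting `||` over comparator differences keeps the success-before-latency priority explicit in one expression.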