`exp-vlm-browser-multimodal` records a deterministic browser vision-language baseline before real VLM runtimes, image preprocessors, and multimodal token pipelines land.
The harness captures image fixture metadata, the prompt set, image preprocess time, image-to-first-token latency, total answer latency, and task accuracy in one schema-aligned result.
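To make the shape of such a result concrete, here is a minimal TypeScript sketch of how per-prompt samples might roll up into one record. Every name in it (`FixtureMeta`, `PromptSample`, `VlmBaselineResult`, `summarize`) is hypothetical and chosen for illustration; the harness's actual schema may use different fields and aggregations.

```typescript
// Hypothetical types: field names are illustrative, not the harness's real schema.
interface FixtureMeta {
  id: string;
  width: number;
  height: number;
}

interface PromptSample {
  promptId: string;
  imagePreprocessMs: number;
  imageToFirstTokenMs: number;
  answerTotalMs: number;
  correct: boolean;
}

interface VlmBaselineResult {
  fixture: FixtureMeta;
  promptCount: number;
  avgImagePreprocessMs: number;
  avgImageToFirstTokenMs: number;
  avgAnswerTotalMs: number;
  taskAccuracy: number; // fraction of prompts answered correctly, in [0, 1]
}

// Arithmetic mean; returns 0 for an empty list so the result stays finite.
function mean(values: number[]): number {
  if (values.length === 0) return 0;
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Collapse per-prompt samples into a single result record.
function summarize(fixture: FixtureMeta, samples: PromptSample[]): VlmBaselineResult {
  const correctCount = samples.filter((s) => s.correct).length;
  return {
    fixture,
    promptCount: samples.length,
    avgImagePreprocessMs: mean(samples.map((s) => s.imagePreprocessMs)),
    avgImageToFirstTokenMs: mean(samples.map((s) => s.imageToFirstTokenMs)),
    avgAnswerTotalMs: mean(samples.map((s) => s.answerTotalMs)),
    taskAccuracy: samples.length === 0 ? 0 : correctCount / samples.length,
  };
}
```

Averaging is only one possible aggregation; a real harness might report medians or percentiles instead.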
Run Controls
Probe capability first, then run the deterministic multimodal prompt set to export vision latency and answer accuracy metadata.
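The probe-then-run order can be sketched as follows. This is an illustration under stated assumptions, not the harness's code: `probeCapability`, `runPromptSet`, and `probeThenRun` are hypothetical names, the probe only checks whether a navigator-like object exposes a `gpu` property, and the latencies are synthetic values drawn from a seeded generator so that repeated runs produce identical output.

```typescript
// Hypothetical sketch: names and values are illustrative assumptions.
interface NavigatorLike {
  gpu?: unknown; // in a browser, a navigator object could be passed here
}

interface Capability {
  webgpu: boolean;
  backend: "webgpu" | "fallback";
}

// Step 1: probe capability without running any model.
function probeCapability(nav: NavigatorLike): Capability {
  const webgpu = nav.gpu !== undefined && nav.gpu !== null;
  return { webgpu, backend: webgpu ? "webgpu" : "fallback" };
}

// Seeded linear congruential generator: the same seed yields the same sequence.
function makeLcg(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // uniform in [0, 1)
  };
}

interface PromptCase {
  id: string;
  expected: string;
}

interface RunRow {
  promptId: string;
  firstTokenMs: number;
  totalMs: number;
  correct: boolean;
}

// Step 2: run the prompt set with synthetic, reproducible timings.
function runPromptSet(prompts: PromptCase[], seed: number): RunRow[] {
  const next = makeLcg(seed);
  return prompts.map((p) => {
    const firstTokenMs = 50 + Math.floor(next() * 100);
    const totalMs = firstTokenMs + 200 + Math.floor(next() * 300);
    // Stand-in answer: echoes the expected text, so every row is marked correct.
    const answer = p.expected;
    return { promptId: p.id, firstTokenMs, totalMs, correct: answer === p.expected };
  });
}

// Probe first, then run, and return both for export.
function probeThenRun(nav: NavigatorLike, prompts: PromptCase[], seed: number) {
  const capability = probeCapability(nav);
  const rows = runPromptSet(prompts, seed);
  return { capability, rows };
}
```

Taking the navigator-like object as a parameter keeps the sketch runnable outside a browser; seeding the generator is what makes two runs with the same inputs comparable.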