See
The screen arrives as text and layout, not raw pixels. Every region is perceptually hashed each turn — only the new and changed ones travel, deduplicated and compressed; the rest stays cached.
A free, local MCP server that gives your AI eyes and hands on any browser tab and any Windows app — driving the web by the DOM, not pixel-guessing, and sending only what changed. 100% on your PC, any model.
Everything Verdesk does is the same four-step loop. Each step costs the model less than a screenshot tool would.
The screen arrives as text and layout, not raw pixels. Every region is perceptually hashed each turn — only the new and changed ones travel, deduplicated and compressed; the rest stays cached.
You say what; the engine finds the pixel. It steps down from a semantic control to the exact point and operates any Windows app — Win32, WPF, UWP, Electron, browsers, even games and canvas UIs.
You never do coordinate math to find a button.
Teach a button by its visual fingerprint and Verdesk re-finds it later — even after the window moved. A live index links every object to the actions that use it.
Deterministic, on CPU — works where the accessibility tree is blind (games, custom UIs).
Record a task once, with the real timing between steps. From then on a cheaper model replays it on its own — and it keeps working even if the window moves or resizes.
Trains are functions with slots — pass arguments, edit one step, no re-recording.
The browser control of Playwright and Browser Use, the whole-desktop reach of Computer Use, and a stack of things none of them have — on a local MCP server, with any model, free.
| Capability | PuppeteerGoogle · Chrome | PlaywrightMicrosoft · cross-browser | Browser Useopen source · web | agent-browserVercel · web | Computer UseAnthropic · full screen | Verdeskindie · browser + desktop |
|---|---|---|---|---|---|---|
| Drives a real browser | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Navigates by DOM, not pixel-guessing | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ |
| Cross-browser — Firefox & WebKit | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Any desktop app — Win32, Electron, games | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ |
| Sends deltas, not a screenshot each turn | — | — | ✗ | ✗ | ✗ | ✓ |
| Reads text for free — native OCR | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Record once, replay with a cheaper model | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Moves secrets the model never sees | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
| A browser profile per project | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Runs 100% local — no vendor cloud | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ |
| Works with any AI model | — | — | ✓ | ✓ | ✗ | ✓ |
| Free | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ |
✓ yes · ✗ no · — not applicable. Each tool is great at its job; Verdesk is the only one that does all of it.
~60 tools, organized as a tree so even a small model jumps straight to the one it needs — no scanning a flat list.
lookcapture
read_text
click_atclick_textdrag_path
type_textpress_key
focus_windowset_window_size
navigatebrowser_tabsbrowser_snapshot
playbook_recordplaybook_replay
learn_buttontrack_object
execute
Real figures from a reproducible benchmark — concrete numbers, not adjectives.
Vision tokens for a 1080p screenshot, measured on Claude — paid every turn. Verdesk sends only the cells that changed; other vision models scale the same way.
The complete Verdesk vision layer, free for personal and non-commercial use.
Yearly subscription, one developer. Everything in Free, plus what you need to ship.
Yearly subscription, up to 5 developers. Shared license keys and centralized billing.
Grab the setup from GitHub Releases — 210 MB, single .exe, WebView2 bundled offline.
Run Verdesk_x.y.z_x64-setup.exe. No admin needed, no telemetry. The tray icon lands quietly in the system tray.
Tray icon → Settings → Connection. One button copies the MCP server config to your clipboard — paste it into your AI client.
Close and reopen your terminal (Claude Code, etc.) so it picks up the new MCP server. Done.
Your AI agent on one machine, Verdesk on another — everything travels end-to-end encrypted. You generate the keys; no SaaS broker in the middle. Under the hood: an SSH port-forward, with Tailscale as an optional fallback for CGNAT.
A network engineer building Verdesk solo, in the open — no team, no VC, no telemetry. Questions, ideas or bug reports? Write me directly; I read every one.