Vision + control for AI agents

Give your AI the whole machine — browser and desktop.

A free, local MCP server that gives your AI eyes and hands on any browser tab and any Windows app — driving the web by the DOM, not pixel-guessing, and sending only what changed. 100% on your PC, any model.

Scanned clean by VirusTotal Windows 10/11 macOS & Linux soon Tested on Claude Code

How it works

One loop: see · act · remember · repeat.

Everything Verdesk does is the same four-step loop. Each step costs the model less than a screenshot tool would.

See

The screen arrives as text and layout, not raw pixels. Every region is perceptually hashed each turn — only the new and changed ones travel, deduplicated and compressed; the rest stays cached.

Act

You say what; the engine finds the pixel. It steps down from a semantic control to the exact point and operates any Windows app — Win32, WPF, UWP, Electron, browsers, even games and canvas UIs.

You never do coordinate math to find a button.

Remember

Teach a button by its visual fingerprint and Verdesk re-finds it later — even after the window moved. A live index links every object to the actions that use it.

Deterministic, on CPU — works where the accessibility tree is blind (games, custom UIs).

Repeat

Record a task once, with the real timing between steps. From then on a cheaper model replays it on its own — and it keeps working even if the window moves or resizes.

Trains are functions with slots — pass arguments, edit one step, no re-recording.

How it compares

Everything they do — plus what only Verdesk brings.

The browser control of Playwright and Browser Use, the whole-desktop reach of Computer Use, and a stack of things none of them have — on a local MCP server, with any model, free.

Capability	PuppeteerGoogle · Chrome	PlaywrightMicrosoft · cross-browser	Browser Useopen source · web	agent-browserVercel · web	Computer UseAnthropic · full screen	Verdeskindie · browser + desktop
Drives a real browser	✓	✓	✓	✓	✓	✓
Navigates by DOM, not pixel-guessing	✓	✓	✓	✓	✗	✓
Cross-browser — Firefox & WebKit	✗	✓	✓	✗	✗	✗
Any desktop app — Win32, Electron, games	✗	✗	✗	✗	✓	✓
Sends deltas, not a screenshot each turn	—	—	✗	✗	✗	✓
Reads text for free — native OCR	✗	✗	✗	✗	✗	✓
Record once, replay with a cheaper model	✗	✗	✗	✗	✗	✓
Moves secrets the model never sees	✗	✗	✗	✗	✗	✓
A browser profile per project	✗	✗	✗	✗	✗	✓
Runs 100% local — no vendor cloud	✓	✓	✓	✓	✗	✓
Works with any AI model	—	—	✓	✓	✗	✓
Free	✓	✓	✓	✓	✗	✓

✓ yes · ✗ no · — not applicable. Each tool is great at its job; Verdesk is the only one that does all of it.

The toolbox, as a tree

Everything your agent can do.

~60 tools, organized as a tree so even a small model jumps straight to the one it needs — no scanning a flat list.

See — the screen, only what changed lookcapture
Read — plain text from any region read_text
Click — left, right, scroll, drag click_atclick_textdrag_path
Type — text & key combos type_textpress_key
Windows — pick, focus, resize focus_windowset_window_size
Browser — navigate, switch tabs, read the page navigatebrowser_tabsbrowser_snapshot
Record & replay — a task, once, then free playbook_recordplaybook_replay
Learn & track — objects, even if they moved learn_buttontrack_object
Open apps — apps & CLIs, via Win+R execute

Measured, not promised

The savings, on instruments.

Real figures from a reproducible benchmark — concrete numbers, not adjectives.

vision tokens saved

vision tokens per turn

baseline full screenshot, every turn

~1,500

Verdesk visual deltas — only what moved

~110

Vision tokens for a 1080p screenshot, measured on Claude — paid every turn. Verdesk sends only the cells that changed; other vision models scale the same way.

Plans

Free to run. Pro to ship.

Free $0

The complete Verdesk vision layer, free for personal and non-commercial use.

Full modulated vision pipeline
Local-mode MCP server
All base tools — capture, buffer, history, UIA
Modulation profiles
Offline installer · no account · no telemetry

Download free

For shipping

Pro $49/year

Yearly subscription, one developer. Everything in Free, plus what you need to ship.

+ Remote access over LAN / WAN
+ Action trains — record a task once, replay it cheaply (even with a smaller model)
+ Mapped, editable screen objects — an interconnected network of actions you tweak without re-recording
+ Commercial-use license — 1 developer
+ Up to 3 device activations
+ Updates included while subscribed

Buy Verdesk Pro

Coming soon

Team $199/year

Yearly subscription, up to 5 developers. Shared license keys and centralized billing.

+ Everything in Pro
+ Up to 5 developer seats
+ Centralized billing & activations
+ Priority email support

Ultrafast install

Running in 60 seconds. 4 steps.

1
Download

Grab the setup from GitHub Releases — 210 MB, single .exe, WebView2 bundled offline.
2
Install

Run Verdesk_x.y.z_x64-setup.exe. No admin needed, no telemetry. The tray icon lands quietly in the system tray.
3
Copy the prompt

Tray icon → Settings → Connection. One button copies the MCP server config to your clipboard — paste it into your AI client.
4
Restart the terminal

Close and reopen your terminal (Claude Code, etc.) so it picks up the new MCP server. Done.

Remote

Remote vision and control. Through an encrypted tunnel.

Your AI agent on one machine, Verdesk on another — everything travels end-to-end encrypted. You generate the keys; no SaaS broker in the middle. Under the hood: an SSH port-forward, with Tailscale as an optional fallback for CGNAT.

✓ One-button setup — Verdesk audits SSH, firewall and config in 15 checks
✓ Your keys stay yours · no third-party broker, no telemetry
✓ Tailscale fallback for CGNAT — Verdesk picks the route automatically

Your AI. Your machine. Free.

Download free

Windows 10 / 11 · 64-bit · single executable · macOS & Linux soon

Built by one developer

Camilo Brossard

A network engineer building Verdesk solo, in the open — no team, no VC, no telemetry. Questions, ideas or bug reports? Write me directly; I read every one.

chamilonster@gmail.com @electrogaticultor GitHub clevercoder.app r/Verdesk