Vision + control for AI agents

Give your AI the whole machine — browser and desktop.

A free, local MCP server that gives your AI eyes and hands on any browser tab and any Windows app — driving the web by the DOM, not pixel-guessing, and sending only what changed. 100% on your PC, any model.

Scanned clean by VirusTotal Windows 10/11 macOS & Linux soon Tested on Claude Code
How it works

One loop: see · act · remember · repeat.

Everything Verdesk does is the same four-step loop. Each step costs the model less than a screenshot tool would.

See

The screen arrives as text and layout, not raw pixels. Every region is perceptually hashed each turn — only the new and changed ones travel, deduplicated and compressed; the rest stays cached.

Act

You say what; the engine finds the pixel. It steps down from a semantic control to the exact point and operates any Windows app — Win32, WPF, UWP, Electron, browsers, even games and canvas UIs.

You never do coordinate math to find a button.

Remember

Teach a button by its visual fingerprint and Verdesk re-finds it later — even after the window moved. A live index links every object to the actions that use it.

Deterministic, on CPU — works where the accessibility tree is blind (games, custom UIs).

Repeat

Record a task once, with the real timing between steps. From then on a cheaper model replays it on its own — and it keeps working even if the window moves or resizes.

Trains are functions with slots — pass arguments, edit one step, no re-recording.

How it compares

Everything they do — plus what only Verdesk brings.

The browser control of Playwright and Browser Use, the whole-desktop reach of Computer Use, and a stack of things none of them have — on a local MCP server, with any model, free.

Capability PuppeteerGoogle · Chrome PlaywrightMicrosoft · cross-browser Browser Useopen source · web agent-browserVercel · web Computer UseAnthropic · full screen Verdeskindie · browser + desktop
Drives a real browser
Navigates by DOM, not pixel-guessing
Cross-browser — Firefox & WebKit
Any desktop app — Win32, Electron, games
Sends deltas, not a screenshot each turn
Reads text for free — native OCR
Record once, replay with a cheaper model
Moves secrets the model never sees
A browser profile per project
Runs 100% local — no vendor cloud
Works with any AI model
Free

yes ·  no ·  not applicable. Each tool is great at its job; Verdesk is the only one that does all of it.

The toolbox, as a tree

Everything your agent can do.

~60 tools, organized as a tree so even a small model jumps straight to the one it needs — no scanning a flat list.

  • See — the screen, only what changed lookcapture
  • Read — plain text from any region read_text
  • Click — left, right, scroll, drag click_atclick_textdrag_path
  • Type — text & key combos type_textpress_key
  • Windows — pick, focus, resize focus_windowset_window_size
  • Browser — navigate, switch tabs, read the page navigatebrowser_tabsbrowser_snapshot
  • Record & replay — a task, once, then free playbook_recordplaybook_replay
  • Learn & track — objects, even if they moved learn_buttontrack_object
  • Open apps — apps & CLIs, via Win+R execute
Measured, not promised

The savings, on instruments.

Real figures from a reproducible benchmark — concrete numbers, not adjectives.

vision tokens saved
vision tokens per turn
baseline full screenshot, every turn
~1,500
Verdesk visual deltas — only what moved
~110

Vision tokens for a 1080p screenshot, measured on Claude — paid every turn. Verdesk sends only the cells that changed; other vision models scale the same way.

Plans

Free to run. Pro to ship.

Free $0

The complete Verdesk vision layer, free for personal and non-commercial use.

  • Full modulated vision pipeline
  • Local-mode MCP server
  • All base tools — capture, buffer, history, UIA
  • Modulation profiles
  • Offline installer · no account · no telemetry
Download free
For shipping
Pro $49/year

Yearly subscription, one developer. Everything in Free, plus what you need to ship.

  • + Remote access over LAN / WAN
  • + Action trains — record a task once, replay it cheaply (even with a smaller model)
  • + Mapped, editable screen objects — an interconnected network of actions you tweak without re-recording
  • + Commercial-use license — 1 developer
  • + Up to 3 device activations
  • + Updates included while subscribed
Buy Verdesk Pro
Coming soon
Team $199/year

Yearly subscription, up to 5 developers. Shared license keys and centralized billing.

  • + Everything in Pro
  • + Up to 5 developer seats
  • + Centralized billing & activations
  • + Priority email support
Ultrafast install

Running in 60 seconds. 4 steps.

  1. 1

    Download

    Grab the setup from GitHub Releases — 210 MB, single .exe, WebView2 bundled offline.

  2. 2

    Install

    Run Verdesk_x.y.z_x64-setup.exe. No admin needed, no telemetry. The tray icon lands quietly in the system tray.

  3. 3

    Copy the prompt

    Tray icon → SettingsConnection. One button copies the MCP server config to your clipboard — paste it into your AI client.

  4. 4

    Restart the terminal

    Close and reopen your terminal (Claude Code, etc.) so it picks up the new MCP server. Done.

Remote

Remote vision and control. Through an encrypted tunnel.

Your AI agent on one machine, Verdesk on another — everything travels end-to-end encrypted. You generate the keys; no SaaS broker in the middle. Under the hood: an SSH port-forward, with Tailscale as an optional fallback for CGNAT.

Your AI. Your machine. Free.

Download free

Windows 10 / 11 · 64-bit · single executable · macOS & Linux soon

Camilo Brossard
Built by one developer

Camilo Brossard

A network engineer building Verdesk solo, in the open — no team, no VC, no telemetry. Questions, ideas or bug reports? Write me directly; I read every one.