
The MCP agent dry run plan nobody else talks about: one tool, zero events, one grep.

Every “dry run plan” tutorial online is about Terraform, Kubernetes, or a LangGraph planner/executor block diagram. None of them tell you how to dry-run a real GUI agent without emitting a single hardware event. macos-use has exactly one tool that does this.


Matthew Diakonov, Written with AI

Published April 20, 2026 · 7 min

Built for MCP agents that run against real user desktops

6 tools total, 1 non-mutating

Zero CGEvents on dry run

Grep-able /tmp/macos-use/ plan files

Written in Swift, native AX APIs

Dry run, then commit.

The macOS MCP pattern

refresh_traversal reads the AX tree.

Zero CGEvents. Zero InputGuard.

The flat text file is your plan.

Grep the line you want.

One mutating call commits.


Why the other dry run plans do not fit a desktop agent

Terraform dry-run works because infrastructure is declarative. You read state, diff against the target, print the plan, then apply. The pending side effects are purely symbolic until you run terraform apply.

A GUI agent has none of that. Every control lives inside an app whose state you cannot symbolically evaluate. A planner that says “I will click the Send button” does not know where the button is, whether the window scrolled, whether a modal popped up, or whether the user just Cmd-Tabbed to another app. A plan that is not grounded in the live tree is a hallucination.

The macos-use answer is not a planning mode on the mutating tools. It is a separate read-only tool whose entire job is to return the live tree, fast, with zero hardware events emitted. That tool is refresh_traversal.

THE DRY-RUN LOOP

The anchor fact

One line of Swift marks this tool as the dry run

Open Sources/MCPServer/main.swift. Line 1667 declares `let isDisruptive = params.name != refreshTool.name`.

refresh_traversal is the only tool name that makes that comparison false. Five of the server’s six tools are mutating. One is not. The read-only one takes the non-disruptive branch for every single checkpoint in the handler: no saved cursor, no saved frontmost app, no CGEventTap install, no fullscreen pill overlay, no 30-second watchdog, no 200ms post-action grace period, no cursor-restore, no app-focus-restore.
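To make the split concrete, here is a minimal Python model of that comparison. It is not the server's code, only a mirror of the logic described above: the tool names come from the FAQ in this article, the step labels are our own shorthand for the checkpoints listed in the previous paragraph, and the real tool names may carry a prefix that this sketch ignores.

```python
# Tool names as listed in this article's FAQ (any name prefix is omitted here).
TOOLS = [
    "open_application_and_traverse",
    "click_and_traverse",
    "type_and_traverse",
    "press_key_and_traverse",
    "scroll_and_traverse",
    "refresh_traversal",
]
REFRESH_TOOL = "refresh_traversal"

# Checkpoints the article describes as gated on `if isDisruptive`.
# The labels are illustrative shorthand, not identifiers from main.swift.
GATED_STEPS = (
    "save cursor position",
    "save frontmost app",
    "install CGEventTap",
    "show fullscreen overlay",
    "start 30-second watchdog",
    "wait 200 ms post-action grace period",
    "restore cursor",
    "restore app focus",
)


def is_disruptive(tool_name: str) -> bool:
    """Python mirror of `params.name != refreshTool.name`."""
    return tool_name != REFRESH_TOOL


def gated_steps_for(tool_name: str) -> tuple:
    """Steps a call would pass through; empty for the read-only tool."""
    return GATED_STEPS if is_disruptive(tool_name) else ()
```

Looping over TOOLS and calling gated_steps_for shows the shape of the design: five names get the full list, one name gets nothing.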

6 tools in the MCP server

1 of them is non-mutating

0 CGEvents fired by refresh_traversal

1 line in main.swift that defines the split

“Zero CGEvent.post() calls on the refresh_traversal code path. Verified by reading every branch under `if isDisruptive` in the handler.”

Sources/MCPServer/main.swift:1667-1781

The minimum dry-run-then-commit loop

Three steps. No planner object, no sequencer, no intermediate representation. The grep result IS the plan.

dry-run-plan.sh

Actual server logs during a dry run + commit

WHO CALLS WHAT DURING A DRY RUN

What the dry run actually gives you

Six properties, each a direct consequence of isDisruptive = false.

Zero CGEvents

refresh_traversal runs the traverseOnly action path. No CGEvent.post() call. No keystroke, no click, no scroll. Pure AX read. If you have already granted accessibility permission, this tool can run a thousand times in a loop and the system event log stays empty.

Zero InputGuard

isDisruptive = false means InputGuard.shared.engage() is never called. No CGEventTap install, no fullscreen overlay, no 30-second watchdog timer. The human's keyboard and mouse stay fully responsive while the tree is read.

Zero state save

The cursor save at main.swift:1672 and the frontmost-app save at main.swift:1671 are both inside `if isDisruptive`. A dry run touches neither. Restore logic at main.swift:1767-1781 is also skipped: there is nothing to restore.

One grep-able file

The traversal dumps to /tmp/macos-use/<ts>_refresh_traversal.txt, one element per line, format: [Role] "text" x:N y:N w:W h:H visible. Use `grep -n` in a sub-tool or file-reader, pull the coordinate line you want, pass those exact numbers to click_and_traverse.
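If the agent wants structured fields instead of raw grep output, one line can be parsed with a single regular expression. This is a sketch that assumes every line follows the format quoted above; the Element type and parse_line function are hypothetical names, and lines that do not fit the assumed shape simply return None.

```python
import re
from typing import NamedTuple, Optional

# Assumed line shape, taken from the format string in this article:
#   [Role] "text" x:N y:N w:W h:H visible
LINE_RE = re.compile(
    r'^\[(?P<role>[^\]]+)\]\s+"(?P<text>.*)"\s+'
    r'x:(?P<x>-?\d+(?:\.\d+)?)\s+y:(?P<y>-?\d+(?:\.\d+)?)\s+'
    r'w:(?P<w>\d+(?:\.\d+)?)\s+h:(?P<h>\d+(?:\.\d+)?)'
    r'(?P<visible>\s+visible)?\s*$'
)


class Element(NamedTuple):
    role: str
    text: str
    x: float
    y: float
    w: float
    h: float
    visible: bool


def parse_line(line: str) -> Optional[Element]:
    """Parse one traversal line; return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Element(
        role=m["role"],
        text=m["text"],
        x=float(m["x"]),
        y=float(m["y"]),
        w=float(m["w"]),
        h=float(m["h"]),
        visible=m["visible"] is not None,
    )
```

Parsing is optional. The point of the flat format is that grep alone is enough, and a parser like this only helps when the agent needs to compare or sort elements.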

Same screenshot contract

A PNG screenshot is captured with the same timestamp: /tmp/macos-use/<ts>_refresh_traversal.png. Use it to visually verify the state you just planned against. Agents that read screenshots during planning should read the PNG pair, not re-traverse.

Same 6-tool schema

refresh_traversal takes one required param (pid). It is the smallest MCP surface of any tool in the server. There is no option to enable or disable anything. That is deliberate: a dry-run tool with knobs stops being a dry-run tool.

Dry run vs. mutating call, side by side

What a browser-agent dry run typically looks like vs. what macos-use returns from refresh_traversal.

Feature	Typical browser-agent dry run	macos-use refresh_traversal
Fires hardware events during dry run	Yes: planner runs on live page, triggers hover/focus/JS	No: traverseOnly path, zero CGEvent.post() calls
Returns coordinates for commit	No: DOM selectors or XPath, remapped at execution	Yes: x, y, w, h in CGEvent-native point space (1pt == 1px)
Plan artifact format	Usually nested JSON or in-memory	Flat text, one element per line, greppable
Plan artifact persisted to disk	No: held in agent memory	Yes: /tmp/macos-use/<ts>_refresh_traversal.txt plus .png
Safe to run in a loop while user is typing	No: browser agents steal focus, emit clicks	Yes: no InputGuard, no focus steal, no overlay
Commit path detects plan drift	Rarely: plan vs live is checked by the LLM	Yes: mutating tools run their own traverseBefore

Dry-run-then-commit checklist

Get a PID from open_application_and_traverse
Call refresh_traversal with that PID
Read the /tmp/macos-use/*_refresh_traversal.txt path from the response
Grep the file for the role and label you want
Copy x, y, w, h from the matching line
Commit one click_and_traverse call with those coords
Read the returned diff to confirm the click landed

A concrete 4-step walkthrough

Works against any macOS app with accessibility permission granted. No mocks.

Open and discover

Call open_application_and_traverse. You get a PID plus the first full tree dump. No dry run needed yet; this is the bootstrap.

Dry run

Call refresh_traversal with the PID. Read the response. The file path is your plan sheet. Nothing has been clicked.

Plan from grep

Grep the .txt file for the one element you want. Coordinates come out on the matched line. No fuzzy resolution, no selector language.

Commit one call

Call click_and_traverse (optionally with text and pressKey chained). Read the returned diff. Plan drift surfaces as a surprise in the diff.
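The walkthrough can be sketched as one small function. Everything about the client here is an assumption: call_tool and read_text are hypothetical callables you would wire to your own MCP client and file reader, the "file" response key is invented for illustration, and the click argument names (x, y, width, height) are a guess at the schema rather than a confirmed signature.

```python
import re
from typing import Any, Callable, Dict, Optional

# Hypothetical signatures: adapt them to whatever MCP client you use.
CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]
ReadText = Callable[[str], str]

COORDS = re.compile(r"x:(-?\d+)\s+y:(-?\d+)\s+w:(\d+)\s+h:(\d+)")


def dry_run_then_click(
    call_tool: CallTool, read_text: ReadText, pid: int, pattern: str
) -> Optional[Dict[str, Any]]:
    """Dry run, grep one target, then commit a single click."""
    # Step 1: read-only traversal. "file" is an assumed response key.
    plan = call_tool("refresh_traversal", {"pid": pid})
    wanted = re.compile(pattern)
    # Step 2: the first matching line is the plan.
    for line in read_text(plan["file"]).splitlines():
        coords = COORDS.search(line)
        if wanted.search(line) and coords:
            x, y, w, h = (int(v) for v in coords.groups())
            # Step 3: one mutating call. Argument names are assumptions.
            return call_tool(
                "click_and_traverse",
                {"pid": pid, "x": x, "y": y, "width": w, "height": h},
            )
    return None  # nothing matched, so nothing was committed
```

Injecting the two callables keeps the sketch testable without a desktop: a fake call_tool that records its calls is enough to confirm that exactly one read happens before exactly one commit, and that no commit happens when the grep finds nothing.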

Why not just “plan in the LLM” and skip the tool call?

Because the LLM does not know the current coordinates. Screenshot pixel positions do not match screen coordinates: they differ by the window origin offset, plus the monitor's own origin when that monitor does not sit at x = 0. The only source of truth for where to click is the live AX tree, and the only way to ground the plan in it is to read it.

A dry-run tool that does not emit events, does not steal focus, and writes a flat greppable file is the smallest primitive that makes a grounded plan cheap. Once the plan is written, the commit is one call. That is the whole design.


Frequently asked questions

Does mcp-server-macos-use have a literal 'dry run' flag or planning mode?

No, and you do not want one. The server ships six tools (open_application_and_traverse, click_and_traverse, type_and_traverse, press_key_and_traverse, scroll_and_traverse, refresh_traversal). Five of them are mutating. One of them, refresh_traversal, is pure read. The dry-run primitive is not a flag on the mutating tools; it is a separate tool. You call refresh_traversal to plan, and you call one of the other five to commit. That is the whole contract. See Sources/MCPServer/main.swift:1408 for the tool list.

How do I know refresh_traversal really fires zero hardware events?

main.swift:1667 declares `let isDisruptive = params.name != refreshTool.name`. Every input-guard install, cursor save, frontmost-app save, CGEventTap, and post-action cursor restore is gated on `if isDisruptive` (main.swift:1670, 1754, 1767-1781). refresh_traversal takes the non-disruptive branch: no CGEventTap is installed, no cursor is saved, no app focus is saved, and the InputGuard.shared.engage() call at main.swift:1696 is never reached. The action path runs `primaryAction = .traverseOnly` (main.swift:1656), which only reads AX attributes. You can verify by tracing the process (dtruss is the macOS counterpart to strace, which is Linux-only) or just by tailing stderr: the 'InputGuard: engaging' log line never appears for refresh_traversal.

Where is the plan artifact stored and what is its format?

Every tool call writes a flat-text file to /tmp/macos-use/<millisecond-timestamp>_<toolname-without-prefix>.txt. For a dry run the file is named like /tmp/macos-use/1713610245731_refresh_traversal.txt. Each line is one AX element: `[AXButton (button)] "Send" x:1240 y:720 w:56 h:32 visible`. Roles are prefixed with AX so you can grep for the kind you want. Coordinates are top-left plus width and height, already in CGEvent-compatible point space (1pt == 1px, no backingScaleFactor math required, see CLAUDE.md Coordinate System note). The write happens at main.swift:1829.
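The naming scheme is regular enough to parse. The sketch below assumes file names look exactly like the example above; parse_artifact, screenshot_for, and newest_dry_run are hypothetical helpers, not part of the server. Prefer the path returned in the tool response when you have it, since the newest file on disk may belong to another agent.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Assumed naming, per this article: <millisecond-timestamp>_<tool>.txt
NAME_RE = re.compile(r"^(?P<ts>\d+)_(?P<tool>[a-z_]+)\.txt$")


def parse_artifact(path: Path) -> Optional[Tuple[int, str]]:
    """Split a plan file name into (timestamp_ms, tool_name)."""
    m = NAME_RE.match(path.name)
    return (int(m["ts"]), m["tool"]) if m else None


def screenshot_for(path: Path) -> Path:
    """The PNG that shares the plan file's timestamp and tool name."""
    return path.with_suffix(".png")


def newest_dry_run(directory: Path) -> Optional[Path]:
    """Most recent refresh_traversal plan in a directory, if any."""
    candidates = []
    for p in directory.glob("*_refresh_traversal.txt"):
        parsed = parse_artifact(p)
        if parsed and parsed[1] == "refresh_traversal":
            candidates.append((parsed[0], p))
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[1]
```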

What is the minimum loop for a dry-run-then-commit agent flow?

Three steps. 1) refresh_traversal with pid=X. 2) Grep the resulting .txt file for the role and label you want (`grep -n 'AXButton.*"Send"' /tmp/macos-use/<file>.txt`). 3) click_and_traverse with the x, y, w, h values from that line. The click tool auto-centers at (x+w/2, y+h/2), so you pass the values from the grep output verbatim. There is no mapping layer, no screenshot-to-coordinate math, no fuzzy element resolver sitting in the middle. The grep match is the plan.
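The auto-centering is plain arithmetic. As a worked example, here is the formula applied to the Send button line used earlier in this FAQ; click_center is a hypothetical helper that only reproduces the formula, since the server does this step for you.

```python
from typing import Tuple


def click_center(x: float, y: float, w: float, h: float) -> Tuple[float, float]:
    """Center of a top-left-anchored rectangle: (x + w/2, y + h/2)."""
    return (x + w / 2, y + h / 2)


# The "Send" line from the format example: x:1240 y:720 w:56 h:32
center = click_center(1240, 720, 56, 32)
print(center)  # (1268.0, 736.0)
```

You never need to run this yourself. It is here so you can sanity-check where a click will land before you commit it.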

Why is a grep-able text file better than a JSON tree for planning?

Three reasons. 1) LLMs handle line-oriented grep output more reliably than nested JSON when you only want 3 of 800 elements. 2) The agent can run actual `grep -n` in a sub-tool instead of loading the full tree into context. 3) Each line is independently meaningful, so you can build the click sequence with a few grep invocations and keep the rest of the tree out of your context window. main.swift:1002-1004 writes one element per line and that is the whole trick.
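When a shell is not available to the agent, the same filter is a few lines of Python. This is a rough stand-in for `grep -n`, not a reimplementation of it: grep_n is a hypothetical name, and it uses Python regular expressions rather than grep's syntax.

```python
import re
from typing import Iterable, Iterator, Tuple


def grep_n(lines: Iterable[str], pattern: str) -> Iterator[Tuple[int, str]]:
    """Yield (1-based line number, line) for every line matching pattern."""
    rx = re.compile(pattern)
    for lineno, line in enumerate(lines, start=1):
        if rx.search(line):
            yield lineno, line.rstrip("\n")
```

Because it is a generator over an iterable, you can pass an open file handle and only the matching lines are ever held in memory, which is the property that keeps the other elements out of the context window.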

What happens if the UI changes between the dry run and the commit?

The mutating tool runs its own traversal immediately before the action (traverseBefore = true), so even a stale plan is checked against the live tree on commit. If the target element moved, the AX tree at commit time reflects that. If the tool fails to find the element at the requested coordinates, the returned diff will be empty or reflect the wrong click, and you re-plan from the post-action .txt file. Nothing gets committed to an invisible element silently. The diff tells you what actually changed.
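A client can also check for drift itself by comparing the planned line against a later traversal file. The sketch below is a client-side check, not something the server does: it assumes the line format shown earlier with integer coordinates, and it treats role plus label as the element's identity, which will be ambiguous when a window has several identically labeled controls.

```python
import re

ELEMENT_RE = re.compile(
    r'^\[(?P<role>[^\]]+)\]\s+"(?P<text>.*)"\s+'
    r"x:(?P<x>-?\d+)\s+y:(?P<y>-?\d+)\s+w:(?P<w>\d+)\s+h:(?P<h>\d+)"
)


def key_and_rect(line: str):
    """Return ((role, text), (x, y, w, h)) or None for unparseable lines."""
    m = ELEMENT_RE.match(line.strip())
    if m is None:
        return None
    return (m["role"], m["text"]), tuple(int(m[k]) for k in "xywh")


def drift(planned_line: str, live_text: str) -> str:
    """Classify the planned element as 'unchanged', 'moved', or 'missing'."""
    planned = key_and_rect(planned_line)
    if planned is None:
        raise ValueError("planned line does not match the assumed format")
    key, rect = planned
    live_rects = []
    for line in live_text.splitlines():
        parsed = key_and_rect(line)
        if parsed is not None and parsed[0] == key:
            live_rects.append(parsed[1])
    if rect in live_rects:
        return "unchanged"
    return "moved" if live_rects else "missing"
```

A "moved" or "missing" result is the signal to re-plan from the newer file rather than retry the old coordinates.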

Is refresh_traversal faster than the mutating tools?

Yes, because it skips several real-time operations. It does not add the CGEventTap to the run loop via CFRunLoopAddSource (InputGuard.swift:148-152), does not schedule the 30-second watchdog timer (InputGuard.swift:172-180), does not render the translucent overlay window (InputGuard.swift:202-277), does not wait the 100ms inter-action sleep in composed mode (main.swift:1727), and does not wait the 200ms post-action grace period (main.swift:1757). Net: a dry run is typically hundreds of milliseconds cheaper than a mutating call that does the same traversal internally. If your agent only needs the AX tree, refresh_traversal is the cheapest way to get it.

How do I plan a multi-step action with this primitive?

Do not. The server already compresses click + type + pressKey into one tool call via click_and_traverse's optional `text` and `pressKey` params (main.swift:1329-1348). The dry-run plan only needs to resolve the FIRST click target, because the rest of the chain is declarative (the text to type, the key to press) and does not require separate element lookup. For genuinely multi-step flows (click A, wait for sheet, click B), alternate: refresh_traversal → click, refresh_traversal → click. Each dry run is one cheap read; each commit is one chained call.
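The alternation for genuinely multi-step flows might look like the sketch below. As before, the client is assumed: call_tool and read_text are hypothetical callables, "file" is an invented response key, and the width and height argument names are guesses. Only `text` and `pressKey` are parameter names taken from this article, and the key value in the usage example is illustrative.

```python
import re
from typing import Any, Callable, Dict, List, Tuple

CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]
ReadText = Callable[[str], str]
# One step: (pattern to grep for, extra click arguments such as text/pressKey)
Step = Tuple[str, Dict[str, Any]]

COORDS = re.compile(r"x:(-?\d+)\s+y:(-?\d+)\s+w:(\d+)\s+h:(\d+)")


def run_flow(
    call_tool: CallTool, read_text: ReadText, pid: int, steps: List[Step]
) -> List[Dict[str, Any]]:
    """Alternate dry run and commit; stop at the first step with no target."""
    results = []
    for pattern, extra in steps:
        # Fresh dry run before every commit: the previous click changed the UI.
        plan = call_tool("refresh_traversal", {"pid": pid})
        hit = None
        for line in read_text(plan["file"]).splitlines():
            if re.search(pattern, line) and COORDS.search(line):
                hit = line
                break
        if hit is None:
            break  # do not click a target the live tree does not show
        x, y, w, h = (int(v) for v in COORDS.search(hit).groups())
        args = {"pid": pid, "x": x, "y": y, "width": w, "height": h}
        args.update(extra)  # chained text / pressKey ride on the same call
        results.append(call_tool("click_and_traverse", args))
    return results
```

A two-step flow would pass something like `[(r'AXButton.*"Save"', {}), (r"AXTextField", {"text": "report.txt", "pressKey": "Return"})]`. Each tuple costs one read and one commit, and the second read sees whatever the first click produced.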

Does this beat the LangGraph planner/executor pattern that most articles describe?

It is orthogonal. LangGraph's planner/executor decomposition is a prompt-level pattern: one LLM call writes a plan, another LLM call executes. What mcp-server-macos-use gives you is the *physical* substrate that makes that pattern safe on a real desktop: a read-only tool that returns the exact coordinates your plan needs, and a diff on every mutating call so the executor can detect plan drift. You can run LangGraph planner/executor over macos-use and the two layers stack cleanly.

What if I want to dry-run a whole multi-step flow before committing any step?

You cannot, and that limitation is honest. A GUI dry run is only valid against the live tree. Simulating the intermediate states (what the Save sheet will look like after Cmd+S, what the confirmation dialog will say) requires running the action. The server does not fake this. The realistic pattern is: dry-run the current state, commit one step, dry-run the new state, commit the next step. Each commit produces a diff you use to verify that step landed before planning the next. There is no speculative multi-step dry run because the UI cannot be symbolically evaluated.

How does this differ from a Windows / Terminator dry run?

The pattern generalizes; the APIs do not. Terminator (the Windows-side MCP sibling) reads UI Automation's UIA tree instead of macOS's AX tree. Element IDs, role names (ControlType.Button vs AXButton), and coordinate conventions differ. The discipline is the same: one read-only tool returns the tree, one mutating tool commits. This article is specifically about the macOS side because `isDisruptive = params.name != refreshTool.name` is a macos-use-specific line of code, and `/tmp/macos-use/` is a macos-use-specific file path.

What is the smallest reproducible 'dry run plan' session I can run today?

Clone the repo, run `xcrun --toolchain com.apple.dt.toolchain.XcodeDefault swift build`, point any MCP client at `.build/debug/MCPServer`, call open_application_and_traverse with identifier='Safari' to get a PID, then call refresh_traversal with that PID. Open the file at the path in the summary. That is your dry-run plan: every interactive element of Safari's frontmost window, one per line, with coordinates. Grep it for whatever you want to click next. No events have been sent yet; the only side effect so far was opening the app.
