Claude Code + MCP + macOS accessibility API: the wiring that actually works
Everyone writing about this stops at "paste this JSON into a config file". The thing that actually trips people up is which process the OS asks for Accessibility, not the JSON. Here is the full wiring, the one TCC prompt that breaks the loop, and the 9-tool surface Claude Code gets once you have it right.
Direct answer · verified 2026-05-13
To let Claude Code drive native macOS apps via the accessibility API:
- Install the MCP server: `npm install -g mcp-server-macos-use` (postinstall builds the Swift binary).
- Register it with Claude Code: `claude mcp add macos-use -- npx -y mcp-server-macos-use`.
- Open System Settings > Privacy & Security > Accessibility, and grant permission to the terminal app you ran `claude` from (iTerm, Terminal, Ghostty, VS Code, whatever). Not to the MCP binary, not to `node`, not to `claude`.
- Restart the terminal once, then start a new `claude` session. You now have 9 tools: open_application_and_traverse, click_and_traverse, type_and_traverse, press_key_and_traverse, scroll_and_traverse, set_value_and_traverse, press_ax_and_traverse, set_selected_and_traverse, refresh_traversal. Registered at Sources/MCPServer/main.swift:1482.
The four moving parts
A reader asking about "Claude Code MCP desktop accessibility API" is asking about four things stitched together. Naming them up front is the part most guides skip, because they assume you already know.
1 / Claude Code
Anthropic's terminal CLI
The command-line client you run with claude. It speaks MCP to local servers over stdio.
2 / MCP
Model Context Protocol
JSON-RPC 2.0 over stdin/stdout. The server registers tools; the client calls them. No HTTP, no network.
3 / Accessibility API
AXUIElement + CGEvent
AXUIElement reads the structured UI tree of any app. CGEvent posts synthetic clicks and keystrokes. Native, no screenshot OCR.
4 / mcp-server-macos-use
The Swift bridge
The open-source binary that exposes 9 MCP tools and translates each one to AXUIElement + CGEvent calls. This is what you install.
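To make "JSON-RPC 2.0 over stdin/stdout" concrete, here is a rough sketch of a single tool call as it might appear on the wire. The `tools/call` method and the `params.name` / `params.arguments` shape come from the MCP specification; the `identifier` argument name is taken from the sample transcript later in this guide, and the helper name `mcp_tool_call` is hypothetical. It does no JSON escaping, so it is only suitable for simple values.

```shell
# Hypothetical helper: print one newline-terminated JSON-RPC 2.0 "tools/call" request.
# $1 = request id, $2 = tool name, $3 = app identifier (argument name is an assumption).
mcp_tool_call() {
  printf '{"jsonrpc":"2.0","id":%s,"method":"tools/call","params":{"name":"%s","arguments":{"identifier":"%s"}}}\n' "$1" "$2" "$3"
}

mcp_tool_call 1 macos-use_open_application_and_traverse Calculator
```

A real client sends an `initialize` handshake before any tool call, so this line alone will not drive the server. It only shows what Claude Code writes to the server's stdin once the session is up.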
Step-by-step wiring
Four steps. The third is the only one most guides skip; it's also the only one that actually matters once the first two are done.
Install the server
npm one-liner. Postinstall builds the Swift binary.
Register with Claude Code
claude mcp add macos-use ...
Grant Accessibility to the terminal
Not to the MCP binary. This is the trap.
Smoke test in claude
Ask claude to open Calculator. Watch the AX tree return.
1. Install
One command. The package's postinstall hook runs xcrun swift build -c release to compile the native binary, so the only host requirement is the Xcode Command Line Tools (run xcode-select --install if you haven't already).
npm install -g mcp-server-macos-use
# postinstall: xcrun swift build -c release
# produces ./.build/release/mcp-server-macos-use
2. Register with Claude Code
One line. The -- separator hands everything after it to Claude Code as the spawn argv. Inside a new claude session, run /mcp to confirm the server is up.
# Register the macos-use MCP server with Claude Code:
claude mcp add macos-use -- npx -y mcp-server-macos-use

# Verify:
claude mcp list
# > macos-use: npx -y mcp-server-macos-use (stdio)

# Open a new claude session. Inside the REPL:
> /mcp
# > macos-use: connected (9 tools)
3. The TCC trap: grant Accessibility to the terminal, not the MCP binary
This is where every other wiring guide drops the reader. macOS TCC (Transparency, Consent, and Control) does not key permission to the literal binary calling AXUIElementCopyAttributeValue. It walks up the responsible-process chain looking for a long-lived parent it can hold accountable. For an MCP server spawned from a terminal, that's the terminal app.
What TCC inspects when the MCP server makes its first AX call
You launch claude from iTerm/Terminal/Ghostty
claude (a node process) spawns mcp-server-macos-use over stdio
stdio pipes, JSON-RPC 2.0. Claude Code owns the child as long as the session lasts.
mcp-server-macos-use calls AXUIElementCopyAttributeValue
TCC intercepts. Walks up the responsible-parent chain looking for who can be held accountable.
TCC blames the long-lived parent: iTerm.app
The Accessibility prompt asks about iTerm, not about mcp-server-macos-use. Grant it there.
So when you trigger the Accessibility prompt, the dialog asks about iTerm.app (or Terminal.app, Ghostty.app, Warp.app, Visual Studio Code.app if you launched Claude Code from VS Code's integrated terminal), not about /usr/local/bin/mcp-server-macos-use.
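If you want to see which app sits at the top of that chain on your machine, a small sketch like the one below can help. It walks the ordinary parent-PID chain, which is only an approximation of the responsible-process bookkeeping macOS does internally, and the function name `ancestry` is hypothetical. It reads `pid ppid command` rows on stdin so it can be fed from `ps`.

```shell
# Hypothetical helper: walk from a starting PID up through its parents.
# Reads "pid ppid command" rows on stdin and prints "pid<TAB>command" per ancestor.
ancestry() {
  awk -v start="$1" '
    {
      pid = $1; ppid[pid] = $2
      # Drop the two numeric columns; whatever remains is the command.
      $1 = ""; $2 = ""; sub(/^ +/, ""); name[pid] = $0
    }
    END {
      p = start
      # Stop when the parent is not in the table or a cycle shows up.
      while ((p in ppid) && !(p in seen)) {
        seen[p] = 1
        print p "\t" name[p]
        p = ppid[p]
      }
    }'
}
```

From the shell that launched `claude`, something like `ps -axo pid=,ppid=,comm= | ancestry $$` should print the shell first and end near the terminal app's own process, which is the entry to look for in the Accessibility list.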
What gets granted Accessibility
The 'I added it to a config file and the OS asked me to grant Accessibility to the wrong thing' loop
- Granted Accessibility to /usr/local/bin/mcp-server-macos-use directly
- Granted Accessibility to /opt/homebrew/bin/node
- Granted Accessibility to claude itself (the JS binary)
- First click returns errAXAPIDisabled even though everything in System Settings looks toggled on
4. Smoke test
Open a fresh claude session (the TCC posture is captured at process start, so a session that was running before you granted permission won't pick up the change). Then ask Claude Code to open Calculator and click the equals button.
Verify it actually worked
- which claude prints a path. Run `claude mcp list` and confirm macos-use is registered.
- From inside claude: ask it to open Calculator. The first call writes a .txt + .png to /tmp/macos-use/.
- ls -lt /tmp/macos-use/ | head; newest file is named *_open_application_and_traverse.txt.
- head -20 of that file looks like `[AXButton (button)] "=" x:... y:... w:... h:... visible`.
- If you see errAXAPIDisabled in the response: System Settings > Privacy & Security > Accessibility. Toggle your terminal app off and back on.
- Press Esc during any tool call to abort. The InputGuard tap (InputGuard.swift) releases your keyboard within 200ms.
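The file checks in the list above can be wrapped into two small helpers. This is a sketch under assumptions: the function names are hypothetical, and it relies only on the file naming (`*_open_application_and_traverse.txt`) and the line shape (`[AXButton (button)] "=" ...`) described in this guide.

```shell
# Hypothetical helper: print the newest traversal file for a tool, or print nothing
# and return non-zero if there is none. $1 = output directory, $2 = tool name.
newest_traversal() {
  name=$(ls -t "$1" 2>/dev/null | grep -- "_$2\.txt\$" | head -n 1)
  [ -n "$name" ] && printf '%s/%s\n' "$1" "$name"
}

# Hypothetical helper: succeed if the file contains at least one "[AX..." element line.
looks_like_ax_tree() {
  grep -q '^\[AX' "$1"
}
```

For example, `f=$(newest_traversal /tmp/macos-use open_application_and_traverse) && looks_like_ax_tree "$f" && head -20 "$f"` prints the top of the tree only when a traversal was actually written.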
The 9-tool surface
The README highlights five tools. The llms.txt mentions six. The actual allTools array at Sources/MCPServer/main.swift:1482 registers nine. Three of them (set_value, press_ax, set_selected) are the escape hatches for fields that reject synthesized input.
// Sources/MCPServer/main.swift:1482
let allTools = [
    openAppTool,      // macos-use_open_application_and_traverse
    clickTool,        // macos-use_click_and_traverse
    typeTool,         // macos-use_type_and_traverse
    pressKeyTool,     // macos-use_press_key_and_traverse
    scrollTool,       // macos-use_scroll_and_traverse
    setValueTool,     // macos-use_set_value_and_traverse (AX write, undocumented in README)
    pressAXTool,      // macos-use_press_ax_and_traverse (kAXPressAction, bypasses CGEvent)
    setSelectedTool,  // macos-use_set_selected_and_traverse
    refreshTool,      // macos-use_refresh_traversal
]
// 9 entries. Most write-ups quote 5 or 6 because they were
// generated against an older README. Three of them (setValue,
// pressAX, setSelected) bypass CGEvent and write to the AX
// attribute directly, which is the escape hatch for fields
// that reject synthesized input (1Password, sudo prompts).
The CGEvent-based tools (click, type, press_key, scroll) are how you drive 95% of UI. They post events to the system event stream so they look like real user input. The AX-direct tools (set_value writes kAXValueAttribute, press_ax calls kAXPressAction) bypass the event stream entirely and target the AX element directly, which works for secure-input fields and accessibility-aware apps that ignore CGEvent.
What happens on a single tool call
Walking the wire on one click_and_traverse call. This is the shape every disruptive tool follows: engage the input guard, read the tree, perform the action, read the tree again, return a diff.
click_and_traverse: claude → MCP server → macOS
Two pieces are doing safety work here that most MCP servers don't bother with. The first is InputGuard: a CGEventTap installed before each disruptive call that drops your hardware keyboard and mouse for the duration of the action. The second is the 30-second watchdog at InputGuard.swift:24 (var watchdogTimeout: TimeInterval = 30), which force-releases the input tap even if the server hangs mid-call. Worst case if the agent goes sideways: 30 seconds and you get your keyboard back. Best case: you press Esc and the in-flight tool call cancels immediately.
What "driving a Mac" looks like in practice
Once the wiring is right, the loop Claude Code runs is straightforward. It asks for the AX tree, picks an element by role and text, calls the appropriate tool, reads the after-tree (or just the diff), decides what to do next. A typical scripted run looks like this:
> claude
> "Open Calendar, find the next Tuesday at 2pm,
and create a 30-minute event titled 'MCP wiring review'."
[macos-use_open_application_and_traverse identifier=Calendar]
→ wrote /tmp/macos-use/1715600000_open_application_and_traverse.txt
→ app Calendar, pid 84112, 412 elements
[macos-use_click_and_traverse pid=84112 element="Day View"]
→ matched [AXRadioButton] "Day View" x:160 y:96 w:80 h:30
→ diff: + 31 elements (today's schedule), ~ 4 elements
[macos-use_click_and_traverse pid=84112 element="forward"]
... advances to next Tuesday
[macos-use_click_and_traverse pid=84112 x:780 y:420 w:120 h:30
text="MCP wiring review"
pressKey="Return"]
→ composed: click + type + press in one call
→ diff: + 1 AXButton "Save", + 1 popover
The element= parameter is the ergonomic bit: instead of feeding the model exact x/y coordinates, you can ask it to find an element by text. The server does a case-insensitive partial match against AXTitle, AXDescription, and AXValue, optionally filtered by role. First match wins.
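The matching rule can be approximated offline against a saved traversal file. The sketch below is not the server's implementation: it only sees the single quoted label on each line (not AXTitle, AXDescription, and AXValue separately), and the function name `find_element` is hypothetical. It does show the three properties described above: case-insensitive, substring, first match wins.

```shell
# Hypothetical helper: approximate the element= lookup against a traversal file.
# $1 = file, $2 = text to find, $3 = optional role filter (e.g. AXButton).
find_element() {
  awk -v want="$2" -v role="$3" '
    BEGIN { want = tolower(want) }
    {
      # Role filter: the line must start with "[<role> " or "[<role>]".
      if (role != "" && index($0, "[" role " ") != 1 && index($0, "[" role "]") != 1) next
      # Take the first quoted string on the line as the label.
      if (match($0, /"[^"]*"/) == 0) next
      label = tolower(substr($0, RSTART + 1, RLENGTH - 2))
      # Case-insensitive substring match; stop at the first hit.
      if (index(label, want) > 0) { print; exit }
    }' "$1"
}
```

Because the first match wins, a short query such as "day" can land on a help label before it reaches the button you meant. Passing the role as the third argument narrows the search in the same way the role filter does.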
Frequently asked questions
What does 'Claude Code MCP desktop accessibility API' actually wire together?
Four pieces. (1) Claude Code is Anthropic's terminal CLI; it speaks MCP (Model Context Protocol) to local servers over stdio. (2) An MCP server is just a binary that reads JSON-RPC 2.0 from stdin and writes responses to stdout. (3) The macOS Accessibility API is AXUIElement: a read/write tree of every visible UI element in any running app, plus CGEvent for synthesizing mouse and keyboard input. (4) mcp-server-macos-use is a Swift binary that bridges 2 and 3: it exposes 9 MCP tools (registered in the allTools array at Sources/MCPServer/main.swift:1482) that read the AX tree of a target app, post CGEvent inputs, and return the post-action tree. Wiring all four together gives Claude Code real hands on any macOS app, no screenshot OCR or pixel matching required.
Why does Accessibility get granted to my terminal app and not to the MCP server binary?
macOS TCC (Transparency, Consent, and Control) keys Accessibility permission on the *responsible* process for a request, not the literal calling executable. When mcp-server-macos-use calls AXUIElementCopyAttributeValue, TCC asks 'who is this child accountable to?' and walks up the parent chain. It finds claude (a node process), then the shell, then the terminal app that owns the PTY. The terminal is the long-lived bundle-id'd application TCC has on file, so the Accessibility prompt asks about iTerm/Terminal/Ghostty/VS Code, not about /usr/local/bin/mcp-server-macos-use. Symptom of getting this wrong: you toggle the MCP binary on in System Settings, restart, and the first AX call still returns errAXAPIDisabled. Fix: revoke the binary, grant your terminal app, restart the terminal (TCC caches the decision per-PID).
Is `claude mcp add macos-use -- npx -y mcp-server-macos-use` the canonical install?
Yes. The `--` separator hands everything after to the spawn args, so claude will launch `npx -y mcp-server-macos-use` over stdio when the server is requested. The `-y` on npx auto-accepts the install prompt. The npm package's postinstall hook runs `xcrun swift build -c release` to produce the native binary, so the only host requirement is the Xcode Command Line Tools (no full Xcode needed). If you'd rather not have npm in the path, clone the repo, `swift build -c release`, and pass an explicit path: `claude mcp add macos-use -- /Users/you/mcp-server-macos-use/.build/release/mcp-server-macos-use`.
How many tools does the server actually expose, and why do other guides quote different numbers?
Nine, at Sources/MCPServer/main.swift:1482. The full set: open_application_and_traverse, click_and_traverse, type_and_traverse, press_key_and_traverse, scroll_and_traverse, set_value_and_traverse, press_ax_and_traverse, set_selected_and_traverse, refresh_traversal. The README highlights five; the llms.txt mentions six. Three of the nine (set_value, press_ax, set_selected) are escape hatches that write AX attributes directly instead of going through CGEvent. They land on fields that reject synthesized input, such as secure-input password fields, sudo prompts, and 1Password unlock, where CGEvent-based clicks and types silently no-op. If you've ever had a 'the click happened, the field is still empty' bug, that's the one. press_ax_and_traverse calls kAXPressAction directly on the element, which the system accepts because it looks like the app calling itself.
What does a single tool call actually do under the hood?
Take macos-use_click_and_traverse as the canonical shape. (1) The handler engages InputGuard, which installs a head-insert CGEventTap that drops hardware keyboard and mouse events for the duration of the call (Esc, plain, with no modifiers, is the one exception; it cancels the call). (2) It reads the target app's AX tree once for the 'before' state. (3) It posts a CGEvent left-mouse-down + left-mouse-up at (x + w/2, y + h/2), centering on whatever element you asked for. (4) It reads the AX tree again for the 'after' state. (5) It diffs the two trees and returns a compact summary plus a file path. The full traversal lands as a text file under /tmp/macos-use/, one line per element, `[AXRole] "label" x:N y:N w:W h:H visible`. The compact summary is what the LLM sees in its context window; the file is what you grep when something goes wrong.
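Step (3) is simple arithmetic, and it can be reproduced from any line of a traversal file. The sketch below assumes the `x:N y:N w:W h:H` field layout quoted above; the function name `click_center` is hypothetical.

```shell
# Hypothetical helper: compute the click point (x + w/2, y + h/2) for one traversal line.
click_center() {
  printf '%s\n' "$1" | awk '{
    for (i = 1; i <= NF; i++) {
      n = split($i, kv, ":")
      # Keep only the x:, y:, w:, h: fields; later fields overwrite earlier ones.
      if (n == 2 && kv[1] ~ /^[xywh]$/) v[kv[1]] = kv[2]
    }
    print v["x"] + v["w"] / 2, v["y"] + v["h"] / 2
  }'
}

click_center '[AXButton (button)] "Submit" x:680 y:520 w:80 h:30 visible'
# prints: 720 535
```

This is handy when a click seems to land in the wrong place: compare the computed center against where the element actually sits on screen.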
What stops a runaway agent from typing forever into the wrong window?
Two things. First, every disruptive tool call engages InputGuard before posting any CGEvent. The guard carries a 30-second watchdog (InputGuard.swift, line 24: `var watchdogTimeout: TimeInterval = 30`) that force-disengages the input tap even if the server hangs mid-call. Second, the guard listens for plain Esc with no modifiers, and if you press it the in-flight tool call throws InputGuardCancelled and returns control to you. The cursor position is also saved before each action and restored after, so the visible mouse pointer doesn't jump around your screen.
Does this work with Cursor, Claude Desktop, VS Code, or only Claude Code?
Any MCP-compatible client over stdio. Claude Desktop, Cursor, VS Code (with Copilot's MCP host), Cline, and Windsurf all speak the same protocol. The config shape differs (Claude Desktop wants the `mcpServers` block in `claude_desktop_config.json`; Claude Code wants `claude mcp add`; Cursor wants an entry in its own `mcp.json`), but the server binary is the same. The TCC trap is the same on every host: grant Accessibility to whichever app spawned the MCP server, not to the server binary itself. If you launch the host from Spotlight that's the host's bundle ID; if you launch the host from a terminal that's the terminal's bundle ID.
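For Claude Desktop, the equivalent of the `claude mcp add` line is an `mcpServers` entry. A minimal sketch follows, assuming the same `npx -y mcp-server-macos-use` launch command used for Claude Code; the function name `print_desktop_config` is hypothetical, and the output is meant to be merged by hand into an existing `claude_desktop_config.json` rather than written over it.

```shell
# Hypothetical helper: print an mcpServers block for claude_desktop_config.json.
print_desktop_config() {
  cat <<'EOF'
{
  "mcpServers": {
    "macos-use": {
      "command": "npx",
      "args": ["-y", "mcp-server-macos-use"]
    }
  }
}
EOF
}

print_desktop_config
```

With Claude Desktop launched from the Dock or Spotlight, the Accessibility grant then belongs to Claude Desktop itself, per the bundle-ID rule above.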
What can I actually have Claude Code do once this is wired up?
Drive any app the Accessibility API can see, which is most of them, with the exception of apps that explicitly opt out (a small list, mostly DRM-protected video players). Concrete examples: 'open Calendar, find next Tuesday 2pm, create an event titled X' (open_application_and_traverse + click + type + press Return), 'in Finder, navigate to ~/Downloads and right-click the first PDF' (open + click with rightClick: true), 'find the unread Slack message in the General channel and read it' (click + refresh_traversal to read the new pane). The set_value tool gives you a write primitive that bypasses CGEvent, which is what you reach for when an app has a text field that rejects synthesized typing.
Where do the response files live and how do I read them?
Every tool call writes a flat text file to /tmp/macos-use/<timestamp>_<tool>.txt and (for disruptive actions) a PNG screenshot of the target window at the same prefix. The text file is one element per line: `[AXButton (button)] "Submit" x:680 y:520 w:80 h:30 visible`. Diff lines (for the after-state tree) use `+` (added), `-` (removed), `~` (modified) prefixes. Don't read the full file into an LLM's context; grep for the element you care about: `grep -n "AXTextField" /tmp/macos-use/*_click_and_traverse.txt`. The MCP summary returned to the model includes counts and up to three notable text changes, which is enough for the model to decide what to do next.
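To get a quick read on how much an action changed without opening the file, you can count the diff prefixes. This sketch assumes the `+`, `-`, or `~` marker is the first character of each diff line, which the description above implies but does not spell out; the function name `diff_counts` is hypothetical.

```shell
# Hypothetical helper: count added (+), removed (-), and modified (~) lines in a file.
diff_counts() {
  awk '
    /^\+/ { a++ }
    /^-/  { r++ }
    /^~/  { m++ }
    END   { printf "added=%d removed=%d modified=%d\n", a, r, m }
  ' "$1"
}
```

Running `diff_counts` on the newest `*_click_and_traverse.txt` gives roughly the same numbers the MCP summary reports to the model. A result of all zeros after a click is a quick hint that the click did not register.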
What's the license and where does the source live?
Open source at github.com/mediar-ai/mcp-server-macos-use under BSL 1.1 (Business Source License 1.1; converts to a permissive open-source license after a fixed period). Written in Swift: 2056 lines in Sources/MCPServer/main.swift (the JSON-RPC handler, tool registry, AX traversal, CGEvent injection) and 355 lines in Sources/MCPServer/InputGuard.swift (the CGEventTap input arbiter and watchdog). Distributed via npm so the install is one command and the postinstall hook builds the Swift release binary against your local Xcode toolchain.