macOS MCP Server: Why The One Worth Installing Has Four Click Primitives, Not One
MCP servers that drive macOS come in two shapes. The first posts a synthetic CGEvent click and quietly fails on Notes, Reminders, App Store, Music, and Messages. The second knows that Catalyst apps drop those clicks and exposes three more primitives to escalate with. macos-use is the second kind: nine MCP tools, four of which are different ways to actuate a single element. This page tours what each primitive is for, with file and line references you can verify in the source.
Direct answer · verified 2026-05-03
macos-use is the open source MCP server at github.com/mediar-ai/mcp-server-macos-use that drives macOS apps via the native Accessibility APIs. MIT licensed, written in Swift, ships nine tools to any MCP-compatible client.
claude mcp add macos-use -- npx -y mcp-server-macos-use
Tool inventory · main.swift:1482
- open_application_and_traverse
- click_and_traverse
- type_and_traverse
- press_key_and_traverse
- scroll_and_traverse
- press_ax_and_traverse
- set_value_and_traverse
- set_selected_and_traverse
- refresh_traversal
Three of these tools (press_ax_and_traverse, set_value_and_traverse, set_selected_and_traverse) are the AX-tier escalation primitives this page is about.
The Catalyst Problem In One Paragraph
Mac Catalyst is the bridge that lets iPad apps run on macOS. Since its 2019 debut, Apple has used it to ship a number of stock apps you probably think of as "Mac native": Notes, Reminders, App Store, Music, Stocks, Maps, and, on recent releases, Messages. Catalyst windows host an AppKit shell around a UIKit view hierarchy. The accessibility tree exposes both layers, but the input pipeline is asymmetric. A synthetic CGEvent.post mouse-down on a control inside the UIKit pane often looks like it landed (the cursor flickers and nothing reports an error), yet no AX state changes. The button never registered the click.
This breaks every macOS automation tool that ships a single click primitive: the same servers that work flawlessly on Slack and Cursor get stuck the first time the agent tries to open a note or install an app. The naive fix is "retry harder", which does not work because the click is not failing in a recoverable way; it is being dropped.
macos-use sidesteps the problem by exposing the lower-level AXUIElement action and attribute writes as separate MCP tools. When the model sees an empty diff after a click, its next call is press_ax_and_traverse with the same coordinates. If that returns kAXErrorActionUnsupported, it moves on to set_selected_and_traverse. Over a session, the model picks up an implicit per-app policy. The cost is one extra tool call per dropped action; the benefit is that the agent does not get stuck retrying a click the app will never register.
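A minimal Swift sketch of that escalation ladder follows. It assumes the agent (or a client-side helper) reduces each response to one of a few outcomes. `Primitive`, `Outcome`, `nextPrimitive`, and `AppPolicy` are hypothetical names for illustration, not types from the macos-use source. Keying the per-app memory by bundle ID is also an assumption.

```swift
// Hypothetical model of the escalation order described above. Tool names match
// this page; the Outcome cases simplify what the agent reads back from each call.
enum Primitive: String {
    case click = "click_and_traverse"
    case pressAX = "press_ax_and_traverse"
    case setSelected = "set_selected_and_traverse"
}

enum Outcome {
    case treeChanged        // non-empty diff: the action landed, stop here
    case emptyDiff          // the call "succeeded" but the AX tree did not change
    case actionUnsupported  // press_ax came back with kAXErrorActionUnsupported
    case otherError
}

// Next primitive to try after `tried` produced `outcome`, or nil to stop.
func nextPrimitive(after tried: Primitive, outcome: Outcome) -> Primitive? {
    switch (tried, outcome) {
    case (.click, .emptyDiff):
        return .pressAX
    case (.pressAX, .actionUnsupported):
        return .setSelected
    default:
        return nil  // changed tree, exhausted ladder, or an error outside this ladder
    }
}

// The "implicit per-app policy": remember which primitive last produced a change
// in each app and start there next time. Keyed by bundle ID (an assumption).
struct AppPolicy {
    private var preferred: [String: Primitive] = [:]

    func firstPrimitive(for bundleID: String) -> Primitive {
        preferred[bundleID] ?? .click
    }

    mutating func record(_ primitive: Primitive, workedFor bundleID: String) {
        preferred[bundleID] = primitive
    }
}
```

The rule is deliberately narrow and mirrors the text. An empty diff after a click moves to press_ax. Only an explicit "action unsupported" moves on to set_selected.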
One macOS MCP Server vs. The Common Pattern
Each row is a behavior an agent will hit during a real session on a real Mac. Each cell is what the response looks like to the model.
| Feature | single-click-primitive servers | macos-use |
|---|---|---|
| Click primitive count | 1 (synthetic CGEvent only) | 4 (CGEvent + kAXPressAction + kAXValueAttribute + kAXSelectedAttribute) |
| Behavior on Catalyst right-pane controls | click reports success, AX diff is empty, agent loops | agent escalates to press_ax or set_selected, gets a real diff |
| Typing into sandboxed/secure-input fields | key events dropped silently, value never lands | set_value bypasses the event tap, writes via kAXValueAttribute |
| Cross-process dialog handoff (Save Panel, Share, Print) | returns the original app's tree, agent never sees the dialog | compares frontmost PID after every action, returns BOTH trees |
| Input arbitration while the agent is acting | your keystrokes race the agent's into the same field | head-insert CGEventTap drops hardware events, Esc is the kill switch |
| Cursor + frontmost-app restore after the call | you find the cursor in a different corner, focus on a random app | NSEvent.mouseLocation saved at main.swift:1672, replayed at 1767-1771 |
| Traversal output format | JSON blob inlined into the MCP response, blows your context | flat text file at /tmp/macos-use/<ts>_<tool>.txt, agent greps it |
The Four Primitives, In Source Order
For each primitive: the tool name, where it is defined, where its handler lives, and what it actually does.
1. Try the cheap path: click_and_traverse
Standard synthetic mouse click. It activates the target PID via NSRunningApplication.activate, scrolls the element into view if it is offscreen, posts left-mouse-down and left-mouse-up CGEvents at the element's center, then traverses again. Defined at main.swift:1349; handler at main.swift:1640-1684. Works for native AppKit apps, Electron apps, and most Catalyst left-pane sidebar items. Most of the time, this is the only primitive you need.
Failure signature: the response comes back with diffSummary: 0 added, 0 removed, 0 modified even though the click landed visually. The agent sees no change in the tree. Time to escalate.
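The following sketch makes the failure signature concrete. It parses a diffSummary string like the one above and flags the all-zero case. It assumes the comma-separated `<count> <label>` shape shown on this page. `parseDiffSummary` and `isEmptyDiff` are illustrative names, not server functions.

```swift
// Parses a diffSummary such as "0 added, 0 removed, 0 modified" into counts.
// Assumes the "<count> <label>" parts shown on this page; anything else is skipped.
func parseDiffSummary(_ summary: String) -> [String: Int] {
    var counts: [String: Int] = [:]
    for part in summary.split(separator: ",") {
        let words = part.split(separator: " ")  // also drops the leading space
        guard words.count == 2, let n = Int(words[0]) else { continue }
        counts[String(words[1])] = n
    }
    return counts
}

// True when the summary parsed and every count is zero: the signature that
// should trigger escalation to press_ax_and_traverse.
func isEmptyDiff(_ summary: String) -> Bool {
    let counts = parseDiffSummary(summary)
    return !counts.isEmpty && counts.values.allSatisfy { $0 == 0 }
}
```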
2. Skip the event tap: press_ax_and_traverse
Calls AXUIElementPerformAction with kAXPressAction on the element directly. No CGEvent.post, no event tap, no synthesis. The accessibility framework dispatches the action straight into the target process. Defined at main.swift:1457; handler at main.swift:1748-1765. This is the right primitive for Catalyst right-pane buttons that the synthetic click path silently swallowed. Works for any element whose AX tree exposes an AXPress action.
Failure signature: kAXErrorActionUnsupported. The element does not advertise AXPress in its AXActionNames. Common on table rows and sidebar entries: they care about selection state, not press events. Escalate again.
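The two behaviors above map onto three HIServices calls. This sketch does three things. It hit-tests (x, y) with AXUIElementCopyElementAtPosition, fires AXPress with AXUIElementPerformAction, and reads AXActionNames to predict the kAXErrorActionUnsupported case. The helper names are hypothetical, and this is not the code at main.swift:1748-1765. A hit test can also return a child element (a label inside a button, say). A fuller implementation might handle that by walking up to the parent.

```swift
import ApplicationServices

// Pure check: does an AXActionNames list include AXPress?
// When it does not, press_ax will fail with kAXErrorActionUnsupported.
func advertisesPress(_ actionNames: [String]) -> Bool {
    actionNames.contains(kAXPressAction)
}

// Reads AXActionNames from a live element; [] on any AX error
// (.apiDisabled, for example, when Accessibility permission is missing).
func actionNames(of element: AXUIElement) -> [String] {
    var names: CFArray?
    guard AXUIElementCopyActionNames(element, &names) == .success,
          let list = names as? [String] else { return [] }
    return list
}

// press_ax path: hit-test (x, y) inside the target app, then fire AXPress on the
// element directly. No CGEvent is synthesized and no event tap is involved.
func pressAX(pid: pid_t, x: Float, y: Float) -> AXError {
    let app = AXUIElementCreateApplication(pid)
    var hit: AXUIElement?
    let lookup = AXUIElementCopyElementAtPosition(app, x, y, &hit)
    guard lookup == .success, let target = hit else {
        return lookup == .success ? .noValue : lookup
    }
    return AXUIElementPerformAction(target, kAXPressAction as CFString)
}

// The escalation rule from this page: only "action unsupported" means try set_selected.
func shouldEscalateToSetSelected(_ result: AXError) -> Bool {
    result == .actionUnsupported
}
```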
3. Toggle the selection bit: set_selected_and_traverse
Sets kAXSelectedAttribute on the element via AXUIElementSetAttributeValue. Defined at main.swift:1475; handler at main.swift:1767-1785. The tool description names the exact failure mode it exists for: "selection-bearing controls that expose AXSelected but no AXPress action (where regular click is dropped and press_ax errors with kAXErrorActionUnsupported)." This is the path that works for Catalyst table rows, outline view rows, and a fair number of NSCollectionView items in third-party apps.
Side note: this primitive accepts a selected: false argument too, so the same call shape lets you deselect a row that the previous tool call selected. Symmetric primitive, single tool.
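Here is a sketch of the set_selected write. It assumes you already hold the AXUIElement for the row. It probes AXUIElementIsAttributeSettable before writing kAXSelectedAttribute, and the `selected` parameter plays the role of the tool's selected: false argument. `setSelected` and `cfBool` are hypothetical helper names, not the handler at main.swift:1767-1785.

```swift
import ApplicationServices

// Bridges a Swift Bool to the CFBoolean singletons AX attribute writes expect.
func cfBool(_ value: Bool) -> CFBoolean {
    if value { return kCFBooleanTrue } else { return kCFBooleanFalse }
}

// Writes kAXSelectedAttribute; pass false to deselect a row selected earlier.
func setSelected(_ element: AXUIElement, _ selected: Bool) -> AXError {
    // Rows that cannot be selected report the attribute as not settable.
    var settable: DarwinBoolean = false
    let probe = AXUIElementIsAttributeSettable(element, kAXSelectedAttribute as CFString, &settable)
    guard probe == .success else { return probe }
    guard settable.boolValue else { return .attributeUnsupported }
    return AXUIElementSetAttributeValue(element, kAXSelectedAttribute as CFString, cfBool(selected))
}
```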
4. Write the value: set_value_and_traverse
Writes a string into the element via AXUIElementSetAttributeValue with kAXValueAttribute. Defined at main.swift:1440; handler at main.swift:1728-1746. Use when typing fails: Catalyst right-pane text fields drop synthesized keystrokes, sandboxed contexts with secure-input mode swallow CGEvent key events, and a few apps simply do not accept key events from a non-foreground source. This primitive bypasses the input event tap entirely; the value lands directly on the AX element.
The accompanying type_and_traverse at main.swift:1369 is the synthetic-key-events path; set_value_and_traverse is the AX write path. Same goal, two routes, picked by the model based on which one yielded a diff in this app last time.
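The next sketch shows the AX write path. It assumes the element has already been resolved from the target coordinates. It writes kAXValueAttribute and then reads the attribute back, because a `.success` AXError alone does not prove the app accepted the text. `setValue`, `readValue`, and `setValueVerified` are illustrative names, not functions from main.swift.

```swift
import ApplicationServices

// Writes the string straight into kAXValueAttribute. No key events are
// synthesized, so the event-tap path that drops keystrokes is not involved.
func setValue(_ element: AXUIElement, _ text: String) -> AXError {
    AXUIElementSetAttributeValue(element, kAXValueAttribute as CFString, text as CFString)
}

// Reads kAXValueAttribute back as a string, or nil if it is absent or not text.
func readValue(_ element: AXUIElement) -> String? {
    var value: CFTypeRef?
    guard AXUIElementCopyAttributeValue(element, kAXValueAttribute as CFString, &value) == .success
    else { return nil }
    return value as? String
}

// True only if the write succeeded and the element now reports the new text.
func setValueVerified(_ element: AXUIElement, _ text: String) -> Bool {
    setValue(element, text) == .success && readValue(element) == text
}
```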
The Escalation Loop, As The Agent Sees It
One trail through the four primitives on a Catalyst app. The model reads diffs, picks the next tool, and stops as soon as the tree changes.
Catalyst right-pane click escalation:
- Plan target: the agent picks (pid, x, y) from a grep on the AX file
- click_and_traverse: synthetic CGEvent click
- Empty diff? Catalyst dropped it, so escalate
- press_ax_and_traverse: kAXPressAction on the element
- Action unsupported? It is a selection-bearing control, so escalate
- set_selected_and_traverse: kAXSelectedAttribute toggle
The Same Click, Two Wire-Level Paths
The listing below is the synthetic-click path, shown for a Catalyst right-pane button. Single-click-primitive servers have no other path. macos-use keeps it as the first attempt and exposes press_ax_and_traverse for when it returns an empty diff.
```swift
import AppKit

// macos-use_click_and_traverse → synthetic mouse click.
// This is the only path single-click-primitive servers have; there is no fallback.
// On a Catalyst right-pane button, the next AX diff is empty.
func syntheticClick(pid: pid_t, x: CGFloat, y: CGFloat,
                    width: CGFloat = 0, height: CGFloat = 0) async {
    // 1. Activate the target app so the click registers
    _ = NSRunningApplication(processIdentifier: pid)?.activate(options: [])
    try? await Task.sleep(nanoseconds: 200_000_000) // 200ms

    // 2. Build the click point: (x, y) is the top-left, so add half the size
    //    to hit the center; with no size given it is (x, y) itself
    let hitPoint = CGPoint(x: x + width / 2, y: y + height / 2)

    // 3. Post the synthetic events through the HID event tap
    //    (the source must be a CGEventSource, not a bare state ID)
    let source = CGEventSource(stateID: .hidSystemState)
    let down = CGEvent(mouseEventSource: source, mouseType: .leftMouseDown,
                       mouseCursorPosition: hitPoint, mouseButton: .left)
    let up = CGEvent(mouseEventSource: source, mouseType: .leftMouseUp,
                     mouseCursorPosition: hitPoint, mouseButton: .left)
    down?.post(tap: .cghidEventTap)
    up?.post(tap: .cghidEventTap)

    // 4. Re-traverse and diff. On a Catalyst right-pane button:
    //    diffSummary: "0 added, 0 removed, 0 modified"
    //    The button "looked" pressed; nothing changed.
}
```
The Same Failure, Two Outcomes
This is what happens to the agent on a Reminders right-pane checkbox at (412, 248) when only the synthetic click is available. With the AX-tier tools, the agent's next call after the first empty diff is press_ax_and_traverse, not a repeat click.
The agent calls click_and_traverse on the checkbox. The synthetic CGEvent posts and the cursor flickers. The post-action AX traversal returns the same tree as the pre-action traversal: the checkbox AXValue is still 0, no panels appear, and no text changes. The diff is empty. The agent has no signal that anything went wrong, so it tries the same click again and gets the same empty diff. After three attempts it gives up and tells the user it cannot complete the action.
- Three click_and_traverse calls, three empty diffs
- Agent has no signal the click is being dropped
- Loop ends with 'cannot complete the action'
- The user is told the agent failed, but the real fault was the input API it was limited to
The Wire Sequence
Two MCP calls, four hops each. The first call dies in the Catalyst app and reports back an empty diff. The second call routes around the input event tap and lands on the element via the accessibility framework directly.
[Sequence diagram: click + escalation on a Catalyst app]
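On the wire, the two calls in that sequence are ordinary MCP tools/call requests. This sketch builds both envelopes with Foundation's JSONSerialization. The macos-use_ prefix follows the tool names in the FAQ on this page. The pid, x, and y argument names follow the (pid, x, y, width?, height?) shape the FAQ describes, and the values are made up.

```swift
import Foundation

// Builds an MCP tools/call request (JSON-RPC 2.0). Tool names follow the
// macos-use_<tool> form used in the FAQ; argument values are hypothetical.
func toolCall(id: Int, tool: String, arguments: [String: Any]) -> Data {
    let envelope: [String: Any] = [
        "jsonrpc": "2.0",
        "id": id,
        "method": "tools/call",
        "params": ["name": tool, "arguments": arguments] as [String: Any],
    ]
    // Plain strings, ints and dictionaries always serialize, so try! cannot trap here.
    return try! JSONSerialization.data(withJSONObject: envelope)
}

// Same target for both calls: the pid and coordinates are made-up values.
let target: [String: Any] = ["pid": 4242, "x": 412, "y": 248]

// Call 1: synthetic click. On a Catalyst right pane the diff comes back empty.
let clickRequest = toolCall(id: 1, tool: "macos-use_click_and_traverse", arguments: target)
// Call 2: same coordinates, routed through kAXPressAction instead of the event tap.
let pressRequest = toolCall(id: 2, tool: "macos-use_press_ax_and_traverse", arguments: target)
```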
Where Each Primitive Earns Its Place
These are apps in which the maintainers have reproduced dropped input. The list is not exhaustive. The pattern is that anything Catalyst-derived needs at least one AX-tier primitive, while pure AppKit or Electron apps usually do not.
Notes
Catalyst, right-pane buttons need press_ax
Reminders
Catalyst, list rows need set_selected
App Store
Catalyst, install buttons need press_ax
Music
Catalyst, sidebar items need set_selected
Messages
Catalyst on recent macOS, fields need set_value
Slack
Electron, click_and_traverse works directly
Cursor
Electron, click_and_traverse works directly
AppKit, all primitives available
Calendar
AppKit, all primitives available
Finder
AppKit, click_and_traverse works directly
Numbers
AppKit, opens cross-process Save Panel
Xcode
AppKit, all primitives available
Wire It Up In Four Steps
Claude Code path. For Cursor, Claude Desktop, VS Code, or Cline, point the client at npx -y mcp-server-macos-use with the same args shape.
1. Install via npm: claude mcp add macos-use -- npx -y mcp-server-macos-use
2. Grant Accessibility: the first tool call triggers a system prompt. Allow it.
3. Open Calculator: use open_application_and_traverse on Calculator as a smoke test.
4. Watch the .txt file: tail /tmp/macos-use/*_open_application_and_traverse.txt
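You can also open the newest traversal file from code instead of with tail. This sketch assumes file names end in `_<tool>.txt`, as the tail glob above implies. It also assumes the timestamp prefix is fixed-width, so that lexicographic order matches time order; that width is an assumption about the server's naming. `latestTraversal` and `latestTraversalPath` are hypothetical helpers.

```swift
import Foundation

// Picks the newest traversal file for a tool from a list of file names.
// Assumes fixed-width timestamps, so the lexicographic max is the latest file.
func latestTraversal(in names: [String], tool: String) -> String? {
    names.filter { $0.hasSuffix("_\(tool).txt") }.max()
}

// Lists the directory this page names and returns the full path, if any.
func latestTraversalPath(tool: String, directory: String = "/tmp/macos-use") -> String? {
    let names = (try? FileManager.default.contentsOfDirectory(atPath: directory)) ?? []
    return latestTraversal(in: names, tool: tool).map { directory + "/" + $0 }
}
```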
Other Files Worth Reading In The Same Repo
Six anchors in the source if you are evaluating this server for production use. Each one is a behavior the AI client will hit on day one.
1. scripts/test_mcp.py is the wire-format smoke test (initialize → listTools → openApp → exit)
2. main.swift:1408 lists every CallTool case the server handles
3. main.swift:1539 is the listTools handler that advertises the nine tools
4. InputGuard.swift:113 installs the head-insert CGEventTap, and InputGuard.swift:24 caps it at 30s
5. main.swift:1786-1809 detects cross-process dialog handoffs (Save Panel, Share, Print)
6. main.swift:1837 makes the screenshot follow the new frontmost PID after a handoff
“The button looked pressed; nothing changed. That is the failure mode every server with a single click primitive hits on Catalyst, and the reason this server has four.”
main.swift:1482
Verify Every Claim On This Page Yourself
Seven steps from clone to confirmed. Faith optional.
Confirm the four-primitive design exists in the source
- git clone https://github.com/mediar-ai/mcp-server-macos-use
- swift build -c release in the repo root
- grep -n 'name:' Sources/MCPServer/main.swift to count Tool() declarations
- Confirm the count is 9 and main.swift:1482 lists them in allTools
- Read the description string at main.swift:1477 — it names AXSelected and kAXErrorActionUnsupported
- Read main.swift:1728-1785 for the three escalation primitive handlers
- Read InputGuard.swift:24, 113, 131, 329-351 for the input-arbitration layer
Frequently asked questions
What is the macOS MCP server, in one sentence?
A local Model Context Protocol server (mediar-ai/mcp-server-macos-use, MIT, written in Swift) with three jobs. It exposes nine tools to AI clients (Claude Code, Cursor, Claude Desktop, Cline, VS Code) over JSON-RPC 2.0 on stdio. It drives any macOS app through the native AXUIElement APIs. And it installs an in-process CGEventTap that swallows your hardware input while it is acting, so you and the agent do not race for the same keyboard. Source: Sources/MCPServer/main.swift (2056 lines) and Sources/MCPServer/InputGuard.swift (355 lines).
How is it different from posting a synthetic CGEvent click like the other macOS MCP servers?
It still posts a synthetic CGEvent for the common path (macos-use_click_and_traverse at main.swift:1349 ends in CGEvent.post via the input event tap). The difference is what happens when that path fails. Catalyst-derived apps frequently swallow synthetic mouse events on their right-pane controls. This includes Notes, Reminders, App Store, Music, Messages on recent macOS, and a long tail of third-party Mac Catalyst apps. The click appears to land, but the AX tree does not change. macos-use reports that empty diff to the agent and gives it three escalation primitives. The next tool call can then take a different route without the agent inventing the strategy on its own.
What are the three escalation primitives?
macos-use_press_ax_and_traverse at main.swift:1457 calls AXUIElementPerformAction with kAXPressAction directly on the element under (x, y), bypassing the input event tap. macos-use_set_value_and_traverse at main.swift:1440 writes a string into the element via kAXValueAttribute, for typing into Catalyst right-pane fields where simulated key events are dropped. macos-use_set_selected_and_traverse at main.swift:1475 toggles kAXSelectedAttribute on selection-bearing controls, such as Catalyst table rows, sidebar list entries, and outline rows. These are controls where a regular click is dropped AND press_ax errors with kAXErrorActionUnsupported. All three target the element with the same (pid, x, y, width?, height?) arguments, so the agent can swap primitives without re-traversing. set_value also takes the string to write, and set_selected takes the selected flag.
Why are these separate MCP tools instead of one click tool that retries internally?
Because the model is the planner and the model needs to know what was tried. If the server retried silently and reported back "clicked", the model would have no way to learn which primitive worked for a given app. The MCP contract is one tool call, one outcome, attributed. By exposing four primitives, macos-use lets the model build an implicit per-app policy across a session: this app responds to click, that app needs press_ax, this third one only moves on set_selected. The cost is one extra tool call per dropped action; the benefit is the model's belief about UI affordances stays grounded.
How do I install the macOS MCP server in Claude Code?
One command: `claude mcp add macos-use -- npx -y mcp-server-macos-use`. The npm postinstall hook (package.json line 17) runs `xcrun swift build -c release` against the bundled Swift package, so the only host requirement is the Xcode Command Line Tools. The first time a tool call fires, grant Accessibility permission to the Claude Code binary (System Settings > Privacy & Security > Accessibility). You will see a system prompt, because the server's CGEvent.post and AX API calls are both gated by your Accessibility allowlist.
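You may embed the server in your own host process and want to check permission before the first tool call. For that, HIServices exposes AXIsProcessTrustedWithOptions. This is a minimal sketch. `isAccessibilityTrusted` is a hypothetical wrapper. With `prompt: true`, macOS shows its standard dialog pointing you to System Settings.

```swift
import ApplicationServices

// Returns whether this process is on the Accessibility allowlist.
// With prompt == true, macOS also asks the user to grant access if it is missing.
func isAccessibilityTrusted(prompt: Bool) -> Bool {
    let key = kAXTrustedCheckOptionPrompt.takeUnretainedValue() as String
    let options = [key: prompt] as CFDictionary
    return AXIsProcessTrustedWithOptions(options)
}
```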
How do I install it in Cursor, Claude Desktop, VS Code, or Cline?
Add a server entry to the client's MCP config pointing at `npx -y mcp-server-macos-use`. Cursor reads `~/.cursor/mcp.json`, and Claude Desktop reads `~/Library/Application Support/Claude/claude_desktop_config.json`. Cline keeps its MCP config in the VS Code extension settings. Each uses the same `command: "npx"`, `args: ["-y", "mcp-server-macos-use"]` pattern. Restart the client. The server advertises its tools through the listTools handler at main.swift:1539, and the nine tools appear in the client UI.
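The config entry itself is small. This sketch generates it with Codable, so the command and args are exactly the ones above. The `mcpServers` top-level key matches Cursor and Claude Desktop. Other clients may nest servers under a different key, so check your client's documentation. `ServerEntry` and `MCPConfig` are illustrative types.

```swift
import Foundation

// One server entry: the launcher and its arguments.
struct ServerEntry: Codable {
    let command: String
    let args: [String]
}

// Top-level shape used by Cursor and Claude Desktop config files.
struct MCPConfig: Codable {
    let mcpServers: [String: ServerEntry]
}

let config = MCPConfig(mcpServers: [
    "macos-use": ServerEntry(command: "npx", args: ["-y", "mcp-server-macos-use"]),
])

let encoder = JSONEncoder()
encoder.outputFormatting = [.prettyPrinted, .sortedKeys]
// Encoding plain strings cannot fail, so try! is safe here.
let configJSON = String(decoding: try! encoder.encode(config), as: UTF8.self)
print(configJSON)
```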
What does "and_traverse" mean in every tool name?
Every action tool returns a fresh accessibility-tree traversal of the target app immediately after the action. The traversal is written to /tmp/macos-use/<timestamp>_<tool>.txt as a flat, grep-able text file (one line per AX element, format `[Role] "text" x:N y:N w:W h:H visible`). The MCP response carries the file path plus a sampled visible_elements block, and the agent is expected to grep the file for its next target. This is built into the server contract because the alternative (asking the model to hold the entire AX tree in context) is wasteful: real apps emit 800-3000 elements per traversal.
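Because each line has a fixed shape, pulling a click target out of the file takes only a few lines of parsing. This sketch assumes exactly the `[Role] "text" x:N y:N w:W h:H visible` layout quoted above. It tolerates quotes inside the text by anchoring on the last ` x:`, and it treats a trailing `visible` token as a flag. `AXLine` and `parseTraversalLine` are hypothetical names.

```swift
import Foundation

// One parsed traversal line. Field names are illustrative.
struct AXLine: Equatable {
    let role: String
    let text: String
    let x: Double, y: Double, w: Double, h: Double
    let visible: Bool
}

// Parses `[Role] "text" x:N y:N w:W h:H visible`; nil if the line does not match.
func parseTraversalLine(_ line: String) -> AXLine? {
    guard line.hasPrefix("["), let close = line.firstIndex(of: "]") else { return nil }
    let role = String(line[line.index(after: line.startIndex)..<close])
    let rest = line[line.index(after: close)...]

    // Geometry starts at the last " x:", so quotes inside the text are tolerated.
    guard let geo = rest.range(of: " x:", options: .backwards) else { return nil }
    var text = rest[..<geo.lowerBound].trimmingCharacters(in: .whitespaces)
    if text.count >= 2, text.hasPrefix("\""), text.hasSuffix("\"") {
        text = String(text.dropFirst().dropLast())
    }

    var nums: [String: Double] = [:]
    var visible = false
    for token in rest[geo.lowerBound...].split(separator: " ") {
        if token == "visible" { visible = true; continue }
        let kv = token.split(separator: ":", maxSplits: 1)
        if kv.count == 2, let v = Double(kv[1]) { nums[String(kv[0])] = v }
    }
    guard let x = nums["x"], let y = nums["y"], let w = nums["w"], let h = nums["h"] else {
        return nil
    }
    return AXLine(role: role, text: text, x: x, y: y, w: w, h: h, visible: visible)
}
```

From a parsed line, the center of the target is (x + w/2, y + h/2), the same point the click handler aims at.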
Why does the server take over my keyboard during a tool call?
Because a local MCP server runs on your Mac, not in a datacenter. Suppose the agent is midway through a 700 ms type_and_traverse in Slack and your hand drifts back to the keyboard for a Cmd-K. Your keystrokes then race the agent's into the same focused field. macos-use installs a head-insert CGEventTap at InputGuard.swift:131 around every disruptive call. The tap separates programmatic events from hardware events by eventSourceStateID (non-zero for programmatic, zero for hardware, checked at InputGuard.swift:329-332). It drops everything coming from your keyboard except plain Esc (keycode 53, no modifiers, InputGuard.swift:340-351), which is the one-key abort. A 30-second watchdog at InputGuard.swift:24 force-disengages the tap so the machine cannot get stuck in guarded mode.
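The filtering rule described above fits in a small decision function plus a Quartz event-tap callback. This is a sketch, not InputGuard.swift. It takes the eventSourceStateID convention (zero for hardware) from this answer and lets plain Esc through on both keyDown and keyUp. It installs at the head of the HID tap location, which is an assumption about where the real tap sits. The 30-second watchdog is not shown. `shouldDrop`, `guardCallback`, and `installInputGuard` are hypothetical names.

```swift
import CoreGraphics

let escKeycode: Int64 = 53

// Drop hardware input (source state ID 0 per this page), except plain Esc,
// which passes through as the abort key. Programmatic events always pass.
func shouldDrop(type: CGEventType, sourceStateID: Int64, keycode: Int64, flags: CGEventFlags) -> Bool {
    guard sourceStateID == 0 else { return false }
    let modifiers: CGEventFlags = [.maskCommand, .maskShift, .maskControl, .maskAlternate]
    let isKey = type == .keyDown || type == .keyUp
    let plainEsc = isKey && keycode == escKeycode && flags.intersection(modifiers).isEmpty
    return !plainEsc
}

// Tap callback: returning nil swallows the event, passUnretained forwards it.
let guardCallback: CGEventTapCallBack = { _, type, event, _ in
    let drop = shouldDrop(
        type: type,
        sourceStateID: event.getIntegerValueField(.eventSourceStateID),
        keycode: event.getIntegerValueField(.keyboardEventKeycode),
        flags: event.flags)
    return drop ? nil : Unmanaged.passUnretained(event)
}

// Installs the filter at the head of the HID tap. Needs Accessibility
// permission; returns nil without it.
func installInputGuard() -> CFMachPort? {
    let types: [CGEventType] = [.keyDown, .keyUp, .leftMouseDown, .leftMouseUp, .mouseMoved]
    let mask = types.reduce(CGEventMask(0)) { $0 | (CGEventMask(1) << $1.rawValue) }
    guard let tap = CGEvent.tapCreate(tap: .cghidEventTap, place: .headInsertEventTap,
                                      options: .defaultTap, eventsOfInterest: mask,
                                      callback: guardCallback, userInfo: nil) else { return nil }
    let source = CFMachPortCreateRunLoopSource(kCFAllocatorDefault, tap, 0)
    CFRunLoopAddSource(CFRunLoopGetCurrent(), source, .commonModes)
    CGEvent.tapEnable(tap: tap, enable: true)
    return tap
}
```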
How do I use it without an LLM, just to test it?
Build with `swift build -c release` from the repo root, then run `./.build/release/mcp-server-macos-use` and pipe MCP JSON-RPC envelopes at it on stdin. The repo includes scripts/test_mcp.py which is the smallest end-to-end smoke test: it sends initialize, listTools, and a single open_application_and_traverse call to Calculator. Use it as a reference for the wire format if you are integrating from a custom client. The server logs to stderr with `log:` and `warning:` prefixes, so the test script can pipe stdout to JSON parsing while leaving stderr human-readable.
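For a custom client, the stdio transport expects one JSON-RPC message per line. This sketch builds the opening sequence scripts/test_mcp.py is described as sending (initialize, then a tool listing). It also includes the notifications/initialized message the MCP handshake defines. The protocol version string and clientInfo values are assumptions; match them to the SDK version you target.

```swift
import Foundation

// One JSON-RPC message per line, as the MCP stdio transport expects.
func frame(_ message: [String: Any]) -> String {
    // Plain strings, ints and dictionaries always serialize, so try! cannot trap here;
    // default options emit no newlines, so the "\n" below is the only line break.
    let data = try! JSONSerialization.data(withJSONObject: message)
    return String(decoding: data, as: UTF8.self) + "\n"
}

let handshake: [[String: Any]] = [
    ["jsonrpc": "2.0", "id": 1, "method": "initialize",
     "params": [
        "protocolVersion": "2024-11-05",                           // assumption
        "capabilities": [String: Any](),
        "clientInfo": ["name": "smoke-test", "version": "0.0.1"],  // hypothetical
     ] as [String: Any]],
    // A notification: no "id", so the server sends no response to it.
    ["jsonrpc": "2.0", "method": "notifications/initialized"],
    ["jsonrpc": "2.0", "id": 2, "method": "tools/list"],
]

// Write this to the server's stdin (for example through Process and Pipe) and read
// one JSON response per line from stdout. A strict client waits for the initialize
// response before sending the remaining two lines.
let stdinPayload = handshake.map(frame).joined()
```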
Does it support remote macOS control like the baryhuang/mcp-remote-macos-use project?
No. macos-use is local: the MCP server runs on the same machine whose UI it is driving. Remote control is a different design (VNC or screen-share + remote synthetic events) with different trade-offs around latency, security, and what you do when the remote session loses focus. macos-use trades remote reach for tighter integration: native AX tree access, four-primitive escalation, in-process input guard, and saved/restored frontmost-app + cursor-position invariants around every call. If you need a Mac you are not sitting at, those projects are right; if you need an agent on the Mac you actually use, this is right.
Is it open source and what is the license?
Yes, MIT licensed. Repo: github.com/mediar-ai/mcp-server-macos-use. The Swift package depends on the Swift MCP SDK and a small internal MacosUseSDK module. The npm wrapper at npmjs.com/package/mcp-server-macos-use ships the Swift sources and rebuilds them at install time via the postinstall hook so you always get a binary linked against your local Xcode toolchain.