MCP server + macOS accessibility APIs: what discovery actually returns, and where it stops being enough
Discovery via AXUIElement is total. You get every node of every running app the host can read. The interesting part is everything that happens after discovery: noise filtering, three actuation paths, and one toolkit (Mac Catalyst) where the obvious click silently fails.
Direct answer (verified 2026-05-17)
An MCP server that wraps the macOS Accessibility API can discover every node of every running app the host process has permission to read: role, text, position, size, AX actions, and attributes. Discovery is total. The limit is actuation. On Mac Catalyst right-pane controls and some sandboxed apps a synthetic CGEvent click on a perfectly discovered element is silently dropped, which is why the macos-use server registers 9 tools at Sources/MCPServer/main.swift:1482 and 3 of those 9 are pure actuation fallbacks: set_value (kAXValueAttribute), press_ax (kAXPressAction), set_selected (kAXSelectedAttribute).
API reference: developer.apple.com / AXUIElement
What a single line of discovery output looks like
Every MCP call writes a flat text dump under /tmp/macos-use/ with one element per line in the format [Role] "text" x:N y:N w:W h:H visible. You read it with grep. Open Calculator and grep the dump for AXButton, for example, and you get one such line per key, each with its coordinates.
The format is produced by formatElementLine at main.swift:993-1010. The same shape is what every other tool in the server returns: it is what the agent reads, greps, and acts on.
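Because every tool returns the same line shape, a small parser is enough to turn grep output into something an agent can act on. The sketch below is a minimal Python reading of the documented format, not the server's own code: the regex, the `Element` dataclass, and the sample line are illustrative assumptions, as is the guess that the trailing `visible` token is simply absent for off-screen elements.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# One dump line: [Role] "text" x:N y:N w:W h:H visible
# Assumptions: coordinates may be negative or fractional, and the
# trailing "visible" token is missing for elements outside the viewport.
LINE_RE = re.compile(
    r'^\[(?P<role>[^\]]+)\]\s+"(?P<text>.*)"\s+'
    r'x:(?P<x>-?\d+(?:\.\d+)?)\s+y:(?P<y>-?\d+(?:\.\d+)?)\s+'
    r'w:(?P<w>\d+(?:\.\d+)?)\s+h:(?P<h>\d+(?:\.\d+)?)'
    r'(?P<visible>\s+visible)?\s*$'
)


@dataclass
class Element:
    role: str
    text: str
    x: float
    y: float
    w: float
    h: float
    visible: bool

    @property
    def center(self) -> Tuple[float, float]:
        # The point a synthetic click would target: (x + w/2, y + h/2).
        return (self.x + self.w / 2, self.y + self.h / 2)


def parse_line(line: str) -> Optional[Element]:
    """Parse one dump line; return None if it does not match the format."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Element(
        role=m["role"],
        text=m["text"],
        x=float(m["x"]),
        y=float(m["y"]),
        w=float(m["w"]),
        h=float(m["h"]),
        visible=m["visible"] is not None,
    )
```

The `center` property is the same midpoint that click_and_traverse is described as targeting, so a parsed line carries everything needed for the first rung of actuation.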
What discovery contains
present for every node the AX tree exposes
- role (AXButton, AXTextField, AXStaticText, AXRow, AXWindow, AXMenu...)
- text from AXTitle, AXValue, AXDescription, AXLabel
- top-left x and y in global screen coordinates
- width and height in points
- list of AX actions advertised (kAXPressAction, kAXShowMenuAction)
- AXEnabled, AXFocused, AXSelected, AXValue attributes
- in_viewport flag (does it fall inside any window of the app)
What it does NOT contain
These are the genuine blind spots. None of them are server bugs; the app never published the data, so no tree walker can find it.
not in the AX tree
- anything inside Metal-only game engines or DRM video players (one giant AXWindow, no children)
- shapes inside custom NSView canvases that do not publish per-shape AXGroups
- data the app is fetching, network calls, anything not rendered in visible UI
- another running app's tree unless you walk its AX root by PID separately
- DOM detail inside WKWebView beyond what VoiceOver would announce
- CSS-styling-only state changes (color, hover, transitions)
“9 tools registered. 3 of them exist only because a CGEvent click on a perfectly discovered Catalyst button is silently dropped. press_ax, set_value, and set_selected each carry the same hint verbatim: 'Use when a synthetic mouse click is dropped.'”
Sources/MCPServer/main.swift:1440-1482
The same tree looks different in four toolkits
A Mac app is not one thing. The toolkit it was built in changes what discovery returns and which actuation path works. Same APIs, four different lived experiences.
AppKit
Mail, Finder, the legacy Notes. Tight tree, real titles on AXButton, kAXPressAction works, kAXValueAttribute readable and writable. Click via CGEvent almost always actuates. The cleanest case.
SwiftUI
Similar shape to AppKit but 1.5x to 3x more nodes because every view modifier becomes a wrapper. Most actions still work; tree dumps are bigger so use grep more aggressively.
Mac Catalyst
Messages, Maps, parts of newer System Settings. Tree looks right. CGEvent clicks on right-pane controls are silently dropped. This is the case the 3 fallback tools (press_ax, set_value, set_selected) exist for.
Electron
Slack, Discord, VS Code. Tree is huge, thousands of AXGroup and AXStaticText nodes from the Chromium accessibility shim. Semantic roles washed out, but text is still grep-able and CGEvent clicks usually work.
Mac Catalyst is singled out here because it is the toolkit where discovery and actuation diverge most. The three fallback tools (press_ax, set_value, set_selected) were added in direct response to debugging Messages and Maps, where click_and_traverse returned success but nothing changed in the AX tree.
What the server filters out of the diff
The full per-call dump under /tmp/macos-use contains everything. The delta returned to the model after an action is filtered, so the LLM does not waste tokens on noise. Two named filters live in the server, both inside main.swift:
isScrollBarNoise (main.swift:591-597)
drops any role containing "scrollbar", "scroll bar", "value indicator", "page button", or "arrow button".
isStructuralNoise (main.swift:599-607)
drops empty (no text) AXRow, AXOutline row, AXCell, AXColumn, and AXMenu containers.
Coordinate-only attribute changes are also stripped at main.swift:681-682 so a window repaint that nudges every element 1px does not flood the diff. The full file is still on disk if the agent wants to compare. These are deliberate design choices around what an LLM should and should not be asked to reason about after each action.
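The filtering rules described above are simple enough to restate as predicates. The Python sketch below is an approximation of the described behavior, not a port of main.swift: case-insensitive substring matching, matching container roles on the first token of the role string, the dictionary shape of a diff entry, and treating x, y, w, and h as the coordinate attributes are all assumptions.

```python
from typing import Dict, List

# Substrings named in the text; case-insensitive matching is an assumption.
SCROLL_BAR_MARKERS = ("scrollbar", "scroll bar", "value indicator",
                      "page button", "arrow button")
# Container roles named in the text; comparing against the first token of
# the role string is an assumption about how roles appear in the dump.
STRUCTURAL_ROLES = {"AXRow", "AXOutline", "AXCell", "AXColumn", "AXMenu"}
# Assumed set of attributes that count as "coordinate-only" changes.
COORDINATE_KEYS = {"x", "y", "w", "h"}


def is_scroll_bar_noise(role: str) -> bool:
    lowered = role.lower()
    return any(marker in lowered for marker in SCROLL_BAR_MARKERS)


def is_structural_noise(role: str, text: str) -> bool:
    # Only empty containers are dropped; a row with text is kept.
    tokens = role.split()
    return bool(tokens) and tokens[0] in STRUCTURAL_ROLES and not text.strip()


def is_noise(element: Dict) -> bool:
    return (is_scroll_bar_noise(element["role"])
            or is_structural_noise(element["role"], element.get("text", "")))


def filter_modified(modified: List[Dict]) -> List[Dict]:
    """Drop noisy roles and entries whose only changes are coordinates."""
    kept = []
    for entry in modified:
        if is_noise(entry):
            continue
        changed = set(entry.get("changes", {}))
        if changed and changed <= COORDINATE_KEYS:
            continue
        kept.append(entry)
    return kept
```

Under these assumptions, a button that moved one pixel disappears from the diff while a text value that changed from "0" to "7" survives, which matches the stated intent of the filters.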
The actuation ladder, in order
For any element the agent has already discovered, there is a correct order in which to try actuators when the obvious path fails. The lookup table below maps each rung to where it lives in source, when to use it, and the failure signature that tells you to drop down one more rung.
| Tool (source) | When to use | Failure signature → next rung |
|---|---|---|
| click_and_traverse (main.swift:1349) | AppKit, SwiftUI, Electron, most of Catalyst. Synthetic CGEvent left-mouse-down/up at (x+w/2, y+h/2). The first attempt for 90% of elements. | Diff returns 0 added / 0 removed / 0 modified after the click landed visually. Drop to press_ax. |
| press_ax_and_traverse (main.swift:1457-1461) | Catalyst right-pane buttons. Skips the input event tap, calls kAXPressAction directly on the AXUIElement. | kAXErrorActionUnsupported (element exposes no AXPress action, common on table rows). Drop to set_selected. |
| set_selected_and_traverse (main.swift:1475-1479) | Sidebar entries, outline rows, table rows that expose AXSelected but no AXPress action. | Element has no kAXSelectedAttribute at all (read-only or non-selectable container). Inspect the dump. |
| set_value_and_traverse (main.swift:1440-1444) | Catalyst right-pane text fields, sandboxed secure-input contexts. Writes via kAXValueAttribute, no keystrokes. | kAXErrorAttributeUnsupported, or the field silently rejects the write (rare; some search fields are like this). |
- click_and_traverse: synthesize a CGEvent left-mouse-down/up at (x+w/2, y+h/2). Works on AppKit, SwiftUI, Electron, and most of Catalyst. Almost always the first attempt.
- press_ax_and_traverse (main.swift:1457-1461): calls kAXPressAction directly on the AXUIElement at (x,y). This is what works when the synthetic click was dropped silently on a Catalyst button. The element actuates without ever touching the input event tap.
- set_selected_and_traverse (main.swift:1475-1479): for sidebar entries, outline rows, and table rows that expose AXSelected but no AXPress action. Calling press_ax on these returns kAXErrorActionUnsupported. Setting the attribute selects the row instead, which is what a user clicking would have done.
- set_value_and_traverse (main.swift:1440-1444): when typing into a text field via CGEvent fails (Catalyst right-pane fields, sandboxed secure-input contexts), write the string directly with kAXValueAttribute. The field updates without any keystrokes ever leaving the input layer.
The order matters. Click first because it is what a human would do and it works almost everywhere. Drop down the ladder only when the AX tree shows the click did not change anything (and the screenshot confirms it). A robust agent treats these as four primitives in one decision tree, not as four unrelated tools.
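Treating the rungs as one decision tree can be sketched as a loop over actuators that stops at the first one whose post-action diff shows a change. This is a hypothetical agent-side sketch, not code from the server: `Outcome`, `actuate`, and the injected callables are invented for illustration, and the real MCP tool calls are left out so the control flow stays visible.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Outcome:
    # True if the post-action diff had any added/removed/modified entries.
    changed: bool
    # AX error name if the call failed, e.g. "kAXErrorActionUnsupported".
    error: Optional[str] = None


# Each rung is (tool name, zero-argument callable that performs the call).
Rung = Tuple[str, Callable[[], Outcome]]


def actuate(rungs: List[Rung]) -> Optional[str]:
    """Try each rung in order; return the name of the first that worked.

    A rung "worked" when it raised no AX error and the diff was non-empty.
    An empty diff is read as the silent-drop signature, so the loop moves
    to the next rung. Returns None if every rung failed.
    """
    for name, run in rungs:
        outcome = run()
        if outcome.error is None and outcome.changed:
            return name
    return None
```

For a button, the caller would pass click_and_traverse, press_ax_and_traverse, and set_selected_and_traverse in that order; for a text field, a typing call followed by set_value_and_traverse. Keeping the ordering in data rather than in branching code makes it easy to log which rung finally actuated.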
Try the fallback tools yourself
Install macos-use into your MCP client, grant the terminal Accessibility + Screen Recording, then ask Claude Code to open Messages and click a thread. Watch the agent escalate from click_and_traverse to press_ax_and_traverse the first time a Catalyst right-pane click comes back with an empty diff. It is the smallest reproducible test of the actuation ladder.
`claude mcp add macos-use -- npx -y mcp-server-macos-use`
Requires Claude Code (`npm i -g @anthropic-ai/claude-code`) and macOS 13+. Swift builds on first run, ~20 seconds.
Server source: github.com/mediar-ai/mcp-server-macos-use. The line numbers cited on this page refer to Sources/MCPServer/main.swift at the commit currently published to npm.
Frequently asked questions
What does an MCP server using macOS accessibility APIs actually discover?
Every node of every running app's AXUIElement tree, for any app the host process has Accessibility permission for in System Settings. For each node you get: role (AXButton, AXTextField, AXStaticText, AXRow, AXWindow, AXMenu, etc.), the element's text where present (AXTitle, AXValue, AXDescription, AXLabel), top-left x/y in screen coordinates, width and height, the list of AX actions the element advertises (kAXPressAction, kAXShowMenuAction, kAXCancelAction), and the AX attributes (AXEnabled, AXFocused, AXSelected, AXValue). The macos-use server flattens that tree to one-element-per-line text and writes it to /tmp/macos-use/<ts>_<tool>.txt every call. Discovery is total. You are not crawling pixels.
Where does discovery break down in practice?
Discovery itself does not break down. What breaks is the assumption that a discovered element is actuatable through the obvious path. Three known classes of failure show up. (1) Mac Catalyst right-pane controls (Messages, Maps, parts of System Settings) frequently expose AXButton with sensible text and coordinates, but a CGEvent left-mouse-down/up at the element's center is silently dropped. The element does not press, no error fires, the AX tree does not change. (2) Sandboxed apps with secure-input contexts ignore synthetic CGEvent typing. (3) Some Catalyst list rows and outline rows expose AXSelected but no AXPress action, so calling kAXPressAction on them returns kAXErrorActionUnsupported. None of these are discovery problems. The tree is correct. The default actuation path just is not the one that works.
Why does the macos-use server register 9 tools instead of 6?
Because of the actuation gaps above. The aggregate list at Sources/MCPServer/main.swift:1482 is open_application_and_traverse, click_and_traverse, type_and_traverse, press_key_and_traverse, scroll_and_traverse, set_value_and_traverse, press_ax_and_traverse, set_selected_and_traverse, refresh_traversal. Six of those are the standard surface: discover, click, type, press key, scroll, refresh. The other three are alternate actuators. set_value writes a string via kAXValueAttribute, bypassing the input event tap entirely (main.swift:1440-1444). press_ax performs kAXPressAction on the element under (x,y) instead of synthesizing a click (main.swift:1457-1461). set_selected sets kAXSelectedAttribute on table rows and sidebar entries that have no AXPress action (main.swift:1475-1479). All three carry the same hint verbatim: 'Use when a synthetic mouse click is dropped (Catalyst right-pane buttons, sandboxed apps).' The agent's plan is: try click first, fall back to press_ax, fall back to set_selected for selection-bearing controls.
What does the server filter OUT of the discovered tree?
Two named classes, and only from the diff response, not from the full text dump. isScrollBarNoise at main.swift:591-597 drops any role containing 'scrollbar', 'scroll bar', 'value indicator', 'page button', or 'arrow button'. isStructuralNoise at main.swift:599-607 drops empty (no text) AXRow, AXOutline row, AXCell, AXColumn, and AXMenu containers. Coordinate-only attribute changes are also stripped (main.swift:681-682) so that a diff between two traversals does not flag every pixel-level repaint. The full traversal .txt file under /tmp/macos-use still contains everything. The filter exists so an LLM consuming the post-action diff sees only changes worth acting on, not 'AXValueIndicator moved 2 pixels.'
How does discovery behave across AppKit, SwiftUI, Mac Catalyst, and Electron?
The tree is always there; the quality differs. AppKit (Mail, Finder, the legacy Notes) emits a tight tree: AXButton with a real title, AXTextField with kAXValueAttribute that you can both read and write, AXMenuItem with a working kAXPressAction. SwiftUI emits a similar shape but more verbose because every modifier becomes a wrapper node; element counts are 1.5x to 3x what AppKit would produce for the same view. Mac Catalyst (UIKit on Mac, e.g. Messages, Maps, parts of newer System Settings) emits a tree that looks correct but is the one where synthetic CGEvent clicks fail most often; the fallback tools were written specifically for this. Electron (Slack, Discord, VS Code) emits thousands of AXGroup and AXStaticText nodes from the Chromium accessibility shim. The tree is huge, semantic roles are washed out, but text is still extractable and CGEvent clicks usually work.
What permission does the host process need for any of this to work?
Two grants, in System Settings > Privacy & Security. (1) Accessibility, granted to the process that loads the MCP server. On Claude Code that is the terminal app (iTerm, Terminal, Ghostty), not the MCP binary; the binary inherits the grant from its parent. Grant it to the wrong process and every AX call on the element returned by AXUIElementCreateApplication fails with -25204 (kAXErrorCannotComplete). (2) Screen Recording, granted to the same process. CGWindowListCreateImage on Sequoia returns blank or menu-bar-only frames without it, so the per-action screenshot the server attaches is empty. Both grants are checked once per macOS login session and cached.
Are there things the AX tree just does not contain at all?
Yes. Three categories. (1) Apps that opt out: some Metal-only game engines and DRM video players publish a single AXWindow with no children, because they bypass AppKit entirely. (2) Custom-drawn canvas that does not implement accessibility: a CAD tool drawing in a single NSView without exposing per-shape AXGroups produces one giant node. (3) Cross-process IPC state: the AX tree shows the visible UI of the focused process, not the data the app is fetching, not its network calls, not what is in another app's window unless you walk that app's AX root by PID. Anything that is not visible UI is not in the tree. For those cases the agent has to fall back to coordinate-driven clicks against the screenshot, or to a different control surface entirely (CLI, file, API).
Does the server give the LLM the full tree or a summary?
Neither, exactly. Every tool call writes the full tree to /tmp/macos-use/<timestamp>_<tool>.txt and returns a compact text summary to the model: a status line, the PID and app name, the file path, the file size and element count, a grep hint, the screenshot path, and a small sample of visible elements. The model uses Read on the .txt with offset/limit or Grep on it to find specific roles or text. This pattern exists because a naive full-tree return on a typical Slack window is 8000+ elements and 200-800 KB of JSON, which every model on the market gets lost on inside a tool loop. The summary plus file pattern keeps token cost flat per action.
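The summary-plus-file pattern means the consumer never loads the whole tree; it scans the dump and keeps a bounded number of matching lines. A minimal Python sketch of that read path follows. The function name `grep_dump` and its parameters are illustrative, and it assumes only what the text states: the dump is a plain text file with one element per line.

```python
from itertools import islice
from typing import List


def grep_dump(path: str, needle: str, limit: int = 20) -> List[str]:
    """Return at most `limit` lines of the dump that contain `needle`.

    The file is streamed line by line, so cost stays bounded even when
    the dump holds thousands of elements.
    """
    with open(path, encoding="utf-8", errors="replace") as fh:
        matches = (line.rstrip("\n") for line in fh if needle in line)
        # islice stops reading as soon as `limit` matches are collected.
        return list(islice(matches, limit))
```

Calling `grep_dump(path, "[AXButton]", limit=10)` on a dump path returned by a tool call would give the agent ten candidate buttons without paying for the rest of the tree.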
What is the smallest concrete example of testing discovery yourself?
Install the server with `claude mcp add macos-use -- npx -y mcp-server-macos-use`. Restart Claude Code. Grant your terminal Accessibility and Screen Recording. Then in a Claude Code session say: 'open Calculator and dump the AX tree.' The agent calls macos-use_open_application_and_traverse({identifier: 'Calculator'}). The response includes a `file:` field with the .txt path and a `screenshot:` field with the .png. Open the .txt yourself with `bat` or `less`; every line is `[Role] "text" x:N y:N w:W h:H visible`. Grep for AXButton: you see the 0-9, +, -, *, /, =, AC, +/-, %, . keys with their coordinates. That is discovery. Asking the agent to click the 7 button next exercises the actuation half.