Accessibility tree, native macOS apps: the two practical ways to read one in 2026

Matthew Diakonov
6 min

Direct answer · verified 2026-05-12

Two paths. Accessibility Inspector (Apple's built-in tool, ships with Xcode) is the one-off GUI: open Xcode, then Xcode → Open Developer Tool → Accessibility Inspector, point at any running app's window, browse the tree. macos-use is the agent / scripting path: install the MCP server, call any of its tools, and read the entire tree from /tmp/macos-use/<timestamp>_<tool>.txt, one element per line, in a format you can grep. Source for the format: Sources/MCPServer/main.swift:993-1010.

Source for path 2: github.com/mediar-ai/mcp-server-macos-use.

The tree is per-process, not per-window

Before either tool, the model in your head matters. The accessibility tree of a native macOS app is rooted at the process. AXUIElementCreateApplication(pid) returns the root, and from there every window the app owns lives in the same tree: the main window, the Preferences sheet, the Sparkle updater dialog, an open File → Save panel. macos-use uses that exact entry point in three places (Sources/MCPServer/main.swift:244, 309, 349) and treats "visible" as visible inside any window of the app, which is why the dump can contain elements that look out-of-frame in the screenshot but are still actionable.

One running Mac with ten apps has ten parallel trees, none of them aware of each other. The cross-app handoff that happens when you click a link in Slack and Mail.app comes forward is detected by the server only because it polls NSWorkspace.shared.frontmostApplication after the action and re-traverses the new frontmost app from its own root.

Path 1: Accessibility Inspector

Apple ships an inspector with Xcode. Install Xcode, open it, then go Xcode → Open Developer Tool → Accessibility Inspector. Pick a target process from the dropdown at the top-left. The left pane shows the tree as a collapsible outline. Click the small crosshair button in the toolbar, then hover over any window to highlight the element under the cursor and jump to it in the tree. The right pane shows attributes: role, title, value, position, size, identifier, available actions, sometimes a help string.

What it's good for: looking at a real app once, sanity-checking what your own SwiftUI or AppKit app exposes, finding the right AXIdentifier for a UI test. What it's not good for: a script or an AI agent that needs the same tree shape twice in a row, or a diff across actions, or a single file you can hand to an LLM as context.

Path 2: dump the tree to a file with macos-use

macos-use is an MCP server (open source, MIT, mediar-ai/mcp-server-macos-use) that exposes nine tools to any MCP-compatible client (Claude Code, Cursor, Claude Desktop, Cline, VS Code, Windsurf, Zed). Two of them just dump the tree: open_application_and_traverse and refresh_traversal. The other seven also act on the tree (click, type, press, scroll, AX-level set_value / press_ax / set_selected) and write a diff after the action.

claude mcp add macos-use -- npx -y mcp-server-macos-use
Requires Claude Code (npm i -g @anthropic-ai/claude-code) and macOS 13+. Swift builds on first run, ~20 seconds.

Once installed, every tool call writes two files: /tmp/macos-use/<ts>_<tool>.txt with the element list, and a sibling .png screenshot of the captured window. The text file is the tree dump. Format:

/tmp/macos-use/1715491200000_open_application_and_traverse.txt
# Mail — 1842 elements (0.41s)

[AXApplication] "Mail" x:0 y:24 w:1440 h:900 visible
[AXWindow] "Inbox — All Inboxes" x:0 y:24 w:1440 h:900 visible
  [AXSplitGroup] x:0 y:62 w:1440 h:862 visible
    [AXScrollArea] "Mailbox List" x:0 y:62 w:240 h:862 visible
      [AXOutline] x:0 y:62 w:240 h:862 visible
        [AXRow] x:8 y:74 w:224 h:24 visible
          [AXStaticText] "Inbox" x:36 y:78 w:60 h:16 visible
        [AXRow] x:8 y:102 w:224 h:24 visible
          [AXStaticText] "Drafts" x:36 y:106 w:48 h:16 visible
    [AXScrollArea] "Message List" x:240 y:62 w:380 h:862 visible
      [AXTable] x:240 y:62 w:380 h:862 visible
        [AXRow] x:248 y:78 w:364 h:64 visible
          [AXStaticText] "Apple ID Receipt" x:264 y:88 w:240 h:16 visible
    [AXScrollArea] "Reading Pane" x:620 y:62 w:820 h:862 visible
      [AXButton (button)] "Reply" x:680 y:84 w:60 h:28 visible
      [AXButton (button)] "Reply All" x:748 y:84 w:80 h:28 visible
      [AXButton (button)] "Forward" x:836 y:84 w:80 h:28 visible
      [AXTextArea] "Message body" x:640 y:140 w:780 h:720 visible

One element per line. Role in brackets, optional text in quotes (truncated at 80 chars, see main.swift:997), top-left and width/height as integers, and the literal token visible if the element falls inside the union of the app's window bounds. The whole format is 18 lines of Swift at main.swift:993-1010 and is built deliberately to be grep-friendly:

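# every button in the tree (links, rows, and menu items have their own roles)
$ grep -n 'AXButton' /tmp/macos-use/1715491200000_open_application_and_traverse.txt

# only what's visible right now (anchored on the trailing token, so quoted text containing "visible" doesn't match)
$ grep -n ' visible$' /tmp/macos-use/1715491200000_open_application_and_traverse.txt | head -40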

# any text matching "Send"
$ grep -in 'send' /tmp/macos-use/1715491200000_open_application_and_traverse.txt

That's the entire mental model. The text file is the tree. You hand the path to a model (or to your own script), it greps for what it needs, it gets coordinates back, it uses those coordinates in the next call. No JSON parsing, no walker, no tree-shaped data structure to keep in memory.
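For a script rather than a grep pipeline, one line of the dump can be parsed with a single regular expression. This is a minimal sketch that assumes the line shape shown in the sample above; the `Element` class, `parse_line`, and the `center()` helper are illustrative names, not part of macos-use, and whether the click tool expects the center point or the raw x/y/w/h is not something this page specifies.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed line shape, taken from the sample dump:
#   [Role] "optional text" x:N y:N w:N h:N [visible]
LINE_RE = re.compile(
    r'^\s*\[(?P<role>[^\]]+)\]'
    r'(?:\s+"(?P<text>.*)")?'
    r'\s+x:(?P<x>-?\d+)\s+y:(?P<y>-?\d+)'
    r'\s+w:(?P<w>\d+)\s+h:(?P<h>\d+)'
    r'(?P<visible>\s+visible)?\s*$'
)


@dataclass
class Element:
    role: str
    text: Optional[str]
    x: int
    y: int
    w: int
    h: int
    visible: bool

    def center(self) -> tuple:
        # Midpoint of the bounding box; an assumption about where to click.
        return (self.x + self.w // 2, self.y + self.h // 2)


def parse_line(line: str) -> Optional[Element]:
    """Return an Element, or None for headers, blanks, and unknown lines."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return Element(
        role=m["role"],
        text=m["text"],
        x=int(m["x"]),
        y=int(m["y"]),
        w=int(m["w"]),
        h=int(m["h"]),
        visible=m["visible"] is not None,
    )
```

Lines that don't match (the `#` header, blank lines) come back as `None`, so a caller can filter a whole file with a list comprehension and skip anything unexpected.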

Side-by-side

| Feature | Accessibility Inspector | macos-use (MCP) |
| --- | --- | --- |
| Output format | Interactive GUI panes | Flat .txt + .png on disk |
| Repeatable across runs | Manual point-and-click | Same tool call, same shape |
| Pipe to grep / awk / a model | No | Yes (one element per line) |
| Diff after an action | Eyeball the panes | Built in, +/-/~ entries |
| Drives the app (click, type) | Read-only | Yes, same tool can act and re-read |
| Setup cost | Install Xcode (~15 GB) | One claude mcp add command |
| Permission scope | Accessibility (Xcode) | Accessibility + Screen Recording (host app) |

Four flavors of "native" you'll actually encounter

The accessibility tree of a so-called native macOS app is whatever its UI toolkit chose to publish. Same .app bundle layout, same Dock icon, very different trees. This matters because everything below (synthetic clicks, attribute writes, AX actions) behaves differently depending on which one you're reading.

AppKit

Mail, Finder, Notes, TextEdit. Clean per-control tree: AXButton, AXTextField, AXMenuItem with proper press actions. Synthetic CGEvent clicks land. Good baseline for tree-walking and for input synthesis.

SwiftUI

More verbose: every modifier can wrap the element in another AXGroup. Tree shape is right, action paths usually work. Identifier discipline depends on whether the developer set .accessibilityIdentifier(...).

Mac Catalyst

UIKit-on-Mac. Messages, Maps, News. The tree looks correct, but synthetic clicks on right-pane controls are silently dropped. Performing kAXPressAction or setting kAXValueAttribute on the element is often the only path that actuates.

Electron

Slack, Discord, VS Code, Notion. Tree comes from a Chromium accessibility shim: thousands of AXGroup and AXStaticText nodes wrapping an AXWebArea. Readable, but most buttons don't expose the AXPress action; you click by coordinates.

Practical consequence: a script that walks the tree and clicks AXButton[text="Send"] works on Mail (AppKit), works on most SwiftUI apps, may need an AX-level fallback on Messages (Catalyst), and will need to fall back to coordinate clicks on Slack (Electron). The flat-text dump is the same shape in every case; it's the actuation that changes.
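The practical consequence above can be written down as a lookup from toolkit to an ordered list of things to try. This is only a sketch: the strategy labels are hypothetical names for this example (not macos-use tool names), and the ordering within each list is an assumption drawn from the four descriptions in this section.

```python
from enum import Enum


class Toolkit(Enum):
    APPKIT = "AppKit"
    SWIFTUI = "SwiftUI"
    CATALYST = "Mac Catalyst"
    ELECTRON = "Electron"


# Hypothetical strategy labels for this sketch only.
COORDINATE_CLICK = "coordinate_click"  # synthetic click at x/y from the dump
AX_PRESS = "ax_press"                  # perform the AXPress action
AX_WRITE = "ax_attribute_write"        # set an attribute such as AXValue or AXSelected

# Ordered attempts per toolkit. The order inside each list is an assumption.
FALLBACKS = {
    Toolkit.APPKIT: [AX_PRESS, COORDINATE_CLICK],
    Toolkit.SWIFTUI: [AX_PRESS, COORDINATE_CLICK],
    # Synthetic clicks can be silently dropped, so stay at the AX level.
    Toolkit.CATALYST: [AX_PRESS, AX_WRITE],
    # Most buttons expose no AXPress action, so click by coordinates.
    Toolkit.ELECTRON: [COORDINATE_CLICK],
}


def actuation_plan(toolkit: Toolkit) -> list:
    """Return a copy of the ordered strategies to try for this toolkit."""
    return list(FALLBACKS[toolkit])
```

Because a dropped Catalyst click produces no error, a real loop would also need to re-read the tree after each attempt to confirm that something changed before moving on.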

What most guides describe vs what actually shows up

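Most guides describe an accessibility tree as a small, clean hierarchy: AXApplication -> AXWindow -> a handful of AXButton / AXTextField / AXMenuItem nodes named exactly the way the user sees them. You walk it, find your element by title, perform an AX action, done. That description:

  • Implies trees are small (dozens of nodes)
  • Implies role names match the visual UI 1:1
  • Implies every clickable thing has an AX action

What actually shows up is different: the Mail dump above has 1842 elements, SwiftUI modifiers wrap controls in extra AXGroup nodes, and most Electron buttons expose no AX action at all.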

That mismatch is why the macos-use server filters scroll-bar churn (isScrollBarNoise at main.swift:591-597), structural-only nodes (isStructuralNoise at main.swift:599-607), and coordinate-only attribute changes (main.swift:681-682) before it writes the dump. You can turn this off and walk the raw tree if you want the unfiltered version, but the default is the one that's actually useful for an agent loop.

What lives in a single tree node

Each node in the dump is one line, but the underlying AXUIElement carries more attributes than the line surfaces. The line shows what an actor needs; the rest is a query away.

  • AXRole — the kind: AXButton, AXTextField, AXScrollArea, AXWindow, AXWebArea, AXSheet, etc. Sometimes paired with an AXSubrole that disambiguates (AXButton with AXSubrole AXCloseButton is the red traffic-light dot).
  • AXTitle / AXValue — the human-readable text. macos-use picks AXValue first, then AXTitle (main.swift:1106-1118), so a text field shows the entered text, not the placeholder label.
  • AXPosition / AXSize — CGPoint and CGSize in screen coordinates. These are what you pass back to click_and_traverse if you want to click the element.
  • AXChildren — ordered list of child AXUIElements. The depth-first walk happens here.
  • AXEnabled / AXSelected / AXFocused — flags. Only show up in the dump line as part of a diff (e.g. ~ [AXButton] "Send" | AXEnabled: 'false' -> 'true').
  • AXIdentifier — a stable ID the developer opted to set. When present, it's the safest selector across runs.
  • Available actions — a list of strings like AXPress, AXShowMenu, AXIncrement. Catalyst rows often expose none of these and require a write to kAXSelectedAttribute instead.

Frequently asked questions

How do I read the accessibility tree of a native macOS app?

Two practical paths. (1) Accessibility Inspector. Open Xcode, then Xcode > Open Developer Tool > Accessibility Inspector. Point the inspector at any running app's window. The left pane shows the tree, the right pane shows attributes (AXRole, AXTitle, AXValue, AXPosition, AXSize, available actions). (2) The macos-use MCP server. Install it once, then any tool call (open_application_and_traverse, refresh_traversal) writes the entire tree to /tmp/macos-use/<timestamp>_<tool>.txt in a flat one-element-per-line format. You read it with cat, grep, or any text tool. Path 1 is for human inspection; path 2 is for any code or AI agent that needs to consume the tree.

What does a single line in the macos-use dump actually look like?

An element line is `[Role] "text" x:N y:N w:W h:H visible`. For example: `[AXButton (button)] "Send" x:820 y:612 w:60 h:28 visible`. Role first, then the element's text in quotes (truncated to 80 chars at Sources/MCPServer/main.swift:997), then top-left coordinates and width/height as integers, then the literal token `visible` if the element falls inside the union of the app's window bounds. The format is produced by formatElementLine at main.swift:993-1010 and the file is written by buildFlatTextResponse at main.swift:1013-1058.

Why is the tree per-process, not per-window?

Because the AX root for an application is always AXUIElementCreateApplication(pid). Every tree the OS exposes is rooted at a process. main.swift uses that call at three sites (lines 244, 309, 349) to get the application root before walking. One running Mac with ten apps has ten parallel AX trees. A single app's tree contains all of its windows in one tree, including hidden ones (Preferences, Sparkle update dialogs, secondary windows). The macos-use server checks 'visible' against the union of all the app's window bounds (main.swift:621-629), which is why the `visible` token in the dump can mark elements outside what looks like the foreground window.
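To make the "union of all the app's window bounds" check concrete, here is a small sketch of one way such a test could work. It is an assumption, not the code at main.swift:621-629: it treats an element as visible when its rectangle overlaps any one of the app's window rectangles, and the names `Rect`, `intersects`, and `is_visible` are made up for the example.

```python
from typing import Iterable, Tuple

# (x, y, w, h) in screen coordinates, matching the fields in a dump line.
Rect = Tuple[int, int, int, int]


def intersects(a: Rect, b: Rect) -> bool:
    """True when the two rectangles share any area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def is_visible(element: Rect, windows: Iterable[Rect]) -> bool:
    # Assumption: overlapping any single window counts as being
    # inside the union of the app's windows.
    return any(intersects(element, w) for w in windows)
```

With a main window at `(0, 24, 1440, 900)` and a secondary window at `(1500, 100, 400, 300)`, an element at `(1600, 150, 50, 20)` counts as visible even though it sits outside the main window, which matches the behavior described above.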

Why are there four flavors of 'native' I should care about?

Because the tree's quality depends on which UI toolkit the app was built with, even if the Dock icon and the .app bundle look the same. AppKit apps (Mail, Finder, Notes-the-old-one) emit a clean tree with AXButton, AXTextField, AXMenuItem and proper press actions. SwiftUI apps emit a similar tree, often more verbose because every modifier can become a wrapper element. Mac Catalyst apps (UIKit-on-Mac, e.g. Messages, Maps) emit a tree that looks correct but where many nodes silently ignore synthetic CGEvent clicks; you have to perform kAXPressAction or set kAXValueAttribute on the element directly. Electron apps (Slack, Discord, VS Code) emit thousands of AXGroup and AXStaticText nodes from a Chromium accessibility shim, which an agent can read but which is far less actionable than an AppKit tree.

Can I read the tree from a running app without modifying or recompiling it?

Yes, that's the whole point of the Accessibility API. The host you're reading from (Xcode, Terminal, Claude Code, Cursor, the macos-use MCP host) needs to be granted Accessibility permission in System Settings > Privacy & Security > Accessibility. Once that is granted, AXUIElementCreateApplication(pid) succeeds for any app the OS can see, with no cooperation from the target. Capturing a screenshot of the same window also requires Screen Recording permission for the host, which macOS has enforced since 10.15 Catalina. macos-use needs both because every tool call writes a /tmp/macos-use/<ts>_<tool>.txt and a /tmp/macos-use/<ts>_<tool>.png next to each other.

What's the practical difference between Accessibility Inspector and the macos-use dump?

Inspector is a GUI for a human. You hover over a window, you see the element under your cursor, you read its attributes in the right pane, you click around the tree. It's the right tool when you're auditing an app you're building or trying to understand why a control isn't reachable. The macos-use dump is a text file. You can pipe it through grep to find every AXButton, you can diff two dumps to see what changed after an action, you can hand the file to an LLM as context. It's the right tool when you want to drive an app from a script or an AI agent. They are not in competition; they answer different questions.

Is the tree stable across runs of the same app?

Mostly. Identifiers (AXIdentifier when set), roles, and the rough shape of the tree stay stable. Coordinates change with window position and size. AXValue changes with state. Some toolkits insert ephemeral tooltip and focus-ring elements that come and go between traversals. macos-use handles this by treating coordinate-only changes as noise (filtered at main.swift:681-682) and by dropping a defined set of structural-only and scroll-bar nodes (isScrollBarNoise at main.swift:591-597, isStructuralNoise at main.swift:599-607) so a diff between two traversals only highlights changes that an agent can act on.
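The idea of treating coordinate-only changes as noise can be illustrated outside the server by diffing two dump files on role and text alone. This is a rough sketch and not the server's diff: it keys each line on its `[Role] "text"` prefix (an assumption), counts duplicates, and emits only `+` and `-` entries, so attribute changes that the real diff marks with `~` are out of scope here.

```python
import re
from collections import Counter

# Capture the '[Role] "text"' prefix and ignore the coordinates after it.
KEY_RE = re.compile(
    r'^\s*(\[[^\]]+\](?:\s+".*")?)\s+x:-?\d+\s+y:-?\d+\s+w:\d+\s+h:\d+'
)


def element_keys(dump: str) -> Counter:
    """Count elements by role and text, dropping position and size."""
    counts = Counter()
    for line in dump.splitlines():
        m = KEY_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def diff_dumps(before: str, after: str) -> list:
    """Return '+ ...' and '- ...' entries; moved elements produce nothing."""
    b, a = element_keys(before), element_keys(after)
    out = []
    for key in sorted(set(a) | set(b)):  # sorted for deterministic output
        delta = a[key] - b[key]
        marker = "+" if delta > 0 else "-"
        out.extend(f"{marker} {key}" for _ in range(abs(delta)))
    return out
```

An element that only moved (say, after a window resize) has the same key in both dumps and drops out of the result, which is the property the paragraph above describes.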

Does the tree contain text inside web views?

Sometimes, partly. Safari and any app embedding WKWebView surface DOM accessibility through the AX bridge as AXWebArea with nested AXLink, AXButton, AXStaticText, etc. The tree you get is roughly what VoiceOver would announce, not the full DOM. Heading levels, link text, form labels are all present; CSS-styling-only changes are not. Electron apps (Chromium-based) follow the same model. A native AppKit app that does not use a web view has a much richer per-control tree than the same screen rendered in a web view inside the same app.

What's the smallest end-to-end example of dumping a tree from Claude Code?

Install the server with `claude mcp add macos-use -- npx -y mcp-server-macos-use`. Restart the Claude Code session so the new MCP is registered. Grant Accessibility permission to your terminal app on the first call. Then ask Claude: 'open Mail and dump the AX tree'. The agent will call macos-use_open_application_and_traverse({identifier: 'Mail'}). The response includes a `file:` field with the .txt path under /tmp/macos-use/ and a `screenshot:` field with the .png. You can `cat`, `grep`, or open the .txt yourself; you can `open` the .png to see the captured window.
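If you'd rather locate the dump yourself than read the path out of the tool response, a short helper can pick the most recent file. This sketch assumes the `<ts>_<tool>.txt` naming described above with a purely numeric timestamp prefix (as in the sample filename); `latest_dump` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Optional


def latest_dump(directory: str = "/tmp/macos-use",
                tool: Optional[str] = None) -> Optional[Path]:
    """Return the newest <ts>_<tool>.txt in directory, or None if there is none."""
    candidates = []
    for path in Path(directory).glob("*.txt"):
        ts, sep, name = path.stem.partition("_")
        if not sep or not ts.isdigit():
            continue  # not a dump file by the assumed naming scheme
        if tool is not None and name != tool:
            continue
        candidates.append((int(ts), path))
    if not candidates:
        return None
    # Compare on the numeric timestamp only.
    return max(candidates, key=lambda c: c[0])[1]
```

Calling `latest_dump(tool="refresh_traversal")` narrows the search to one tool's output; with no argument it returns the newest dump from any tool.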

What about apps that opt out of accessibility, like games or DRM video?

Many full-screen game engines (Metal-only renderers without AppKit views) expose only a top-level AXWindow with no children worth speaking of. DRM-protected video players behave the same way. CGWindowListCreateImage on Sequoia returns blank or menu-bar-only frames if Screen Recording isn't granted. None of this is something a tree-walker can fix; the app simply hasn't published anything to walk. macos-use will return a small tree and a screenshot. The agent's plan needs to fall back to coordinate-driven clicking against the screenshot for those cases, or to a different control surface entirely.

How long does it take to dump the tree of a real app?

Order-of-magnitude: tens to hundreds of milliseconds for AppKit, hundreds of milliseconds to a couple of seconds for SwiftUI-heavy or Electron apps with large element counts. The traversal time is included in the file header line of the dump (`# <app> — N elements (Xs)`, written at main.swift:1018). The dominant cost is the per-node attribute fetch (AXUIElementCopyAttributeValues), repeated for every node in the depth-first traversal, so time grows with element count. That's why the server caches the full tree on disk and returns only the diff after disruptive actions like click/type/press, not the whole tree again.
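To track traversal time across runs, the header line can be read back out of each dump. A minimal sketch, assuming the header looks exactly like the sample (`# Mail`, a dash separator, `1842 elements (0.41s)`); `parse_header` is a hypothetical helper, and the separator is written as the escape `\u2014` so the pattern matches the character used in the sample file.

```python
import re
from typing import Optional

# "# <app> <dash> N elements (Xs)"; \u2014 is the dash used in the sample header.
HEADER_RE = re.compile(
    r'^#\s+(?P<app>.+?)\s+\u2014\s+(?P<count>\d+)\s+elements\s+'
    r'\((?P<secs>\d+(?:\.\d+)?)s\)\s*$'
)


def parse_header(line: str) -> Optional[dict]:
    """Return app name, element count, and seconds, or None if not a header."""
    m = HEADER_RE.match(line)
    if m is None:
        return None
    return {
        "app": m["app"],
        "elements": int(m["count"]),
        "seconds": float(m["secs"]),
    }
```

Dividing `elements` by `seconds` gives a rough elements-per-second figure that is easier to compare between apps than the raw time.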
