accessibility tree, agent edition · fallback ladder · kAXPressAction · kAXValueAttribute · kAXSelectedAttribute

macOS Accessibility Tree Agent: The Three-Rung Fallback Ladder For When The Click Silently Drops

An accessibility-tree-driven agent on macOS hits a wall the moment it touches a Mac Catalyst app, a sandboxed app, or a secure-input field. The CGEvent click reaches the OS, the OS hands it to the app, and the app drops it. The diff comes back empty. mcp-server-macos-use ships three escalation tools for exactly this case, declared at main.swift:1441, 1458, and 1476. Each one bypasses the input event tap and fires the action directly on the AX element.

Matthew Diakonov
11 min read
Three escalation tools declared at main.swift:1441, 1458, 1476
All three set showDiff=true at main.swift:1745, 1764, 1784
Source explicitly names Catalyst right-panes and kAXErrorActionUnsupported

The Click That Doesn't Click

Most articles about driving macOS via the accessibility tree stop at the same place. Read the tree, find an AXButton, post a CGEvent at the centered point, done. That works on most native AppKit and SwiftUI apps. It does not work on Mac Catalyst right panes. It does not work on a number of sandboxed apps. It does not work when a secure-input field has focus. And it does not work on Catalyst rows, where the row exposes AXSelected but no AXPress and the click would have nothing to actuate even if it did land.

The failure mode is the worst kind: the click reaches the OS, the OS routes it through the HID event tap, the foreground window server queue accepts it, and the target app drops it on the floor. No exception, no error code, no callback. The traversal-after looks identical to the traversal-before. The agent is left with an empty diff and no signal about what to do next.

mcp-server-macos-use solves this with three additional tools. All three skip the input event tap entirely and operate on the AX element directly. The agent escalates: try the click, see the empty diff, call the next rung. By the time the third rung fails, the agent knows the action is not actuatable, not that the path was wrong.

The Inventory, In Numbers

9 MCP tools total (main.swift:1482)
3 fallback rungs
6 happy-path tools
0 rungs require a re-traversal of unrelated apps

let allTools at main.swift:1482 declares all nine. Three of them only exist for the failure case.

How Much Of The Server Exists Just For This Failure Mode

3 escalation tools
9 MCP tools total
33% of surface area is fallback
0 CGEvent posts in the fallback path

Where The Click Goes When It Doesn't Land

Inputs on the left are the happy-path tools that post CGEvents. Outputs on the right are the three escalation rungs. The hub is where the agent decides which path to take based on the diff from the previous call.

Happy path on the left, fallback ladder on the right

click_and_traverse, type_and_traverse -> empty diff = escalate -> press_ax_and_traverse, set_value_and_traverse, set_selected_and_traverse

What The Agent Sees: Empty Diff, Then The Real Diff

Click reaches the OS, OS hands it to a Catalyst right-pane button, button drops it. The first call returns a 64-byte file with no diff entries. The agent reads the empty diff, falls back to press_ax_and_traverse, and gets the actual state change.

empty click diff -> press_ax escalation
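To make the "empty diff means escalate" check concrete, here is a minimal sketch that parses a summary line of the form used in this article ("diff: 0 added, 0 removed, 0 modified"). The `DiffSummary` type, the `parseDiffSummary` function, and the exact line format are assumptions for illustration, not the server's documented output schema.

```swift
/// Hypothetical summary of one tool call's diff; not a type from mcp-server-macos-use.
struct DiffSummary: Equatable {
    var added = 0
    var removed = 0
    var modified = 0

    /// An empty diff is the agent's signal to escalate to the next rung.
    var isEmpty: Bool { added == 0 && removed == 0 && modified == 0 }
}

/// Parses a line such as "diff: 4 added, 1 modified".
/// Counts that are absent from the line are treated as zero.
/// Returns nil when the line does not match the assumed format.
func parseDiffSummary(_ line: String) -> DiffSummary? {
    let prefix = "diff:"
    guard line.hasPrefix(prefix) else { return nil }
    var summary = DiffSummary()
    for part in line.dropFirst(prefix.count).split(separator: ",") {
        let tokens = part.split(separator: " ")
        guard tokens.count == 2, let count = Int(tokens[0]) else { return nil }
        switch tokens[1] {
        case "added": summary.added = count
        case "removed": summary.removed = count
        case "modified": summary.modified = count
        default: return nil
        }
    }
    return summary
}
```

An agent loop could call `parseDiffSummary` on the first tool result and move to the next rung only when `isEmpty` is true.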

With Versus Without The Ladder

Same Catalyst app, same Save button, same agent. The only difference is whether the agent escalates after an empty diff.

Catalyst right-pane Save button, two agent loops

Agent reads the AX tree, sees AXButton 'Save' at (612, 408), calls click_and_traverse. The diff comes back empty: zero added, zero removed, zero modified. The agent has no way to know if the click was just slow, the button was disabled, or the app dropped the synthetic event. Most agents loop the same call, hit the same empty diff, then bail out and ask the user.

  • click_and_traverse returns empty diff
  • Agent does not know if click landed
  • Loops the same call, gets the same empty diff
  • Bails out and asks the user

The Three Rungs, Source-Level

Each tool is declared once. The description string is the contract: it tells the model exactly when to reach for that tool. Read these three back-to-back and the failure-mode map is self-documenting.

Sources/MCPServer/main.swift

The Dispatch That Picks The AX Action Variant

The handler maps each tool name to a different variant of .input(action: ...). set_value goes to .axSetValue, press_ax goes to .axPress, set_selected goes to .axSetSelected. All three flip showDiff so the agent gets the same +/-/~ output it gets from a regular click.
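The mapping described above can be sketched roughly as follows. Only the variant names (`.axSetValue`, `.axPress`, `.axSetSelected`), the tool names, and the `showDiff` flag come from this article; the `InputAction` and `DispatchResult` types, the associated payloads, and the `dispatchFallback` function are hypothetical stand-ins, not the code in main.swift.

```swift
/// Hypothetical mirror of the AX action variants named in the article.
enum InputAction: Equatable {
    case axSetValue(String)
    case axPress
    case axSetSelected(Bool)
}

/// Hypothetical result of dispatch: which action to run and whether to diff.
struct DispatchResult: Equatable {
    let action: InputAction
    let showDiff: Bool
}

/// Maps a fallback tool name to its AX action variant.
/// Returns nil for tool names outside the three escalation rungs,
/// or when set_value is called without a value.
func dispatchFallback(tool: String, value: String? = nil, selected: Bool = true) -> DispatchResult? {
    switch tool {
    case "set_value_and_traverse":
        guard let value = value else { return nil }
        return DispatchResult(action: .axSetValue(value), showDiff: true)
    case "press_ax_and_traverse":
        return DispatchResult(action: .axPress, showDiff: true)
    case "set_selected_and_traverse":
        return DispatchResult(action: .axSetSelected(selected), showDiff: true)
    default:
        return nil
    }
}
```

Every branch sets `showDiff` to true, which is what gives the fallback tools the same diff output as a regular click.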

Sources/MCPServer/main.swift

The Order The Agent Walks

Try the cheapest path first. Each rung leaves behind a timestamped .txt + .png pair so the model can audit which attempts ran and which one finally actuated the UI.

  1. click_and_traverse

    CGEvent posted at the auto-centered point. Works for ~95 percent of native AppKit and SwiftUI apps. Use first.

  2. press_ax_and_traverse

    Bypasses the event tap. Calls kAXPressAction on the AX element. Catalyst right-pane buttons, sandboxed apps.

  3a. set_value_and_traverse

    kAXValueAttribute. For text fields where typing failed: Catalyst right-pane fields, secure-input contexts.

  3b. set_selected_and_traverse

    kAXSelectedAttribute. Catalyst rows, sidebar entries, outlines that expose AXSelected without AXPress.
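The walk order above can be expressed as a small decision function on the agent side. This is a sketch under assumptions: the `Rung`, `Outcome`, and `TargetKind` types and the `nextRung` function are hypothetical, and the choice to stop after press_ax fails on a plain button is an illustrative policy, since the article describes no further rung for that case.

```swift
/// The rungs named in this article, in escalation order.
enum Rung: String {
    case click = "click_and_traverse"
    case pressAX = "press_ax_and_traverse"
    case setValue = "set_value_and_traverse"
    case setSelected = "set_selected_and_traverse"
}

/// What the agent observed after a rung ran (hypothetical classification).
enum Outcome {
    case changed            // non-empty diff: the action landed
    case emptyDiff          // nothing changed between traversals
    case actionUnsupported  // e.g. press_ax reported kAXErrorActionUnsupported
}

/// Coarse target kind the agent infers from the element's AX role (assumed input).
enum TargetKind { case button, textField, row }

/// Returns the next rung to try, or nil when the agent should stop:
/// either the action landed or the ladder is exhausted.
func nextRung(after rung: Rung, outcome: Outcome, target: TargetKind) -> Rung? {
    if outcome == .changed { return nil }
    switch rung {
    case .click:
        return .pressAX
    case .pressAX:
        switch target {
        case .textField: return .setValue
        case .row: return .setSelected
        case .button: return nil // no further rung is described for plain buttons
        }
    case .setValue, .setSelected:
        return nil
    }
}
```

A nil result after the last rung is the point at which the agent can conclude the action is not actuatable rather than that it picked the wrong path.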

What Each Rung Is Designed For

Mac Catalyst right pane

UIKit-rendered fields and buttons participate in the AX tree but drop CGEvent posts at their coordinates. set_value (fields) and press_ax (buttons) land. Cited at main.swift:1442 and 1459.

Sandboxed apps

Sandbox entitlements opt the app out of receiving synthetic events. AX actions still go through because they cross the accessibility boundary, not the input event tap.

Secure-input contexts

When a secure field has focus, synthetic keyboard events are blocked system-wide. set_value writes via kAXValueAttribute and is not subject to the secure-input mode.

AXSelected-only rows

Catalyst tables, sidebar lists, outline rows. press_ax errors with kAXErrorActionUnsupported. set_selected toggles kAXSelectedAttribute and the row activates. main.swift:1477.

kAXErrorActionUnsupported, Then The Recovery

Catalyst sidebar rows are the canonical example. The row participates in the AX tree, exposes AXSelected, and ignores both the synthetic click and kAXPressAction. The error code kAXErrorActionUnsupported from press_ax is the agent's cue to flip to set_selected_and_traverse.

press_ax errors -> set_selected lands
main.swift:1477

Right primitive for Catalyst table rows, sidebar list entries, outline rows, and other selection-bearing controls that expose AXSelected but no AXPress action (where regular click is dropped and press_ax errors with kAXErrorActionUnsupported).

set_selected_and_traverse description, verbatim

The Two Wire Diagrams That Explain The Whole Thing

First call: the click goes through the input event tap, the target app drops it, the diff is empty. Second call: the AX action goes around the tap and lands on the element directly. The agent sees both.

click drops, press_ax lands

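Participants: MCP Client, macos-use, Input Event Tap, AX Layer, Target App

  1. MCP Client -> macos-use: click_and_traverse {element:"Save"}
  2. macos-use -> Input Event Tap: post CGEvent at (644,422)
  3. Input Event Tap -> Target App: synthetic click
  4. Target App: (no state change)
  5. macos-use -> MCP Client: diff: 0 added, 0 removed, 0 modified
  6. MCP Client -> macos-use: press_ax_and_traverse {x,y,w,h}
  7. macos-use -> AX Layer: AXUIElementPerformAction(kAXPressAction)
  8. AX Layer -> Target App: AX action delivered to element
  9. Target App: AXSheet opens, button disables
  10. macos-use -> MCP Client: diff: 4 added, 1 modified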

The Aggregated Tool List

Six tools cover the happy path. Three exist purely for the apps that don't cooperate with synthetic events. Reading the allTools line as a literal answers the question of how much of this server is dedicated to the failure mode.

Sources/MCPServer/main.swift

CGEvent Path vs. AX API Path, Side By Side

The two paths are not redundant. Each one wins where the other loses. CGEvent works on webviews, Electron, games, and anything that doesn't implement AX actions. AX API actions work on Catalyst, sandboxed apps, and selection-only rows. The agent doesn't pick a side; it picks based on what the previous rung returned.

| Dimension | CGEvent path (click/type/press/scroll) | AX API path (set_value/press_ax/set_selected) |
| --- | --- | --- |
| Path through the OS | Posts CGEvent. OS routes through the HID event tap and on to the foreground app's window server queue. | Calls AXUIElementPerformAction or AXUIElementSetAttributeValue. Delivered through the accessibility API, never touches HID. |
| Catalyst right pane | Click drops silently. Empty diff. Agent has no recourse. | Action lands on the UIKit-bridged element via kAXPressAction or kAXValueAttribute. |
| Secure-input field | type_and_traverse blocked at the OS level. No keystrokes delivered. | set_value_and_traverse writes via kAXValueAttribute. Not subject to secure-input filtering. |
| Selection-only row | Click hits the row visually; nothing changes because the row has no AXPressAction. | set_selected_and_traverse flips kAXSelectedAttribute. Detail pane updates. |
| Diff contract | click_and_traverse returns +/-/~ diff (empty when click drops). | All three fallback tools return the same +/-/~ diff (showDiff=true at main.swift:1745, 1764, 1784). |
| Coordinate handling | Click auto-centers at (x + w/2, y + h/2) from CGEvent position. | Hit-tests at (x + w/2, y + h/2) when width/height are passed; passes through findAXElementAtPoint at main.swift:1078. |

Why This Doesn't Show Up In Other Guides

Most writing about driving macOS through the accessibility tree treats the tree as the goal: read the tree, parse the tree, render the tree, embed the tree. The agent layer is left as an exercise. The interesting part of agent work isn't the read; it's the action loop. And the action loop on macOS has a specific cliff that the read-side guides never get to: the click that the OS accepts and the app silently throws away.

The recovery is not a generic retry(). It is three different AX actions, each backing off to a different attribute or perform-action call, each tuned to a specific failure mode named in the source. set_value for fields that ignore the keystroke. press_ax for buttons that ignore the click. set_selected for rows that do not implement press at all.

If your agent already drives macOS through the AX tree and you have ever shrugged and told a teammate "Catalyst is weird, the click just doesn't work," this is the shape of the answer.


Frequently asked questions

What does it mean when an agent's click silently drops on macOS, and how do I tell?

The agent reads the AX tree, finds an AXButton at known coordinates, posts a CGEvent mouse-down/mouse-up at the centered point, and the app does not respond. No exception is raised, no error is returned. The traversal-after looks identical to the traversal-before. macos-use makes this easy to spot because every action returns a diff: when the diff is empty (zero added, zero removed, zero modified) on what should have been a state-changing click, the target app most likely dropped the synthetic event. This happens predictably on Mac Catalyst right-pane controls, sandboxed apps that opt out of synthetic events, and any app where a secure-input field has focus.

What are the three fallback tools and what does each one do at the AX layer?

set_value_and_traverse writes a string into the element under (x,y) via kAXValueAttribute (declared at main.swift:1441). press_ax_and_traverse calls kAXPressAction on the element (main.swift:1458). set_selected_and_traverse sets kAXSelectedAttribute (main.swift:1476). All three skip CGEvent entirely. They get the AXUIElement at the hit-tested point and call AXUIElementSetAttributeValue or AXUIElementPerformAction directly. The input event tap is never engaged, so apps that drop synthetic events still receive the action because the action is happening on the accessibility object, not on the keyboard or mouse hardware path.
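As a rough illustration of what those calls look like in Swift, the sketch below wraps the two public accessibility functions named above. It assumes the caller already holds an `AXUIElement` and that the process has been granted accessibility permission; the wrapper names (`pressAX`, `setValue`, `setSelected`, `activate`) are hypothetical and are not the server's functions. The `canImport` guard only keeps the file compiling on non-Apple platforms.

```swift
import Foundation

#if canImport(ApplicationServices)
import ApplicationServices

/// Presses an element through the accessibility API (no CGEvent is posted).
func pressAX(_ element: AXUIElement) -> AXError {
    AXUIElementPerformAction(element, kAXPressAction as CFString)
}

/// Writes a string into the element's AXValue attribute.
func setValue(_ element: AXUIElement, to text: String) -> AXError {
    AXUIElementSetAttributeValue(element, kAXValueAttribute as CFString, text as CFString)
}

/// Sets the element's AXSelected attribute to the given boolean.
func setSelected(_ element: AXUIElement, _ selected: Bool) -> AXError {
    let flag: CFBoolean = selected ? kCFBooleanTrue : kCFBooleanFalse
    return AXUIElementSetAttributeValue(element, kAXSelectedAttribute as CFString, flag)
}

/// Press first; if the element reports that it has no press action
/// (kAXErrorActionUnsupported), select it instead. Other results pass through.
func activate(_ element: AXUIElement) -> AXError {
    let result = pressAX(element)
    return result == .actionUnsupported ? setSelected(element, true) : result
}
#endif
```

Each wrapper returns the raw `AXError`, so a caller can tell `.success` apart from `.actionUnsupported` and decide whether to move to another rung.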

Why are Mac Catalyst right-pane fields specifically called out in the source?

Catalyst is UIKit-on-AppKit. The right pane of a Catalyst window often hosts UIKit-rendered text fields and buttons that participate in the AX tree (so the agent can see them) but ignore CGEvent posts targeted at their screen coordinates. The macos-use source names this case explicitly at main.swift:1442: 'Use when typing fails (Catalyst right-pane fields, sandboxed/secure-input contexts).' For those fields, the only thing that lands is AXUIElementSetAttributeValue with kAXValueAttribute. Same for Catalyst right-pane buttons at main.swift:1459: 'Use when a synthetic mouse click is dropped (Catalyst right-pane buttons, sandboxed apps). Often the only path that actuates buttons in those apps.'

When would I need set_selected instead of press_ax?

When the element exposes the AXSelected attribute but does not implement AXPressAction. The source comment at main.swift:1477 spells this out: 'where regular click is dropped and press_ax errors with kAXErrorActionUnsupported.' Common offenders are Catalyst table rows, sidebar list entries, and outline rows. Pressing them does nothing because there is no press action to perform; selecting them is the operation. set_selected_and_traverse calls AXUIElementSetAttributeValue with kAXSelectedAttribute and the boolean payload. The traversal-after typically shows the row's AXSelected flipping from false to true and the detail pane updating with the new content.

What does the escalation order look like in practice?

Try click_and_traverse first. It is a single call that posts a CGEvent and chains optional type/press, which is what works for ~95 percent of native AppKit and SwiftUI apps. If the diff comes back empty for a click that should have caused a visible state change, escalate to press_ax_and_traverse with the same x, y, width, height. press_ax errors with kAXErrorActionUnsupported on rows. When that error appears, escalate to set_selected_and_traverse for rows or set_value_and_traverse for text fields. The diff returned by each rung tells the agent immediately whether the rung worked: empty diff means try the next rung.

How does the server know which AX element corresponds to the (x, y) the agent passes in?

findAXElementAtPoint at main.swift:1078-1104 walks the AX tree depth-first from the application root. For each element it reads AXPosition and AXSize, builds a CGRect, and checks whether the point falls inside. It returns the deepest match. Fallback tools pass the centered point (x + width/2, y + height/2) when width and height are present, which is why the schema for set_value, press_ax, and set_selected accepts optional width/height parameters at main.swift:1432-1435 and 1452-1455. The hit-tested element is the one the AX action is performed on, not whatever was visually under the cursor.
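The hit test described above can be modeled without any AX calls. The sketch below is an approximation under assumptions: `Node` is a hypothetical stand-in for an AX element with a frame, `targetPoint` and `deepestNode` are illustrative names, and the choice to keep descending into children whose parent does not contain the point is a guess, not a documented behavior of findAXElementAtPoint.

```swift
/// Minimal stand-in for an AX element: a frame plus children.
/// (Hypothetical type; the real server reads AXPosition and AXSize.)
struct Node {
    let name: String
    let x: Double, y: Double, width: Double, height: Double
    var children: [Node] = []

    func contains(_ px: Double, _ py: Double) -> Bool {
        px >= x && px < x + width && py >= y && py < y + height
    }
}

/// Centers the point when a size is supplied, otherwise passes it through.
func targetPoint(x: Double, y: Double, width: Double? = nil, height: Double? = nil) -> (Double, Double) {
    guard let w = width, let h = height else { return (x, y) }
    return (x + w / 2, y + h / 2)
}

/// Depth-first search that returns the deepest node whose frame contains the point.
func deepestNode(at px: Double, _ py: Double, in node: Node, depth: Int = 0) -> (node: Node, depth: Int)? {
    var best: (node: Node, depth: Int)? = nil
    if node.contains(px, py) { best = (node, depth) }
    for child in node.children {
        if let hit = deepestNode(at: px, py, in: child, depth: depth + 1),
           hit.depth > (best?.depth ?? -1) {
            best = hit
        }
    }
    return best
}
```

With an assumed 64 by 28 frame for a button whose origin is (612, 408), `targetPoint` yields (644, 422), and `deepestNode` resolves that point to the button rather than to the pane or window that also contain it.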

Does the agent get a diff back from these fallback tools, or just a status?

Same diff contract as click. All three escalation tools set options.showDiff = true at main.swift:1745, 1764, and 1784. That triggers a traverseBefore, executes the AX action, traverses again, subtracts, and returns the +/-/~ diff. So the agent sees exactly what changed: the AXValue of a text field flipping from empty to the new string, an AXButton's AXEnabled going from true to false, a row's AXSelected flipping. Same /tmp/macos-use/<ts>_<tool>.txt + .png pair, same flat format, same grep workflow.
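The traverse, act, traverse, subtract sequence can be sketched with plain dictionaries. Everything here is a simplified model: `Snapshot`, `TraversalDiff`, `diff`, and `runWithDiff` are hypothetical names, and keying elements by a string identifier is an assumption about how two traversals might be matched, not a description of the server's format.

```swift
/// Hypothetical flat snapshot: element identifier -> rendered attribute line.
typealias Snapshot = [String: String]

/// Identifiers that were added, removed, or modified between two snapshots.
struct TraversalDiff: Equatable {
    var added: [String] = []
    var removed: [String] = []
    var modified: [String] = []
    var isEmpty: Bool { added.isEmpty && removed.isEmpty && modified.isEmpty }
}

/// Subtracts the before-snapshot from the after-snapshot.
func diff(before: Snapshot, after: Snapshot) -> TraversalDiff {
    var result = TraversalDiff()
    for (id, line) in after {
        if let old = before[id] {
            if old != line { result.modified.append(id) }
        } else {
            result.added.append(id)
        }
    }
    for id in before.keys where after[id] == nil {
        result.removed.append(id)
    }
    // Sort so the output is stable regardless of dictionary iteration order.
    result.added.sort()
    result.removed.sort()
    result.modified.sort()
    return result
}

/// Runs an action between two traversals, mirroring the showDiff flow.
func runWithDiff(traverse: () -> Snapshot, action: () -> Void) -> TraversalDiff {
    let before = traverse()
    action()
    return diff(before: before, after: traverse())
}
```

When the action changes nothing, `runWithDiff` returns a diff whose `isEmpty` is true, which is the same signal the agent uses to decide on escalation.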

Why not just always use kAXPressAction and skip CGEvent entirely?

Because not every AX element implements every action. AXButton usually does. AXMenuItem usually does. AXTextField and most controls in webviews, Electron apps, and games do not implement AXPressAction. CGEvent works against any window the OS knows about because it is firing real mouse and keyboard events the OS routes through the input event tap. The fallback ladder exists because neither path covers everything alone: CGEvent loses to Catalyst-and-sandboxed; AX actions lose to webviews-and-games. So the server exposes both, and the agent picks based on what the previous rung did.

What about secure-input fields, password boxes, the lock screen?

Secure input is a system-level mode where macOS blocks synthetic keyboard events from reaching the foreground app while a secure field has focus. Apps like Terminal opt into it; banking sites and 1Password trigger it. type_and_traverse will silently fail. set_value_and_traverse at main.swift:1441 is the recovery. It writes into the field via kAXValueAttribute, which is not blocked by secure input because no synthetic key event is fired. Whether the target app then accepts the written value depends on whether the field calls a controller that reads kAXValue, but the input is at least delivered. For a true password field on the macOS lock screen, neither path works, by design.

How does this fit alongside the click that does work, in one agent loop?

The agent calls click_and_traverse with element="Save" and reads the diff. If the diff has changes, the click landed and the agent moves on. If the diff is empty, the agent grabs the x, y, width, height for the same element from the prior traversal file and calls press_ax_and_traverse. If press_ax returns kAXErrorActionUnsupported, the agent flips to set_selected_and_traverse for a row or set_value_and_traverse for a field. Each rung writes its own /tmp/macos-use/<ts>_<tool>.txt with diff or error, so the model has a complete trace of which rungs it attempted and which one actuated the UI.

macos-use: MCP server for native macOS control
© 2026 macos-use. All rights reserved.