MCP server for macOS automation · 274 stars on GitHub · Shipping inside Fazm

Drive any Mac app from Claude Code

A Swift MCP server that hands your AI assistant the same accessibility tree Apple gives VoiceOver. Click any button by text. Type into any field. Drive Xcode, Slack, Mail, System Settings, anything with an AX tree.

  • Not an AppleScript wrapper. Native AX APIs + CGEvent, so it works on apps with no scripting dictionary (most modern Mac apps).
  • Not a screenshot agent. Structured tree responses, diff-only after each action. No OCR tax, no vision-model bill.
  • 100% local. One Swift binary over stdio. No SaaS, no network egress from the server.

Works with Claude Code · Claude Desktop · Cursor · VS Code · Windsurf · Cline · Zed

claude — macos-use (live recording)

Real session: Claude Code calling macos-use to open an app, read the accessibility tree, click by text, and verify the result. No edits, no AppleScript, no screenshot loops.

Setup

Three steps to automation

01

Install

One command in Claude Code, or paste the JSON config into your MCP client.

02

Approve

Grant Accessibility permission to your MCP host on first run.

03

Ask Claude

Tell Claude to open apps, click buttons, fill forms, anything with an AX tree.

Installation

Drop your email. Pick your client. Done.

We'll email you the one-line install for Claude Code plus the JSON config for Cursor, Claude Desktop, VS Code, and Windsurf. Requires macOS 13+; the Swift binary builds on install.

First run prompts for Accessibility permission on whichever app is running your MCP client.
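For clients that take a JSON config, the entry usually has the shape sketched below. Treat it as an illustration, not the official config: the `mcpServers` key is the convention Claude Desktop and Cursor use (some clients use a different top-level key), while the server name `macos-use` and the `npx` launch command are assumptions. The package name `mcp-server-macos-use` is the one named in the uninstall answer further down. The config from the install email is the one to trust.

```python
import json


def build_mcp_config(server_name="macos-use", package="mcp-server-macos-use"):
    """Return an MCP client config dict with a single stdio server entry.

    Assumption: the server is launched through npx. The real launch
    command may differ, so check the install email or the repository.
    """
    return {
        "mcpServers": {
            server_name: {
                "command": "npx",        # assumed launcher
                "args": ["-y", package],  # assumed arguments
            }
        }
    }


config = build_mcp_config()
print(json.dumps(config, indent=2))
```

Paste the printed JSON into your client's MCP config file; the file path differs per client.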

In practice

Once it's installed, type things like this

Real prompts you can drop into Claude Code or Cursor today. The server resolves each one through the accessibility tree, so the agent clicks the right button instead of guessing pixels.

> Open Xcode, run my project, and screenshot the first build error.

Launches Xcode, presses ⌘R, watches the issue navigator, returns the error region.

> In Slack, find my latest DM with Sarah and reply "on it".

Focuses Slack, opens the DM by accessibility label, types into the message field, sends.

> Open System Settings → Privacy & Security and list every app with Accessibility access.

Walks the settings tree, reads the toggles row by row, returns a structured list.

> Drive Cursor: open src/app/page.tsx, jump to line 42, open the agent panel.

Cross-app handoff: focuses Cursor, ⌘P to open, ⌃G to jump, ⌘L for the panel.

> Find the screenshot I just took on the Desktop, open it in Preview, copy the OCR text.

Uses Finder + Preview through the accessibility tree, no AppleScript glue.

> Open Mail, find the unread invoice from Stripe, download the PDF to ~/Invoices.

Mail traversal + native CGEvent clicks. Same flow that ships inside Fazm.

Every call is gated by your MCP client's approval prompt. You see the action before the server runs it.

Tools

Six tools. Full control.

Every tool returns the updated accessibility tree as a diff, so the agent always knows what changed.

open_application_and_traverse

Launch or focus any app by name, bundle ID, or path.

click_and_traverse

Click at coordinates or by element text. Optionally type and press a key in one call.

type_and_traverse

Type into the focused field, with optional modifier keystroke.

press_key_and_traverse

Arrow keys, ⌘⇧4, anything. Full modifier support.

scroll_and_traverse

Scroll lines in any direction at a given position.

refresh_traversal

Re-read the accessibility tree without taking an action.
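Your MCP client builds these calls for you, but it helps to see what goes over the wire. Here is a minimal sketch of a `tools/call` request, the JSON-RPC 2.0 message MCP uses to invoke a tool. The tool name and the `element` argument come from this page; the `pid` argument name is a guess for illustration, so check the server's published tool schema for the real parameters.

```python
import json


def tool_call(request_id, tool, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# `element` is the text-match argument described on this page.
# `pid` is a hypothetical argument name, used here only for illustration.
request = tool_call(2, "click_and_traverse", {"pid": 4218, "element": "Send"})

# The stdio transport carries one JSON message per line, so the
# serialized request must not contain embedded newlines.
line = json.dumps(request, separators=(",", ":"))
print(line)
```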

Why macos-use

Why native accessibility beats screenshots

Screenshot agents burn tokens re-describing the screen every step and guess pixel positions. macos-use hands Claude a structured tree with semantic roles and coordinates, then returns only what changed after each action.

Accessibility tree, not pixels

Every action returns structured elements with role, text, and coordinates: `[AXButton] "Open" x:680 y:520 w:80 h:30 visible`. No OCR, no vision model tax.

Click by text

element: "Submit" finds and clicks. No pixel guessing.

Diff responses

After each action, only changed elements come back. Cheaper tokens, faster loops.

Native event injection

CGEvent clicks and keystrokes are OS-level. Works with apps that reject other simulated input.

InputGuard + Escape

User input blocked during automation so you can't fight the agent. Escape cancels, 30s watchdog prevents lockout.

Cross-app handoff

Click a link that opens Safari? The server detects the new frontmost app and traverses it automatically.
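To make "click by text" concrete, here is a rough sketch of how a client-side script could parse element lines in the format shown above and resolve a label to a click point. It assumes every line follows the `[Role] "text" x: y: w: h: visible` layout exactly; the real dump may carry more fields. `parse_line` and `find_by_text` are hypothetical helpers, not part of macos-use, and the "Cancel" line is made-up sample data.

```python
import re
from dataclasses import dataclass

# Matches lines such as: [AXButton] "Open" x:680 y:520 w:80 h:30 visible
LINE_RE = re.compile(
    r'^\[(?P<role>\w+)\] "(?P<text>.*)" '
    r'x:(?P<x>-?\d+) y:(?P<y>-?\d+) w:(?P<w>\d+) h:(?P<h>\d+)'
    r'(?P<visible> visible)?$'
)


@dataclass
class Element:
    role: str
    text: str
    x: int
    y: int
    w: int
    h: int
    visible: bool

    def center(self):
        """Return the midpoint of the element's frame as (x, y)."""
        return (self.x + self.w // 2, self.y + self.h // 2)


def parse_line(line):
    """Parse one traversal line; return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Element(
        m["role"], m["text"],
        int(m["x"]), int(m["y"]), int(m["w"]), int(m["h"]),
        m["visible"] is not None,
    )


def find_by_text(lines, text, role=None):
    """Return the first visible element whose text matches exactly."""
    for line in lines:
        el = parse_line(line)
        if el and el.visible and el.text == text and role in (None, el.role):
            return el
    return None


dump = [
    '[AXButton] "Cancel" x:580 y:520 w:80 h:30 visible',  # made-up sample
    '[AXButton] "Open" x:680 y:520 w:80 h:30 visible',    # example from this page
]
button = find_by_text(dump, "Open", role="AXButton")
print(button.center())  # (720, 535)
```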

Response shape

What Claude actually receives

Every tool returns a compact summary plus a path to the full accessibility tree dump. Claude greps the file for the element it wants. No screenshots in the prompt, no OCR pass, no pixel guessing.

open_application_and_traverse("Slack") → summary
pid: 4218
app: Slack
elements: 412 total, 87 visible
file: /tmp/macos-use/slack-traversal.txt
screenshot: /tmp/macos-use/slack.png
processing_time_seconds: "0.31"

slack-traversal.txt (excerpt)
[AXButton] "Direct messages" x:14 y:198 w:236 h:32 visible
[AXRow] "Sarah Chen" x:14 y:286 w:236 h:36 visible
[AXTextArea] "Message Sarah" x:268 y:812 w:892 h:42 visible
[AXButton] "Send" x:1188 y:818 w:32 h:30 visible
[AXStaticText] "on it" x:268 y:812 w:54 h:18 visible

click_and_traverse(element: "Send") → diff (only what changed, not the whole tree)
{
  "added": [
    { "role": "AXStaticText", "text": "on it", "x": 268, "y": 760, "in_viewport": true },
    { "role": "AXStaticText", "text": "Just now",  "x": 1108, "y": 760, "in_viewport": true }
  ],
  "removed": [
    { "role": "AXStaticText", "text": "on it", "x": 268, "y": 812, "in_viewport": true }
  ],
  "modified": [
    {
      "before": { "role": "AXTextArea", "text": "on it" },
      "after":  { "role": "AXTextArea", "text": "" },
      "changes": [{ "attributeName": "AXValue", "oldValue": "on it", "newValue": "" }]
    }
  ]
}

The click completes in ~300ms and returns a compact diff (added, removed, modified), not a screenshot. The agent sees that the message left the input and landed in the thread, then moves on.
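Here is one way that check could look in code, sketched against the diff shown above. `message_sent` is a hypothetical helper, not something the server ships. It assumes the `added` / `removed` / `modified` keys and the `before` / `after` shape are stable, which this page illustrates but does not promise.

```python
import json

# The diff from the example above, trimmed to the fields the check reads.
diff = json.loads("""
{
  "added": [
    {"role": "AXStaticText", "text": "on it", "x": 268, "y": 760, "in_viewport": true},
    {"role": "AXStaticText", "text": "Just now", "x": 1108, "y": 760, "in_viewport": true}
  ],
  "removed": [
    {"role": "AXStaticText", "text": "on it", "x": 268, "y": 812, "in_viewport": true}
  ],
  "modified": [
    {
      "before": {"role": "AXTextArea", "text": "on it"},
      "after": {"role": "AXTextArea", "text": ""}
    }
  ]
}
""")


def message_sent(diff, text, field_role="AXTextArea"):
    """Return True if `text` landed in the UI and the input field was cleared."""
    landed = any(
        e["role"] == "AXStaticText" and e["text"] == text and e.get("in_viewport")
        for e in diff.get("added", [])
    )
    cleared = any(
        m["before"]["role"] == field_role
        and m["before"]["text"] == text
        and m["after"]["text"] == ""
        for m in diff.get("modified", [])
    )
    return landed and cleared


print(message_sent(diff, "on it"))  # True
```

If the check fails, the agent can call refresh_traversal and look again instead of retrying the click blind.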

macos-use vs. AppleScript-based MCP servers

If you've tried steipete/macos-automator-mcp, peakmojo/applescript-mcp, or any other osascript wrapper, here's what changes.

| Feature | AppleScript MCPs | macos-use |
| --- | --- | --- |
| What the AI gets | Free-form text from `osascript`. The agent has to know the right script for every app. | Live accessibility tree with roles, labels, and coordinates. Same data Apple gives VoiceOver. |
| App coverage | Only apps that ship a real AppleScript dictionary. Most modern apps don't. | Every app macOS can describe via AX, including Electron apps, browsers, settings panels. |
| How clicks happen | AppleScript `click button` calls, often blocked by sandboxing. Many apps just refuse. | Native CGEvent at the OS level. Indistinguishable from a real user, works everywhere. |
| Failure mode | Cryptic AppleScript errors, the agent retries blind. | Diff response shows exactly which element changed. Agent can self-correct. |
| Auth & runtime | AppleScript runs in the host app's permission scope, hard to reason about. | One Swift binary over stdio. Local, open source, pinnable npm version. |

AppleScript still wins for a handful of legacy automation tasks (Finder folder actions, Mail rules), and macos-use stays out of those lanes. For everything else, the AX tree is simply a better data source.

macos-use vs. screenshot-based agents

| Feature | Screenshot agents | macos-use |
| --- | --- | --- |
| How it sees the UI | Screenshot + OCR / vision model | Accessibility tree with roles and coordinates |
| Token cost per action | Full screen re-described every step | Diff-only: elements added / removed / changed |
| Click targeting | Pixel guess from screenshot | Exact coords from tree, or element text match |
| Input injection | Simulated keystrokes via vision loop | CGEvent, indistinguishable from real user input |
| Setup | Electron/Docker/Python stack | One Swift binary + stdio MCP |
| Where it runs | Often hosted SaaS | 100% local on your Mac |

Screenshots still matter for apps that expose no accessibility tree. macos-use captures windows on demand so you can combine both when you need to.

Battle tested in production

Powers Fazm

The same server ships inside Fazm as the screen-control layer for a real, paying-customer product. If it works there, it works for your side project.

Open source, MIT

Every line is on GitHub. Pin a version, fork it, audit the Swift. Local binary over stdio, no network calls from the server itself.

Questions developers ask before installing

Which MCP clients does it work with?

Anything that speaks MCP over stdio. Tested daily with Claude Code, Claude Desktop, Cursor, VS Code (Copilot Chat), Windsurf, Cline, and Zed. Same JSON config, different file path per client.

How is this different from screenshot-based macOS agents?

It reads Apple's native accessibility tree (AXUIElement), so the AI gets structured elements with roles, labels, and coordinates instead of pixels. No OCR, no vision-model tax, no guessing pixel positions. Click by text match (element: "Submit") or exact coordinates from the tree. Responses are diff-only, so after an action you get what changed in the UI, not the whole screen again.

What macOS permissions does it need, and who grants them?

Accessibility permission is granted to the host process (Claude Desktop, Terminal, iTerm, VS Code, whoever spawns the MCP server), not to macos-use itself. That's macOS's TCC model. Screen Recording is needed only if you want window screenshots. Both are revocable from System Settings > Privacy & Security.

Will it click things I didn't approve?

No. Every tool call is gated by your MCP client's approval UI (Claude Code shows a diff-style prompt before each call). During automation, an InputGuard overlay blocks stray keyboard and mouse input so you don't fight the agent. Escape cancels the current action immediately. A 30-second watchdog prevents permanent lockout.

Is this safe to install? Where does the code run?

Fully local. The MCP server is a Swift binary running on your Mac, communicating with your AI client over stdio. No network egress from the server itself. Source is open on GitHub under mediar-ai/mcp-server-macos-use. Pin a specific npm version if you want reproducible installs.

What can't it do yet?

Apps that expose no accessibility tree (some Electron apps and custom-rendered games) fall back to coordinate-only clicks. Drag gestures across windows are basic. There is no recording/replay API yet. If you hit something missing, open an issue or book a call. The roadmap follows actual users.

How do I uninstall?

Remove the entry from your MCP config file and (optionally) run `npm uninstall -g mcp-server-macos-use`. To revoke permissions, open System Settings > Privacy & Security and remove the host app from both the Accessibility and Screen Recording lists.

Ready to try it?

Install with one command. If you're building something bigger on top of it and want the Swift, accessibility, or MCP side tailored to your use case, book 20 minutes with the team.

Book a 20-min call
