MCP server for macOS automation · 326 stars on GitHub · Shipping inside Fazm

Drive any Mac app from Claude Code

A Swift MCP server that hands your AI assistant the same accessibility tree Apple gives VoiceOver. Click any button by text. Type into any field. Drive Xcode, Slack, Mail, System Settings, anything with an AX tree.

→ Not an AppleScript wrapper. Native AX APIs + CGEvent, so it works on apps with no scripting dictionary (most modern Mac apps).
→ Not a screenshot agent. Structured tree responses, diff-only after each action. No OCR tax, no vision-model bill.
→ 100% local. One Swift binary over stdio. No SaaS, no network egress from the server.
→ You stay in control. Every action is gated by your MCP client's approval prompt. An InputGuard overlay blocks stray input during automation, and Escape cancels instantly.


The setup pack adds per-client JSON for Cursor, Claude Desktop, VS Code, Windsurf, Cline & Zed, the six prompts below ready to paste, and a heads-up when new tools ship. One email, no list.

Terminal · Claude Code · one-liner

$ claude mcp add macos-use -- npx -y mcp-server-macos-use

Cursor, Claude Desktop, VS Code, Windsurf, Cline, Zed? Same package, added through each client's JSON config file.

macOS 13+ · Swift builds on first run (~20s) · 326 ★ on GitHub

[Video: live, unedited recording of a real session. Claude Code calls macos-use to open an app, read the accessibility tree, click by text, and verify the result, with no AppleScript and no screenshot loops.]

Install

Copy. Paste. Approve once.

The whole install is a single command for Claude Code or a five-line JSON config for everything else. macOS 13+. Swift builds on first run, about twenty seconds.

claude mcp add macos-use -- npx -y mcp-server-macos-use

Requires Claude Code (npm i -g @anthropic-ai/claude-code).
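For clients other than Claude Code, the same npx command goes into the client's MCP config file. A minimal sketch that prints such an entry, assuming the client reads the common `mcpServers` layout. That key name and the config file location vary by client and are assumptions here; only the command and package name come from the one-liner.

```python
import json

def macos_use_config(server_name: str = "macos-use") -> dict:
    """Build an MCP config entry that launches the server via npx over stdio."""
    return {
        "mcpServers": {  # assumed top-level key; check your client's docs
            server_name: {
                "command": "npx",
                "args": ["-y", "mcp-server-macos-use"],
            }
        }
    }

if __name__ == "__main__":
    # Paste the printed JSON into your client's MCP config file.
    print(json.dumps(macos_use_config(), indent=2))
```

The entry mirrors the Claude Code command: `npx` is the executable and everything after it becomes `args`.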


First run prompts for Accessibility permission on whichever app is running your MCP client. Revoke anytime in System Settings → Privacy & Security.

In practice

Once it's installed, type things like this

Real prompts you can drop into Claude Code or Cursor today. The server resolves each one through the accessibility tree, so the agent clicks the right button instead of guessing pixels.

> Open Xcode, run my project, and screenshot the first build error.

Launches Xcode, presses ⌘R, watches the issue navigator, returns the error region.

> In Slack, find my latest DM with Sarah and reply "on it".

Focuses Slack, opens the DM by accessibility label, types into the message field, sends.

> Open System Settings → Privacy & Security and list every app with Accessibility access.

Walks the settings tree, reads the toggles row by row, returns a structured list.

> Drive Cursor: open src/app/page.tsx, jump to line 42, open the agent panel.

Cross-app handoff: focuses Cursor, ⌘P to open the file, ⌃G to jump to the line, ⌘L for the panel.

> Find the screenshot I just took on the Desktop, open it in Preview, copy the OCR text.

Uses Finder + Preview through the accessibility tree, no AppleScript glue.

> Open Mail, find the unread invoice from Stripe, download the PDF to ~/Invoices.

Mail traversal + native CGEvent clicks. Same flow that ships inside Fazm.

Every call is gated by your MCP client's approval prompt. You see the action before the server runs it.

Tools

Six tools. Full control.

Every tool returns the updated accessibility tree as a diff, so the agent always knows what changed.

open_application_and_traverse

Launch or focus any app by name, bundle ID, or path.

click_and_traverse

Click at coordinates or by element text. Optionally type and press a key in one call.

type_and_traverse

Type into the focused field, with optional modifier keystroke.

press_key_and_traverse

Arrow keys, ⌘⇧4, anything. Full modifier support.

scroll_and_traverse

Scroll lines in any direction at a given position.

refresh_traversal

Re-read the accessibility tree without taking an action.
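Under the hood, your MCP client invokes each of these tools with a JSON-RPC `tools/call` request over stdio. A rough sketch of what such a request looks like. The tool name comes from the list above; the argument names (`pid`, `element`) are assumptions for illustration, not the server's documented schema.

```python
import json
from itertools import count

_ids = count(1)  # monotonically increasing JSON-RPC request ids

def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a single line of JSON-RPC 2.0."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)

# Hypothetical arguments: a target process id and the text of the element to click.
print(tool_call("click_and_traverse", {"pid": 4218, "element": "Send"}))
```

You never write this by hand. The client builds it after you approve the call, which is why the approval prompt can show you the exact tool and arguments first.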

Why macos-use

Why native accessibility beats screenshots

Screenshot agents burn tokens re-describing the screen every step and guess pixel positions. macos-use hands Claude a structured tree with semantic roles and coordinates, then returns only what changed after each action.

Accessibility tree, not pixels

Every action returns structured elements with role, text, and coordinates: `[AXButton] "Open" x:680 y:520 w:80 h:30 visible`. No OCR, no vision model tax.

Click by text

element: "Submit" finds and clicks. No pixel guessing.

Diff responses

After each action, only changed elements come back. Cheaper tokens, faster loops.
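To make the idea concrete, here is a simplified sketch of diffing two tree snapshots. It is not the server's actual algorithm: the `Element` type and the matching rule (exact equality on role, text, and position) are assumptions, and it reports only added and removed elements, not modified ones.

```python
from typing import NamedTuple

class Element(NamedTuple):
    role: str
    text: str
    x: int
    y: int

def diff_traversals(before: list[Element], after: list[Element]) -> dict:
    """Return the elements that appeared or disappeared between two snapshots."""
    before_set, after_set = set(before), set(after)
    return {
        "added": [e for e in after if e not in before_set],
        "removed": [e for e in before if e not in after_set],
    }

before = [Element("AXButton", "Send", 1188, 818),
          Element("AXStaticText", "on it", 268, 812)]
after = [Element("AXButton", "Send", 1188, 818),
         Element("AXStaticText", "on it", 268, 760)]

# The unchanged Send button drops out; only the moved text is reported.
changes = diff_traversals(before, after)
```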

Native event injection

CGEvent clicks and keystrokes are OS-level. Works with apps that reject other simulated input.

InputGuard + Escape

User input blocked during automation so you can't fight the agent. Escape cancels, 30s watchdog prevents lockout.

Cross-app handoff

Click a link that opens Safari? The server detects the new frontmost app and traverses it automatically.

Response shape

What Claude actually receives

Every tool returns a compact summary plus a path to the full accessibility tree dump. Claude greps the file for the element it wants. No screenshots in the prompt, no OCR pass, no pixel guessing.

open_application_and_traverse("Slack") → summary

pid: 4218
app: Slack
elements: 412 total, 87 visible
file: /tmp/macos-use/slack-traversal.txt
screenshot: /tmp/macos-use/slack.png
processing_time_seconds: "0.31"

slack-traversal.txt (excerpt)

[AXButton] "Direct messages" x:14 y:198 w:236 h:32 visible
[AXRow] "Sarah Chen" x:14 y:286 w:236 h:36 visible
[AXTextArea] "Message Sarah" x:268 y:812 w:892 h:42 visible
[AXButton] "Send" x:1188 y:818 w:32 h:30 visible
[AXStaticText] "on it" x:268 y:812 w:54 h:18 visible
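Because each line in the dump follows one pattern, finding an element is a text search. A minimal sketch of parsing these lines and locating an element by its text, assuming every line matches the excerpt's format exactly. The regex, the `Element` class, and the helper names are illustrative and not part of macos-use.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional

# Matches lines like: [AXButton] "Send" x:1188 y:818 w:32 h:30 visible
LINE = re.compile(
    r'\[(?P<role>\w+)\] "(?P<text>[^"]*)" '
    r'x:(?P<x>-?\d+) y:(?P<y>-?\d+) w:(?P<w>\d+) h:(?P<h>\d+)(?P<vis> visible)?'
)

@dataclass
class Element:
    role: str
    text: str
    x: int
    y: int
    w: int
    h: int
    visible: bool

    def center(self) -> tuple:
        """Midpoint of the element's frame, a natural click target."""
        return (self.x + self.w // 2, self.y + self.h // 2)

def parse_line(line: str) -> Optional[Element]:
    """Parse one traversal line; return None if it does not match the format."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    return Element(m["role"], m["text"], int(m["x"]), int(m["y"]),
                   int(m["w"]), int(m["h"]), m["vis"] is not None)

def find(lines: Iterable[str], text: str, role: Optional[str] = None) -> Optional[Element]:
    """Return the first visible element with this exact text (and role, if given)."""
    for line in lines:
        el = parse_line(line)
        if el and el.visible and el.text == text and role in (None, el.role):
            return el
    return None

EXCERPT = [
    '[AXButton] "Direct messages" x:14 y:198 w:236 h:32 visible',
    '[AXRow] "Sarah Chen" x:14 y:286 w:236 h:36 visible',
    '[AXTextArea] "Message Sarah" x:268 y:812 w:892 h:42 visible',
    '[AXButton] "Send" x:1188 y:818 w:32 h:30 visible',
]
send = find(EXCERPT, "Send")
```

With the excerpt above, `send.center()` is the middle of the Send button's frame, which is where a coordinate click would land.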

click_and_traverse(element: "Send") → diff (only what changed, not the whole tree)

{
  "added": [
    { "role": "AXStaticText", "text": "on it", "x": 268, "y": 760, "in_viewport": true },
    { "role": "AXStaticText", "text": "Just now",  "x": 1108, "y": 760, "in_viewport": true }
  ],
  "removed": [
    { "role": "AXStaticText", "text": "on it", "x": 268, "y": 812, "in_viewport": true }
  ],
  "modified": [
    {
      "before": { "role": "AXTextArea", "text": "on it" },
      "after":  { "role": "AXTextArea", "text": "" },
      "changes": [{ "attributeName": "AXValue", "oldValue": "on it", "newValue": "" }]
    }
  ]
}

The click completes in ~300ms and returns a short diff, not a screenshot. The agent sees the message left the input and landed in the thread, then moves on.
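That check can be expressed directly against the diff. A small sketch, assuming the diff has exactly the shape shown in the example; `message_sent` is a hypothetical helper, not something the server provides.

```python
import json

def message_sent(diff: dict, message: str) -> bool:
    """True if the input field was cleared and the message appeared as new text."""
    cleared = any(
        change["attributeName"] == "AXValue"
        and change["oldValue"] == message
        and change["newValue"] == ""
        for item in diff.get("modified", [])
        for change in item.get("changes", [])
    )
    landed = any(
        el["role"] == "AXStaticText" and el["text"] == message
        for el in diff.get("added", [])
    )
    return cleared and landed

# The diff from the example, trimmed to the fields the check reads.
diff = json.loads("""
{
  "added": [
    { "role": "AXStaticText", "text": "on it", "x": 268, "y": 760 },
    { "role": "AXStaticText", "text": "Just now", "x": 1108, "y": 760 }
  ],
  "removed": [
    { "role": "AXStaticText", "text": "on it", "x": 268, "y": 812 }
  ],
  "modified": [
    {
      "before": { "role": "AXTextArea", "text": "on it" },
      "after": { "role": "AXTextArea", "text": "" },
      "changes": [{ "attributeName": "AXValue", "oldValue": "on it", "newValue": "" }]
    }
  ]
}
""")
sent = message_sent(diff, "on it")
```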

macos-use vs. AppleScript-based MCP servers

If you've tried steipete/macos-automator-mcp, peakmojo/applescript-mcp, or any other osascript wrapper, here's what changes.

Feature	AppleScript MCPs	macos-use
What the AI gets	Free-form text from `osascript`. The agent has to know the right script for every app.	Live accessibility tree with roles, labels, and coordinates. Same data Apple gives VoiceOver.
App coverage	Only apps that ship a real AppleScript dictionary. Most modern apps don't.	Every app macOS can describe via AX, including Electron apps, browsers, settings panels.
How clicks happen	AppleScript `click button` calls, often blocked by sandboxing. Many apps just refuse.	Native CGEvent at the OS level. Indistinguishable from a real user, works everywhere.
Failure mode	Cryptic AppleScript errors, the agent retries blind.	Diff response shows exactly which element changed. Agent can self-correct.
Auth & runtime	AppleScript runs in the host app's permission scope, hard to reason about.	One Swift binary over stdio. Local, open source, pinnable npm version.

AppleScript still wins for a handful of legacy automation tasks (Finder folder actions, Mail rules). macos-use stays out of those lanes; everything else, the AX tree is just a better data source.

macos-use vs. Claude computer use & other screenshot agents

Anthropic's computer-use beta, OpenAI Operator, and most desktop agents in 2026 ground every decision in pixels. On a hosted VM that's the only option. On your real Mac, the accessibility tree is a faster, cheaper, more reliable substrate.

Feature	Claude computer use / screenshot agents	macos-use
How it sees the UI	Screenshot + OCR / vision model	Accessibility tree with roles and coordinates
Token cost per action	Full screen re-described every step	Diff-only: elements added / removed / changed
Latency per click	1-3s per step (vision inference + re-screenshot)	~300ms per click + tree diff
Click targeting	Pixel guess from screenshot	Exact coords from tree, or element text match
Input injection	Simulated keystrokes via vision loop	CGEvent, indistinguishable from real user input
Cross-platform	Yes (universal — pixels everywhere)	macOS only (uses native AX). Pair with Terminator on Windows.
Setup	Electron/Docker/Python stack	One Swift binary + stdio MCP
Where it runs	Often hosted SaaS	100% local on your Mac

Screenshots still matter for apps that expose no accessibility tree, and Claude's computer use is the right call when you need a single agent that works the same on macOS, Windows, and Linux. macos-use captures windows on demand so you can combine both when you need to.

Pick Claude computer use when

You need the same agent to run on macOS, Windows, and Linux.
The target app is a custom-rendered canvas (most games, some Electron builds) with no AX tree at all.
You're fine paying ~1-3s and a vision-model call per click for universal compatibility.

Pick macos-use when

You're on macOS and want sub-second clicks with no vision-model bill.
You want the agent to click by text (“Send”, “Submit”), not by guessed pixel coordinates.
You want everything local: one Swift binary over stdio, no SaaS, no Docker.
You already speak MCP and want a tool Claude can pick up next to your other servers.

You can run both. Anthropic's own guidance is that Claude tries MCP tools first and falls back to screen control only when no better integration exists, so macos-use becomes the fast path and computer use becomes the safety net.

Battle tested in production

Powers Fazm

The same server ships inside Fazm as the screen-control layer for a real, paying-customer product. If it works there, it works for your side project.

Open source, MIT

Every line is on GitHub. Pin a version, fork it, audit the Swift. Local binary over stdio, no network calls from the server itself.

Apps confirmed working

Anything that exposes an accessibility tree, which is most modern Mac apps. A sample of what developers run regularly:

Xcode, Cursor, VS Code, Slack, Mail, Safari, Chrome, Finder, Preview, System Settings, Notes, Calendar, iTerm, Terminal, Notion, Figma, Discord, Linear

Electron apps (Slack, Notion, Discord, Figma, Linear, Cursor, VS Code) all expose AX — the tree is richer than people expect.

Honest about the limits

What it doesn't do (and what to do instead)

Every accessibility-driven tool has a ceiling. Here's where macos-use hits one, and the workarounds we and Fazm actually use in production.

Limit

Apps with no accessibility tree

A handful of Electron apps strip AX. Most games are custom-rendered with no semantic elements at all.

What to do

Pass raw coordinates to click_and_traverse. The tool still injects native CGEvent clicks even when the tree is empty.
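A sketch of how an agent might choose between the two targeting modes. The argument names (`pid`, `element`, `x`, `y`) are assumptions for illustration and may not match the tool's real schema.

```python
def click_arguments(pid: int, text: str, tree_texts: set,
                    fallback_xy: tuple) -> dict:
    """Prefer a text match; fall back to raw coordinates when the tree lacks it."""
    if text in tree_texts:
        return {"pid": pid, "element": text}
    x, y = fallback_xy
    return {"pid": pid, "x": x, "y": y}

# Tree has the label: click by text.
by_text = click_arguments(4218, "Send", {"Send", "Direct messages"}, (1204, 833))
# Empty tree (for example, a custom-rendered game): click by coordinates.
by_coords = click_arguments(4218, "Play", set(), (640, 360))
```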

Limit

Cross-window drag & drop

Single-window drags work. Drag-and-drop across two windows (Finder → app, Safari → Notes) is brittle because AX loses the source mid-gesture.

What to do

Use copy/paste plus keyboard focus, or the Services menu. Both are fully covered by the existing six tools.

Limit

No record/replay API yet

You can't capture a session and replay it deterministically. Every run goes through the LLM that called the tools.

What to do

Save your prompt history. The accessibility-tree dumps under /tmp/macos-use/ are enough to reconstruct a run.
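A minimal sketch for lining those dumps up in order, assuming they are plain .txt files directly under /tmp/macos-use/ as in the Slack example. The helper name is hypothetical, and using modification time as the ordering is an assumption.

```python
from pathlib import Path

def traversal_dumps(directory: str = "/tmp/macos-use") -> list:
    """Return the .txt tree dumps in the directory, oldest first by mtime."""
    root = Path(directory)
    if not root.is_dir():
        return []  # server has not written anything yet
    return sorted(root.glob("*.txt"), key=lambda p: p.stat().st_mtime)

for dump in traversal_dumps():
    print(dump.name)
```

Keep in mind that /tmp is cleared periodically, so copy the dumps elsewhere if you want to keep a run.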

Hit something else? Open an issue — the roadmap follows actual users, not a wishlist.

Questions developers ask before installing

Which MCP clients does it work with?

Anything that speaks MCP over stdio. Tested daily with Claude Code, Claude Desktop, Cursor, VS Code (Copilot Chat), Windsurf, Cline, and Zed. Same JSON config, different file path per client.

How is this different from screenshot-based macOS agents?

It reads Apple's native accessibility tree (AXUIElement), so the AI gets structured elements with roles, labels, and coordinates instead of pixels. No OCR, no vision-model tax, no guessing pixel positions. Click by text match (element: "Submit") or exact coordinates from the tree. Responses are diff-only, so after an action you get what changed in the UI, not the whole screen again.

What macOS permissions does it need, and who grants them?

Accessibility permission is granted to the host process (Claude Desktop, Terminal, iTerm, VS Code, or whichever app spawns the MCP server), not to macos-use itself. That's macOS's TCC model. Screen Recording is needed only if you want window screenshots. Both are revocable from System Settings > Privacy & Security.

Will it click things I didn't approve?

No. Every tool call is gated by your MCP client's approval UI (Claude Code shows a diff-style prompt before each call). During automation, an InputGuard overlay blocks stray keyboard and mouse input so you don't fight the agent. Escape cancels the current action immediately. A 30-second watchdog prevents permanent lockout.

Is this safe to install? Where does the code run?

Fully local. The MCP server is a Swift binary running on your Mac, communicating with your AI client over stdio. No network egress from the server itself. Source is open on GitHub under mediar-ai/mcp-server-macos-use. Pin a specific npm version if you want reproducible installs.

What can't it do yet?

Three real limits: apps with no accessibility tree fall back to coordinate-only clicks, cross-window drag-and-drop is brittle (use copy/paste instead), and there's no record/replay API yet. The 'Honest about the limits' section above walks through each one with the workaround. Open an issue if you hit something not listed.

How do I uninstall?

Remove the entry from your MCP config file and, if you installed the package globally, run npm uninstall -g mcp-server-macos-use. To revoke permissions, open System Settings > Privacy & Security and remove the host app from the Accessibility list and, if you granted it, the Screen Recording list.

Ready to try it?

Install with one command. If you're building something bigger on top of it and want the Swift, accessibility, or MCP side tailored to your use case, book 20 minutes with the team.

