A different kind of GitHub MCP server

The GitHub MCP server you haven't heard of: a macOS binary that returns a red-crosshair PNG with every response.

Search results for "github mcp server" are dominated by API wrappers: github/github-mcp-server (PRs and issues), mcp-server-git (local clones), and the modelcontextprotocol/servers repository. They are useful, but none of them touch a GUI. macos-use, also on GitHub, is a different beast. It drives native Mac apps through the macOS accessibility APIs. After every click it returns a screenshot with a red crosshair at the pixel where the click landed, alongside a flat-text diff of what changed in the window.

Matthew Diakonov
10 min read
Six MCP tools. One Swift binary. Zero runtime dependencies.
Annotated PNG + AX-tree diff written to /tmp/macos-use after every call.
screenshot-helper subprocess keeps the server at idle CPU.

Search results cover one kind of GitHub MCP server. This is the other.

When someone searches "github mcp server," they almost always land on github/github-mcp-server: GitHub's own Go service that exposes the REST + GraphQL API as MCP tools so an agent can open pull requests, triage issues, inspect CI runs, and read security alerts without a local clone. That server is excellent at what it does. It also has nothing to say about the laptop the agent is running on.

macos-use is on GitHub too. It fills the other half of the picture. The GitHub MCP server drives the platform; macos-use drives the desktop in front of you. If you want an agent that can open a PR on github.com AND verify the resulting branch compiles cleanly in Xcode AND run the binary AND screenshot the result, you need both servers registered in the same client.

github/github-mcp-server (API) + macos-use (accessibility) + mcp-server-filesystem + mcp-server-git + a browser MCP of your choice

What ships back from a single tool call

github/github-mcp-server returns JSON. Some tools return a compact JSON blob, and some return a streaming text response. That is the right shape for repository data.

macos-use returns three things for every call that mutates the screen:
- a short summary in the MCP response,
- a .txt file on disk containing the full before/after accessibility-tree diff, and
- a .png of the target window with a red crosshair at the exact pixel where the click landed.

All three share a timestamp prefix in /tmp/macos-use/.
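Here is a minimal sketch of how the three pieces could line up. It assumes a millisecond Unix timestamp as the shared prefix and uses the <ts>_<tool> naming quoted later in this article. artifactPaths, ArtifactPaths and ToolResponse are hypothetical names. The summary, filepath and screenshot fields mirror the response shape in the call sequence below, not a checked schema.

```swift
import Foundation

// Hypothetical sketch: "<ts>_<tool>" naming as quoted in this article; the
// millisecond timestamp format and these type names are assumptions.
struct ArtifactPaths {
    let diff: String        // full before/after accessibility-tree diff
    let screenshot: String  // window PNG with the red crosshair
}

func artifactPaths(tool: String, at date: Date, directory: String = "/tmp/macos-use") -> ArtifactPaths {
    // Millisecond timestamp so files list in call order when sorted by name.
    let ts = Int64(date.timeIntervalSince1970 * 1000)
    let prefix = "\(directory)/\(ts)_\(tool)"
    return ArtifactPaths(diff: prefix + ".txt", screenshot: prefix + ".png")
}

// What goes back inline: a short summary plus the two on-disk paths.
struct ToolResponse: Codable {
    let summary: String
    let filepath: String
    let screenshot: String
}

let paths = artifactPaths(tool: "click_and_traverse", at: Date(timeIntervalSince1970: 1_700_000_000))
let response = ToolResponse(summary: "clicked AXButton 'Send'", filepath: paths.diff, screenshot: paths.screenshot)
let json = try! JSONEncoder().encode(response)
print(String(decoding: json, as: UTF8.self))
```

Sharing one prefix makes the .txt and .png easy to pair up when you list the directory.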


The key design choice: we fork a subprocess to take every screenshot, and the reason is a CoreGraphics war story.

The version of macos-use you can clone today has a file at Sources/ScreenshotHelper/main.swift whose only job is to be a standalone executable. It takes a Core Graphics window ID, an output path, and optional click coordinates. It calls CGWindowListCreateImage, draws a red crosshair in a CGContext, writes a PNG, and exits. Nothing else. The MCP server binary never calls CGWindowListCreateImage itself.
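The helper's interface can be sketched as a small argument parser. This assumes the argument order shown in the call sequence later in this article: window ID, output path, then an optional --click x,y. HelperArgs, ClickPoint and parseHelperArgs are hypothetical names. The capture and drawing steps are left out because they need CoreGraphics.

```swift
import Foundation

// Hypothetical layout: <windowID> <outputPath> [--click x,y]
// Inferred from the call sequence in this article, not copied from the repo.
struct ClickPoint: Equatable {
    let x: Double
    let y: Double
}

struct HelperArgs: Equatable {
    let windowID: UInt32      // CGWindowID is a typealias for UInt32
    let outputPath: String    // where the PNG gets written
    let click: ClickPoint?    // nil means "no crosshair"
}

enum HelperArgsError: Error {
    case usage, badWindowID, badClick
}

/// Parses the helper's arguments, excluding the executable name.
func parseHelperArgs(_ argv: [String]) throws -> HelperArgs {
    guard argv.count == 2 || argv.count == 4 else { throw HelperArgsError.usage }
    guard let windowID = UInt32(argv[0]) else { throw HelperArgsError.badWindowID }
    var click: ClickPoint? = nil
    if argv.count == 4 {
        guard argv[2] == "--click" else { throw HelperArgsError.usage }
        let parts = argv[3].split(separator: ",")
        guard parts.count == 2, let x = Double(parts[0]), let y = Double(parts[1]) else {
            throw HelperArgsError.badClick
        }
        click = ClickPoint(x: x, y: y)
    }
    return HelperArgs(windowID: windowID, outputPath: argv[1], click: click)
}

// In a real helper: try parseHelperArgs(Array(CommandLine.arguments.dropFirst()))
let parsed = try! parseHelperArgs(["4242", "/tmp/macos-use/1700000000000_click_and_traverse.png", "--click", "640,360"])
print(parsed.windowID, parsed.click.map { "\($0.x),\($0.y)" } ?? "no click")
```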

Why two binaries for one image? Because CGWindowListCreateImage has a side effect that Apple's documentation does not mention and that will not show up in the Instruments Leaks tool. The first time you call it on an on-screen window, macOS lazily loads ReplayKit, the same framework that powers the screen recording button in Control Center. ReplayKit then keeps a background thread alive in your process at roughly 19% CPU, forever. You cannot unload the framework from a running process, and you cannot ask ReplayKit to stop. Your only option is to let the process that loaded ReplayKit die.

For a one-shot CLI, that is fine; the process exits in seconds. For a long-running MCP server that is connected to Claude Desktop or Cursor for hours at a time, it is a regression in idle CPU from zero to 19%. So we don't let ReplayKit into our address space.

The naive approach calls CGWindowListCreateImage inside the server process (in-process.swift, do not ship). It breaks immediately.

The shipped approach, in Sources/MCPServer/main.swift, keeps idle CPU at zero.
Parent CPU before the fix: about 19% at idle.
Parent CPU after the fix: 0% at idle.
The helper is polled on a fixed interval and timed out if it runs past a cap.

The first time you call CGWindowListCreateImage on an on-screen window, macOS loads ReplayKit and keeps a background thread open for the life of the process.

Why we spawn screenshot-helper as a child. Sources/MCPServer/main.swift:382-385

What the fork looks like in code

Two short listings. The first is the parent side: how the server decides to invoke the helper, what arguments it passes, and how it times the helper out if ReplayKit decides to hang on first load. The second is the child side: the actual CGWindowListCreateImage call, the CGContext math that places the crosshair in image-local coordinates, and the PNG write.

Sources/MCPServer/main.swift:456
Sources/ScreenshotHelper/main.swift:45
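Here is a minimal sketch of the parent side using Foundation's Process. It launches the helper, polls until the helper exits or hits a timeout cap, then reads the PNG path the helper prints on stdout. The function name, poll interval and timeout are placeholders, not the values in Sources/MCPServer/main.swift. Any executable that prints a path, such as /bin/echo, works as a stand-in helper.

```swift
import Foundation

// Hypothetical sketch: run a screenshot helper out-of-process so anything it
// loads (e.g. ReplayKit via CGWindowListCreateImage) dies with the child.
// pollInterval and timeout are placeholder values.
func runScreenshotHelper(executable: String,
                         arguments: [String],
                         pollInterval: TimeInterval = 0.05,
                         timeout: TimeInterval = 5) -> String? {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: executable)
    process.arguments = arguments
    let stdout = Pipe()
    process.standardOutput = stdout

    do { try process.run() } catch { return nil }

    // Poll instead of blocking so a hung helper can be cut off.
    let deadline = Date().addingTimeInterval(timeout)
    while process.isRunning {
        if Date() >= deadline {
            process.terminate()       // SIGTERM the stuck child
            process.waitUntilExit()
            return nil
        }
        Thread.sleep(forTimeInterval: pollInterval)
    }

    guard process.terminationStatus == 0 else { return nil }
    // The helper is expected to print a single line: the PNG path it wrote.
    let data = stdout.fileHandleForReading.readDataToEndOfFile()
    let path = String(decoding: data, as: UTF8.self)
        .trimmingCharacters(in: .whitespacesAndNewlines)
    return path.isEmpty ? nil : path
}
```

Reading stdout only after the child exits is fine for a helper that prints one short path. A chattier child could fill the pipe buffer and stall, so its output would need to be drained while it runs.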

End-to-end, in sequence

One tool call, four participants. Client, server, helper, app. The server does the accessibility work in-process. The helper does the pixel work out-of-process. The app never knows the difference.

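click_and_traverse, step by step:

MCP client → MCPServer: click_and_traverse { x, y }
MCPServer → target app: traverse AX tree (before)
MCPServer → target app: CGEventPost mouseDown + mouseUp
MCPServer → target app: traverse AX tree (after)
MCPServer → screenshot-helper: fork(windowID, path, --click x,y)
screenshot-helper → target app: CGWindowListCreateImage
screenshot-helper → MCPServer: stdout: /tmp/macos-use/ts_tool.png
MCPServer → MCP client: { summary, filepath, screenshot }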
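The same ordering can be sketched in Swift with the platform calls injected as closures. This is a hypothetical structure, chosen so the sequence can run without a Mac. AX traversal and the click stay in-process, and only the screenshot step crosses into the helper. ClickAndTraverseSteps and clickAndTraverse are illustrative names, not the repo's.

```swift
import Foundation

// Hypothetical sketch of the order of work inside one click_and_traverse call.
// The closures stand in for the real AX traversal, CGEventPost click, and
// helper spawn, so the sequencing can be exercised anywhere.
struct ClickAndTraverseSteps {
    var traverse: () -> [String]                 // in-process: read AX tree lines
    var postClick: (Double, Double) -> Void      // in-process: mouseDown + mouseUp
    var screenshot: (Double, Double) -> String?  // out-of-process: helper returns PNG path
}

func clickAndTraverse(x: Double, y: Double,
                      steps: ClickAndTraverseSteps) -> (before: [String], after: [String], screenshot: String?) {
    let before = steps.traverse()
    steps.postClick(x, y)
    let after = steps.traverse()
    // Pixels last: the helper captures the window after the second traversal.
    let png = steps.screenshot(x, y)
    return (before, after, png)
}
```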

Where it lives in the ecosystem

The GitHub MCP server is the API hub. macos-use is the desktop hub. Register them alongside the filesystem, git, and browser servers in the same MCP client; each one handles a different substrate, and the overlap is tiny.

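One MCP client, many substrates. Your MCP client connects to github/github-mcp-server, mcp-server-git, mcp-server-filesystem, other servers from modelcontextprotocol/servers, and macos-use. macos-use in turn reaches native apps such as Xcode, Messages, System Settings, Mail, Finder, and Notes.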

When you pick macos-use vs. github/github-mcp-server

Feature-wise there is almost no overlap. Where github/github-mcp-server moves bytes through the GitHub API, macos-use moves mouse coordinates through the accessibility API. The table below is here mostly so you can show a teammate why you need both.

Feature | github/github-mcp-server | macos-use
Drives a local GUI app (Messages, Xcode, Notes) | no, API only | yes, via AXUIElement
Returns an annotated PNG with every tool call | no, JSON only | yes, red crosshair at click point
Opens pull requests on github.com | yes, first-class tool | no, not its job
Reads issue comments and CI logs via REST | yes | no
Permissions required | GitHub PAT or OAuth | Accessibility in System Settings
Works with no network | no | yes
Writes a .txt diff of what changed after each call | no | yes, /tmp/macos-use/<ts>_<tool>.txt
Runtime | Go binary + HTTPS client | native Swift binary

Concretely, when you'd reach for this server

You're clicking buttons no API exposes

System Settings panes, Xcode signing popovers, Preview's PDF markup, Finder quick-look. macOS has thousands of UI surfaces with zero scripting API. Accessibility is the only contract.

You want a visual receipt, not a guess

Every response ships a .png with a red crosshair at the exact pixel where the click landed. Read the PNG, verify the action, move on. No inference from AX text labels.

You need a greppable history

Both PNGs and .txt diffs land in /tmp/macos-use with monotonic timestamp prefixes. grep 'AXButton' across the session history and you have a replay log.

You're gluing macOS to the GitHub MCP

Have the GitHub MCP open a PR. Have macos-use verify Xcode picked up the new branch. Have macos-use run the app. Have the GitHub MCP comment on the PR with the result.
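The greppable-history idea can be sketched in a few lines of Swift. This assumes the .txt diffs sit directly in /tmp/macos-use and that timestamp prefixes have equal width, so sorting filenames also sorts calls chronologically. grepSession is a hypothetical helper; plain grep -r does the same job from a shell.

```swift
import Foundation

// Hypothetical sketch: collect every diff line mentioning a term, in call order.
// Assumes equal-width timestamp prefixes, so lexical filename order is chronological.
func grepSession(_ term: String, in directory: String = "/tmp/macos-use") -> [(file: String, line: String)] {
    let fm = FileManager.default
    guard let names = try? fm.contentsOfDirectory(atPath: directory) else { return [] }
    var hits: [(file: String, line: String)] = []
    for name in names.sorted() where name.hasSuffix(".txt") {   // skip the PNGs
        let path = URL(fileURLWithPath: directory).appendingPathComponent(name).path
        guard let text = try? String(contentsOfFile: path, encoding: .utf8) else { continue }
        for line in text.split(separator: "\n") where line.contains(term) {
            hits.append((file: name, line: String(line)))
        }
    }
    return hits
}
```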

By the numbers: 6 tools, 2 binaries, 0% residual CPU

The repo compiles to two executables: MCPServer (the daemon Claude or Cursor talks to) and screenshot-helper (the short-lived child that takes the PNG). The MCP surface is six tools: open_application_and_traverse, click_and_traverse, type_and_traverse, press_key_and_traverse, scroll_and_traverse, refresh_traversal. No plan tool, no sequence handle, no server-persisted agent state; the diff is the state.

Why so few tools?

Every extra tool is an extra schema an LLM has to keep in its context. The six we ship are the smallest set that covers open, act, and observe on a Mac. The interesting flexibility lives inside click_and_traverse, which takes optional text and pressKey params so click + type + press can fire in one round trip and be cancelled by a single Esc.
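Those arguments can be sketched as a Codable struct, assuming x and y are numbers and both extras are optional strings. text and pressKey are the parameter names given above. The key value "Return" and the expansion into a list of actions are illustrative only, not the server's real schema or dispatch.

```swift
import Foundation

// Hypothetical sketch of click_and_traverse arguments. "text" and "pressKey"
// are the optional params named in this article; x/y come from the call
// sequence; exact types in the real schema may differ.
struct ClickAndTraverseParams: Codable {
    let x: Double
    let y: Double
    let text: String?      // optional: type this after the click
    let pressKey: String?  // optional: then press this key (name format assumed)
}

let raw = #"{"x": 640, "y": 360, "text": "ship it", "pressKey": "Return"}"#
let params = try! JSONDecoder().decode(ClickAndTraverseParams.self, from: Data(raw.utf8))

// One round trip: click, then type, then press the key.
var actions = ["click \(params.x),\(params.y)"]
if let text = params.text { actions.append("type \(text)") }
if let key = params.pressKey { actions.append("press \(key)") }
print(actions)
```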

Install it alongside your GitHub MCP server

four steps, five minutes

1

Clone and build

Swift Package Manager. One command. No Node, no Python, no Docker. Produces the MCPServer binary plus the screenshot-helper binary in .build/release.

build.sh
2

Grant accessibility permission

System Settings > Privacy & Security > Accessibility. Add the built binary. Without this, AXUIElementCopyAttributeValue returns kAXErrorCannotComplete on every call.

3

Register with your MCP client

Point Claude Desktop, Cursor, or any MCP host at the MCPServer binary over stdio. Six tools appear: open, click, type, press_key, scroll, refresh.

~/Library/Application Support/Claude/claude_desktop_config.json
4

Call a tool and check /tmp/macos-use

Every response includes a file path and a screenshot path. Read both. The diff tells you what changed in the accessibility tree; the PNG tells you what a human would have seen.
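For step 3, here is a sketch that builds the mcpServers entry Claude Desktop reads; Cursor uses the same shape. The binary path and the "macos-use" server name are placeholders. Point command at the MCPServer binary your build produced.

```swift
import Foundation

// Hypothetical sketch: generate the stdio server entry for an MCP host config.
// The server name and binary path are placeholders for your own checkout.
func mcpConfig(binaryPath: String) throws -> Data {
    let config: [String: Any] = [
        "mcpServers": [
            "macos-use": ["command": binaryPath]
        ]
    ]
    // Note: JSONSerialization may escape "/" as "\/", which is still valid JSON.
    return try JSONSerialization.data(withJSONObject: config, options: [.prettyPrinted, .sortedKeys])
}

let data = try! mcpConfig(binaryPath: "/Users/you/macos-use/.build/release/MCPServer")
print(String(decoding: data, as: UTF8.self))
```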

Gluing the GitHub MCP server to macos-use?

Walk us through your agent loop; we'll show you the tool-composition patterns that avoid LLM re-planning round-trips.

Frequently asked questions

Is macos-use a fork of github/github-mcp-server?

No. github/github-mcp-server is Go, hosted by GitHub, and wraps the REST + GraphQL API so an agent can open PRs, triage issues, and inspect CI without a local clone. macos-use is Swift, self-hosted, and wraps the macOS Accessibility APIs (AXUIElement) plus CoreGraphics event posting so an agent can drive any Mac app that renders an accessibility tree. Same protocol, unrelated problem. You can and should run both in the same MCP client.

Why is there a subprocess called screenshot-helper in the repo?

Because CGWindowListCreateImage has a side effect. The first time you ask for an on-screen window bitmap, macOS lazily loads ReplayKit (the same framework that powers screen recording in Control Center). ReplayKit then holds a background thread open in your process at ~19% CPU for the life of the process. For a long-running MCP server that can stay up for a week, that's unacceptable. The fix is at Sources/MCPServer/main.swift:458. We spawn Sources/ScreenshotHelper/main.swift as a separate executable and let it call CGWindowListCreateImage inside its own address space, draw the red crosshair with CGContext, write PNG bytes to disk, and exit. ReplayKit dies with it. The parent server stays at idle CPU.

What exactly is in /tmp/macos-use/ after a single MCP call?

Two files on disk, plus the short summary returned in the MCP response. The first file is a timestamped .txt with the full accessibility-tree diff: one element per line, prefixed with + for added, - for removed, and ~ for modified, filtered to remove AXScrollBar noise. The second is a .png with the same timestamp prefix, showing the target window with a red crosshair and circle drawn at the click point in window-local image coordinates (Sources/ScreenshotHelper/main.swift:70-85). The next call does not overwrite them; both files stay until you clean them up, so the history is flat and greppable.
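The diff format can be sketched as follows, assuming elements are matched across snapshots by role plus title. That matching key, AXElement and diffTrees are hypothetical. Only the +/-/~ prefixes and the AXScrollBar filter come from the description above.

```swift
import Foundation

// Hypothetical sketch of a flat before/after diff: "+" added, "-" removed,
// "~" modified, AXScrollBar dropped as noise. Matching by role + title is an
// assumption; real AX elements have no stable IDs.
struct AXElement {
    let role: String
    let title: String
    let value: String
    var key: String { "\(role) '\(title)'" }
}

func diffTrees(before: [AXElement], after: [AXElement]) -> [String] {
    let keep: (AXElement) -> Bool = { $0.role != "AXScrollBar" }
    let old = Dictionary(before.filter(keep).map { ($0.key, $0) }, uniquingKeysWith: { first, _ in first })
    let newKeys = Set(after.filter(keep).map { $0.key })
    var lines: [String] = []
    for el in after.filter(keep) {
        if let prev = old[el.key] {
            if prev.value != el.value { lines.append("~ \(el.key): \(prev.value) -> \(el.value)") }
        } else {
            lines.append("+ \(el.key)")
        }
    }
    for el in before.filter(keep) where !newKeys.contains(el.key) {
        lines.append("- \(el.key)")
    }
    return lines
}
```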

Does macos-use show up in the GitHub MCP registry?

The official registry at github.com/mcp lists servers that GitHub hosts or has reviewed. macos-use is a community server published under its own GitHub repo, so it is not in that list. The MCP client side does not care: if you can run the binary, you can register it in Claude Desktop, Cursor, or any MCP host with a stdio transport. The README in the repo has the config snippet.

Why Swift? Why not Node or Python like most MCP servers?

Because AXUIElement and CGEvent are C APIs built on Core Foundation types, and every non-Swift bridge pays a serialization tax or a bridging-library maintenance cost. A Node MCP server that wants to click a button in another app has to cross a C bridge on every call. The Swift binary posts a CGEvent directly from inside the MCP tool handler. The ReplayKit side effect is not a language problem either. It comes from CoreGraphics, so on Node you would have spawned a Swift subprocess for screenshots anyway. We cut the intermediate layer.

Does the PNG show my real screen, with real data? Is that a privacy concern?

Yes and yes, handle it accordingly. The PNG is whatever the target window is rendering when the tool call happens, including any personal content in it. It is written to /tmp/macos-use/ with the permissions of the user running the MCP server, and it is never sent anywhere by the server itself — only the path is returned in the tool response. If your MCP client uploads the file to a model, that is a client-side decision. For sensitive work we recommend pointing /tmp/macos-use at an encrypted volume or adding a post-response cleanup hook on the client.

How does the click point end up on the PNG if screenshots and AX coordinates differ?

On the author's setup they don't differ: backingScaleFactor = 1.0 on all screens, so one AX point equals one CGEvent point equals one pixel. The helper receives the click point in screen coordinates (CGEvent space) and the target window's bounds. It computes local coordinates as (click.x - window.origin.x, click.y - window.origin.y), scales them to the captured image size, flips Y for the CoreGraphics bottom-left origin, and draws. See ScreenshotHelper/main.swift:53-79. If you run on a Retina display, scaleX and scaleY in that block absorb the difference.
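Here is the same mapping with plain Doubles standing in for CGPoint and CGRect, so the arithmetic runs anywhere. Rect and crosshairPoint are hypothetical names. The steps (subtract the window origin, scale to the image, flip Y) are the ones this answer describes.

```swift
import Foundation

// Hypothetical sketch of the click-to-image mapping: window-local offset,
// scale to captured image size, then flip Y for a bottom-left-origin CGContext.
struct Rect { let x, y, width, height: Double }

func crosshairPoint(clickX: Double, clickY: Double,
                    window: Rect, imageWidth: Double, imageHeight: Double) -> (x: Double, y: Double) {
    // Window-local point in screen points (top-left origin, as CGEvent uses).
    let localX = clickX - window.x
    let localY = clickY - window.y
    // 1.0 when backingScaleFactor is 1.0; larger for a Retina capture.
    let scaleX = imageWidth / window.width
    let scaleY = imageHeight / window.height
    // Flip Y because the drawing context's origin is bottom-left.
    return (localX * scaleX, imageHeight - localY * scaleY)
}

let window = Rect(x: 100, y: 50, width: 800, height: 600)
let p = crosshairPoint(clickX: 150, clickY: 100, window: window, imageWidth: 800, imageHeight: 600)
print(p)
```

With a 1.0 scale factor the scale terms are 1 and only the origin shift and Y flip matter. On a 2.0 capture, every offset doubles.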

What other MCP servers do I need alongside macos-use to cover 'everything on my laptop'?

Pair it with github/github-mcp-server (GitHub platform), modelcontextprotocol/servers/filesystem (files on disk), mcp-server-git (local git repos), and a browser MCP if you need the web. macos-use covers native macOS apps — Messages, Mail, Xcode, Notes, Finder, System Settings, basically anything that shows up in the accessibility tree. The overlap with the others is small: the accessibility layer is a different substrate than HTTP APIs or filesystem reads.

macos-use: MCP server for native macOS control
© 2026 macos-use. All rights reserved.