Swift MCP server • two-binary architecture • ReplayKit isolation

macos-use Ships A Second Binary Whose Whole Purpose Is To Die After One Screenshot

CGWindowListCreateImage is not free. The first call in any process lazy-loads ReplayKit, and ReplayKit spawns a background worker that does not stop. In a short CLI that runs for 40 ms, nobody notices. In a long-lived MCP server that sits in Claude Desktop's menu bar for hours, top shows the server at ~19% CPU indefinitely. The fix is architectural, not a matter of tuning: macos-use declares a second .executableTarget in Package.swift, hands every capture request to that subprocess via Process() with a 5-second deadline, reads the saved PNG path off stdout, and lets the helper die. ReplayKit goes with it.

Matthew Diakonov, written with AI

Published April 18, 2026 • 9 min read

Source: Sources/ScreenshotHelper/main.swift on GitHub; captureWindowScreenshot starts at Sources/MCPServer/main.swift:378.


Two executableTarget declarations in Package.swift (main server + screenshot-helper)

Every capture runs in a child process with a 5-second watchdog at main.swift:485-489

ReplayKit CPU leak is contained to the subprocess lifetime, not the server lifetime

What is macos-use

A local MCP server that gives Claude Code, Cursor, and Claude Desktop real hands on macOS.

Native accessibility-tree automation for any Mac app: click, type, scroll, traverse. No screenshots-as-input, no OCR, no vision-model tax. Open source, MIT licensed, runs entirely on your machine. The two-binary screenshot architecture documented on this page is one reason it stays cheap to keep running in the background of a long agent session.

macOS 13+ • Swift • stdio MCP • mediar-ai/mcp-server-macos-use

Install in your MCP client. Same npx entry point everywhere:

For Claude Code, run once from any terminal:

claude mcp add macos-use -- npx -y mcp-server-macos-use

Restart the client after adding the server or editing its config. Grant Accessibility (and Screen Recording, if you want screenshots) to the host app from System Settings → Privacy & Security.

One screenshot. One subprocess. One death.

How macos-use keeps its MCP server out of the ReplayKit CPU trap

CGWindowListCreateImage lazy-loads ReplayKit the first time it's called

ReplayKit spawns a worker that never stops, ~19% CPU forever

macos-use moves that one call into a sibling binary, screenshot-helper

Main server spawns the helper, waits 5 seconds max, reads the PNG path

Helper exits. ReplayKit dies with it. Parent CPU returns to idle.


The Architectural Detail Every Competing macOS MCP Skips

Search the keyword macos-use and read the top results. The GitHub READMEs for steipete/macos-automator-mcp, ashwwwin/automation-mcp, CursorTouch/MacOS-MCP, digithree/automac-mcp, and mb-dev/macos-ui-automation-mcp each describe a list of tools: click, type, press, launch an app, capture the screen. Not one of them discusses the lifetime of the framework that backs the capture call, which matters because CGWindowListCreateImage pulls ReplayKit in as a side effect, and ReplayKit does not respect the caller's idea of a one-shot.

If you deliver macOS automation through a long-running MCP server, and your users run a model that fires a tool every few seconds for an hour, the ReplayKit worker started by the first capture keeps billing CPU for the rest of the session. Battery-powered laptops heat up. Fans kick on. The user blames the model. The only way to drop that cost to zero between tool calls is to put the capture in a process you can actually kill, and the only way to ship that reliably is to have a second binary next to your main one.

macos-use does exactly that. Its Package.swift is the shortest proof: two .executableTarget entries, one called mcp-server-macos-use, the other called screenshot-helper. No other macOS MCP I found has a second target.

What Crosses The Subprocess Boundary

Three inputs go into screenshot-helper: the window ID chosen by the intersection scorer, the output path under /tmp/macos-use/, and optional click annotation data. Two outputs come back: the saved PNG path on stdout and any diagnostic logs on stderr, forwarded verbatim into the parent's log stream.

argv in, stdout PNG path out, ReplayKit stays boxed inside


Two Executables, One Package Manifest

Every macOS MCP I checked declares a single executable target. macos-use declares two. The reason fits in a single comment line in the manifest: ReplayKit is loaded as a side effect of the capture call and cannot be unloaded in-process.

Package.swift

Build both with swift build -c release. Both land in .build/release/. The main server finds the helper by taking the directory of CommandLine.arguments[0] and looking for screenshot-helper there, so the two binaries only have to be siblings on disk.
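For orientation, here is a minimal sketch of what a two-target manifest of this shape looks like. It assumes swift-tools-version 5.9 and a package name of mcp-server-macos-use; the target names, source paths, and macOS 13 floor come from this page, while the real manifest's MCP SDK dependency and any other settings are deliberately left out.

```swift
// swift-tools-version:5.9
// Sketch only: the real manifest also declares the MCP SDK dependency,
// which is omitted here. Package name is an assumption.
import PackageDescription

let package = Package(
    name: "mcp-server-macos-use",
    platforms: [.macOS(.v13)],
    targets: [
        // Long-lived stdio MCP server; never calls CGWindowListCreateImage itself.
        .executableTarget(
            name: "mcp-server-macos-use",
            path: "Sources/MCPServer"
        ),
        // Disposable capture binary: ReplayKit is loaded as a side effect of the
        // capture call and cannot be unloaded in-process, so the call lives here.
        .executableTarget(
            name: "screenshot-helper",
            path: "Sources/ScreenshotHelper"
        ),
    ]
)
```

Neither target depends on the other, so swift build compiles both and places both executables side by side in .build/release/, which is the sibling layout the server's path lookup relies on.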

Inline Capture Versus Subprocess Capture

Left: the in-process pattern every other macOS MCP ships. Right: what macos-use does instead. The right side is longer by about a dozen lines, but the right side's steady-state CPU cost is zero between tool calls. The left side's is not.

The same API surface, two very different CPU curves

// The shape other macOS MCP servers use.
// captureAndReturn() is called over and over inside the same process.
// ReplayKit loads on the first call and stays resident.
// findBestWindow(for:) and writePNG(_:to:) are helpers not shown here.

import CoreGraphics

func captureAndReturn(pid: pid_t, out: String) -> String? {
    guard let winID = findBestWindow(for: pid) else { return nil }

    // First call here lazy-loads ReplayKit into THIS process.
    // Every subsequent call reuses the already-loaded framework.
    // The background worker never stops. Parent CPU stays pinned.
    guard let image = CGWindowListCreateImage(
        .null, .optionIncludingWindow, winID,
        [.boundsIgnoreFraming, .bestResolution]
    ) else { return nil }

    writePNG(image, to: out)
    return out
}

// In a long-lived MCP server this path accumulates cost forever.
// top -pid <server-pid> shows ~19% CPU usage in steady state.


The Full Round-Trip Of A Single Screenshot

Five actors. Five messages. The most expensive call on the whole path (CGWindowListCreateImage) never enters the MCP server's process.

click_and_traverse ending in a PNG the model can read


How The Parent Launches, Waits, And Reads

Three Swift idioms you can read at a glance: Process() for the spawn, DispatchGroup plus a timeout for the 5-second watchdog, and a pipe read for the result. No shared memory. No IPC beyond argv plus stdout.
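A minimal sketch of those three idioms, written as a hypothetical runHelper(at:arguments:timeout:) function rather than the repo's actual captureWindowScreenshot code. It checks that the helper exists, launches it with Process(), bounds the wait with DispatchGroup.wait(timeout:), forwards stderr verbatim, and returns the trimmed stdout line as the PNG path.

```swift
import Dispatch
import Foundation

/// Hypothetical parent-side sketch: spawn a helper, give it `timeout` seconds,
/// forward its stderr, and return its trimmed stdout (the saved PNG path),
/// or nil on any failure.
func runHelper(at helperPath: String, arguments: [String], timeout: TimeInterval = 5) -> String? {
    // Fail soft if the sibling binary is missing instead of crashing the server.
    guard FileManager.default.fileExists(atPath: helperPath) else {
        FileHandle.standardError.write(Data("log: helper not found at \(helperPath)\n".utf8))
        return nil
    }

    let process = Process()
    process.executableURL = URL(fileURLWithPath: helperPath)
    process.arguments = arguments
    let stdoutPipe = Pipe()
    let stderrPipe = Pipe()
    process.standardOutput = stdoutPipe
    process.standardError = stderrPipe

    do {
        try process.run()
    } catch {
        FileHandle.standardError.write(Data("log: failed to launch helper: \(error)\n".utf8))
        return nil
    }

    // waitUntilExit() blocks, so run it on a background queue and let the
    // DispatchGroup turn the blocking wait into a deadline.
    let group = DispatchGroup()
    group.enter()
    DispatchQueue.global().async {
        process.waitUntilExit()
        group.leave()
    }
    if group.wait(timeout: .now() + timeout) == .timedOut {
        process.terminate()  // SIGTERM; the caller simply gets no screenshot
        return nil
    }

    // The helper has exited, so both pipes can be drained to EOF.
    let stderrData = stderrPipe.fileHandleForReading.readDataToEndOfFile()
    if !stderrData.isEmpty {
        FileHandle.standardError.write(stderrData)  // forward verbatim
    }
    guard process.terminationStatus == 0 else { return nil }

    let stdoutData = stdoutPipe.fileHandleForReading.readDataToEndOfFile()
    let path = String(decoding: stdoutData, as: UTF8.self)
        .trimmingCharacters(in: .whitespacesAndNewlines)
    return path.isEmpty ? nil : path
}
```

Draining the pipes only after exit keeps the sketch short and suits a one-line stdout protocol. The trade-off: a helper that wrote more than a pipe buffer's worth of output would block before exiting and trip the timeout, so chatty logging would call for reading the pipes concurrently.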

Sources/MCPServer/main.swift

What You See In The Log While The Helper Is Alive

Every log line below is a literal string emitted by the server. The four "captureWindowScreenshot" lines always appear in this order around a successful capture: window selection, argument construction, spawn, success.

mcp-server-macos-use (stderr during one tool call)


The Helper Itself Is 111 Lines, And Most Of Them Are Crosshair Math

The entire capture logic fits on one screen. The call that loads ReplayKit runs exactly once per process invocation, and each process invocation is disposable by construction.

Sources/ScreenshotHelper/main.swift

The actual file includes the red crosshair drawing between the capture and the PNG write; I trimmed that block here because the point is the lifecycle, not the annotation geometry. Read the full file on GitHub for the coordinate transforms.

~19%

“Runs in a subprocess so that ReplayKit (loaded as a side-effect of CGWindowListCreateImage) dies with the process instead of spinning forever in the parent MCP server.”

doc comment at Sources/ScreenshotHelper/main.swift:1-4

Numbers You Can Verify In The Current Commit

Every number below is either a line reference in Sources/MCPServer/main.swift at HEAD or a direct count from Package.swift and Sources/ScreenshotHelper/main.swift. Clone the repo, open the files, and the code matches.

2 executableTarget declarations in Package.swift

111 total lines in ScreenshotHelper/main.swift

5-second subprocess deadline before terminate()

1 CGWindowListCreateImage call per subprocess

main.swift:378: first line of captureWindowScreenshot

main.swift:459-469: Process() is instantiated and launched

main.swift:485-489: DispatchGroup.wait timeout check

main.swift:502-506: saved PNG path read off the helper's stdout

Seven Stages From Tool Call To PNG On Disk

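Each stage maps to a specific line range in Sources/MCPServer/main.swift. Stages one through four prepare the launch inside the main server, stage five spawns the helper, and stages six and seven are the main server waiting on the helper and reading its output; the capture itself happens inside the helper between stages five and seven.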

CGWindowList is filtered by PID and layer == 0

main.swift:388-407 reads CGWindowListCopyWindowInfo with .optionOnScreenOnly plus .excludeDesktopElements, then filters to windows whose kCGWindowOwnerPID matches the target and whose kCGWindowLayer equals 0 (normal app windows, not menu extras or widgets).
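Stage one can be sketched as a pure filter. This is an illustration under stated assumptions: candidateWindowIDs(in:ownedBy:) is a hypothetical name, and the input is modeled as the [[String: Any]] array that CGWindowListCopyWindowInfo returns, keyed by the raw string values of kCGWindowOwnerPID, kCGWindowLayer, and kCGWindowNumber, so the logic can be exercised without a window server.

```swift
import Foundation

/// Hypothetical stage-one filter. `windowList` has the shape of the array
/// CGWindowListCopyWindowInfo returns; the keys are the raw values of
/// kCGWindowOwnerPID, kCGWindowLayer and kCGWindowNumber.
func candidateWindowIDs(in windowList: [[String: Any]], ownedBy pid: Int32) -> [UInt32] {
    windowList.compactMap { info -> UInt32? in
        guard let owner = info["kCGWindowOwnerPID"] as? Int, owner == Int(pid),
              // Layer 0 = normal app windows; menu extras and widgets sit on other layers.
              let layer = info["kCGWindowLayer"] as? Int, layer == 0,
              let number = info["kCGWindowNumber"] as? Int,
              let windowID = UInt32(exactly: number)
        else { return nil }
        return windowID
    }
}
```

On macOS the input would come from CGWindowListCopyWindowInfo([.optionOnScreenOnly, .excludeDesktopElements], kCGNullWindowID) as? [[String: Any]] ?? [], which already drops off-screen windows and desktop elements before this filter runs.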

Intersection scoring picks the right window

main.swift:412-424 computes score = intersection(traversalWindowBounds, window).area and keeps the max. If no traversal bounds are available, the score falls back to window.area. The winning CGWindowID is logged to stderr with its score.
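The scoring rule is small enough to sketch directly. CandidateWindow and bestWindow(among:traversalBounds:) are hypothetical names, not the repo's; the rule itself (overlap area with the traversal bounds, falling back to the window's own area, highest score wins) follows the description above.

```swift
import Foundation

/// Hypothetical stand-in for one filtered CGWindowList entry.
struct CandidateWindow {
    let id: UInt32
    let bounds: CGRect
}

/// Area of the overlap between two rectangles, or 0 if they do not overlap.
func overlapArea(_ a: CGRect, _ b: CGRect) -> CGFloat {
    let width = min(a.maxX, b.maxX) - max(a.minX, b.minX)
    let height = min(a.maxY, b.maxY) - max(a.minY, b.minY)
    return (width > 0 && height > 0) ? width * height : 0
}

/// Score = overlap with the traversal's window bounds, or the window's own
/// area when no traversal bounds exist. Highest score wins; ties keep the
/// earlier candidate.
func bestWindow(among candidates: [CandidateWindow],
                traversalBounds: CGRect?) -> (window: CandidateWindow, score: CGFloat)? {
    let scored = candidates.map { candidate -> (window: CandidateWindow, score: CGFloat) in
        let ownArea = candidate.bounds.width * candidate.bounds.height
        let score = traversalBounds.map { overlapArea($0, candidate.bounds) } ?? ownArea
        return (window: candidate, score: score)
    }
    return scored.max { $0.score < $1.score }
}
```

A window that fully covers a 1920 × 1080 traversal rectangle scores 1920 × 1080 = 2,073,600. A window that only overlaps a corner of that rectangle scores its overlap rather than its size, so a large but mostly unrelated window of the same app does not win just for being large.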

Helper path is resolved relative to the server binary

main.swift:436-438 sets helperPath = dirname(CommandLine.arguments[0]) + '/screenshot-helper'. main.swift:440-443 bails cleanly if the sibling binary is missing instead of crashing the server.
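A minimal sketch of the sibling lookup plus the missing-binary guard. It swaps the repo's NSString path calls for the equivalent URL methods, and both function names are hypothetical.

```swift
import Foundation

/// Hypothetical: resolve a helper binary that sits next to the server binary.
func siblingHelperPath(for serverPath: String, helperName: String = "screenshot-helper") -> String {
    URL(fileURLWithPath: serverPath)
        .deletingLastPathComponent()      // directory containing the server
        .appendingPathComponent(helperName)
        .path
}

/// Returns the helper path only if the binary actually shipped alongside the
/// server, so the caller can skip the screenshot instead of crashing.
func resolveHelper(serverPath: String = CommandLine.arguments[0]) -> String? {
    let path = siblingHelperPath(for: serverPath)
    return FileManager.default.fileExists(atPath: path) ? path : nil
}
```

One caveat applies to any argv[0]-based lookup, including the NSString version: CommandLine.arguments[0] is whatever the launcher passed, so if the server is started through a bare command name rather than a path, the directory component may be missing and the lookup resolves against the current working directory instead.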

argv is assembled: windowID, outputPath, optional --click and --bounds

main.swift:445-454. The click point is in screen coordinates; the bounds describe the chosen window's position so the helper can translate screen coordinates into image-local coordinates.
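The argv assembly can be sketched as a value type that renders itself into Process.arguments. ScreenshotRequest is a hypothetical name. The positional order (window ID, then output path) and the two optional flags come from the description above, but the comma-separated encodings for the --click and --bounds values are assumptions made for this sketch.

```swift
import Foundation

/// Hypothetical request type for stage four.
struct ScreenshotRequest {
    let windowID: UInt32
    let outputPath: String
    var click: CGPoint? = nil        // screen coordinates
    var windowBounds: CGRect? = nil  // chosen window, screen coordinates

    /// argv for the helper; the "x,y" and "x,y,w,h" encodings are assumed.
    var arguments: [String] {
        var args = [String(windowID), outputPath]
        if let click = click {
            args += ["--click", "\(Double(click.x)),\(Double(click.y))"]
        }
        if let b = windowBounds {
            args += ["--bounds",
                     "\(Double(b.origin.x)),\(Double(b.origin.y)),\(Double(b.width)),\(Double(b.height))"]
        }
        return args
    }
}
```

Because Process hands the arguments array straight to the child without going through a shell, an output path containing spaces needs no quoting or escaping.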

Process() spawns the helper with piped stdout and stderr

main.swift:459-469. Pipes are created for both streams so the helper's stderr can be forwarded verbatim and its stdout (the saved PNG path) can be read cleanly after exit.

5-second DispatchGroup watchdog enforces a deadline

main.swift:475-489. waitUntilExit() runs on a background queue inside DispatchGroup, and group.wait(timeout:) enforces the 5-second limit. If the helper hangs, process.terminate() is called and the parent returns nil.

stdout is parsed, stderr is forwarded, helper exits, ReplayKit dies

main.swift:491-510. stderr data is printed verbatim into the server's log, stdout is trimmed and returned as the saved PNG path. The helper process is already gone by the time this runs; the ReplayKit worker went with it.

What Breaks If You Collapse The Two Binaries Into One

None of these are theoretical. Each is a direct consequence of the ReplayKit worker staying resident inside a long-lived server process that takes screenshots on every diff-producing tool call.

Battery drain on laptops running a local MCP server

~19% sustained CPU on a single core translates to a measurable hit on battery life. An idle Claude Desktop session should cost nothing. With in-process capture, every screenshot adds to a floor that never returns to zero.

Fans kick on mid-session

Apple silicon laptops are silent below roughly 20% sustained CPU. The ReplayKit worker sits right at that threshold, so agent sessions audibly change the laptop's thermal profile.

Model blamed for server cost

Developers notice their machine getting hot while the model is 'thinking'. The CPU is the MCP server, not the model. Attribution bug.

Framework memory is not reclaimed between tool calls

ReplayKit holds Metal resources, audio capture session scaffolding, and a background queue. None of that unloads. Every subsequent capture reuses it; the memory baseline of the server rises.

No backpressure for a hung capture

In-process, a stuck capture freezes the whole server. In subprocess, main.swift:485-489 enforces a 5-second deadline per capture; the parent stays responsive to MCP traffic even if one call goes bad.

Things the subprocess isolation contains so your parent process does not have to

ReplayKit background worker
~19% sustained CPU floor
lazy-loaded Metal resources
framework memory baseline
capture API hang risk
fan noise on Apple silicon
battery drain during idle
stuck CGWindowListCreateImage calls
per-call Retina scaling work
NSBitmapImageRep allocation
CoreGraphics context for crosshair
PNG encode pass

What the two-binary design guarantees

The capture call lives in a process whose entire reason for existing is to exit immediately after it returns
ReplayKit is loaded inside the helper, never inside the MCP server
A hung or crashed helper never takes the MCP server with it (5-second watchdog + piped stderr)
The parent logs which window it chose, with its intersection score, to stderr before the subprocess launches
Retina scaling is handled inside the helper; the crosshair annotation lands in the right pixel regardless of backing scale
If the helper binary is missing, the server returns cleanly without a screenshot instead of crashing
Only one CGWindowListCreateImage call runs per helper invocation; the process is never reused

Build Both Binaries And See The CPU Curve For Yourself

One swift build produces both binaries in .build/release/. Open Activity Monitor, run any click_and_traverse tool call, and watch the main server's CPU: it rises briefly while the tool call runs, then falls straight back to idle, because the capture itself runs in a short-lived screenshot-helper process. There is no floor to accumulate.

git clone https://github.com/mediar-ai/mcp-server-macos-use
cd mcp-server-macos-use
xcrun --toolchain com.apple.dt.toolchain.XcodeDefault swift build -c release

ls -1 .build/release/mcp-server-macos-use .build/release/screenshot-helper

# Point Claude Desktop at .build/release/mcp-server-macos-use
# The main server will locate screenshot-helper via sibling
# path resolution, so both binaries must stay in the same
# directory. Restart Claude Desktop, run any click flow, and
# watch stderr in the Claude Desktop MCP log viewer.

Frequently Asked Questions


Why does macos-use ship a second binary called screenshot-helper?

Because CGWindowListCreateImage has a side effect on macOS that the repo records in the helper's header comment: the first call in a process loads the ReplayKit framework lazily, and ReplayKit spawns an internal background worker that never stops. In a short-lived CLI that is invisible. In a long-lived MCP server that sits in the menu bar for hours, the parent process measures at around 19% CPU indefinitely after the first screenshot. The fix encoded in the repo is architectural: keep the capture call in a sibling executable (screenshot-helper) so the ReplayKit worker dies when that subprocess exits. Sources/ScreenshotHelper/main.swift is 111 lines total; Package.swift declares it as a second .executableTarget next to the main server.

Where is the subprocess launch in the main server, exactly?

Sources/MCPServer/main.swift:435-510. The main server finds the helper next to its own binary via (myPath as NSString).deletingLastPathComponent at main.swift:436-438, builds an argv list at main.swift:445-454 (window ID, output path, optional --click and --bounds for the crosshair annotation), launches Process() at main.swift:459-469, enforces a 5-second deadline via DispatchGroup.wait at main.swift:477-489, forwards the helper's stderr verbatim at main.swift:492-495, and reads the saved PNG path off the helper's stdout at main.swift:502-506. The helper is allowed to die after every single capture; nothing is pooled.

What does the helper actually do once it is spawned?

Four things, in order. One: parse argv at Sources/ScreenshotHelper/main.swift:25-42 to pull the window ID, output path, optional click point, and optional window bounds rectangle. Two: call CGWindowListCreateImage with .optionIncludingWindow at ScreenshotHelper/main.swift:45; this is the call that loads ReplayKit. Three: if a click point was passed, open a CGContext at ScreenshotHelper/main.swift:61-90 and draw a red crosshair plus a ring at the click point, translating from screen coordinates to image-local coordinates with scaleX = imageWidth / windowRect.width. Four: write the PNG via NSBitmapImageRep at ScreenshotHelper/main.swift:94-101 and print the output path to stdout so the parent can confirm success. Then exit. ReplayKit goes with it.
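The helper's first step, argv parsing, could look like the following sketch. HelperOptions and parseHelperArguments(_:) are hypothetical, and the comma-separated flag values are assumed to match whatever encoding the parent uses; the sketch rejects malformed input rather than guessing.

```swift
import Foundation

/// Hypothetical parsed form of the helper's argv.
struct HelperOptions: Equatable {
    var windowID: UInt32
    var outputPath: String
    var click: CGPoint?
    var windowBounds: CGRect?
}

/// Parses [windowID, outputPath, --click x,y, --bounds x,y,w,h].
/// The comma-separated encodings are assumptions of this sketch.
func parseHelperArguments(_ args: [String]) -> HelperOptions? {
    guard args.count >= 2, let windowID = UInt32(args[0]) else { return nil }
    var options = HelperOptions(windowID: windowID, outputPath: args[1],
                                click: nil, windowBounds: nil)

    // Exactly `count` comma-separated numbers, or nil.
    func numbers(_ text: String, count: Int) -> [Double]? {
        let parts = text.split(separator: ",", omittingEmptySubsequences: false)
        guard parts.count == count else { return nil }
        var values: [Double] = []
        for part in parts {
            guard let value = Double(part) else { return nil }
            values.append(value)
        }
        return values
    }

    var index = 2
    while index < args.count {
        guard index + 1 < args.count else { return nil }  // every flag takes a value
        let value = args[index + 1]
        switch args[index] {
        case "--click":
            guard let p = numbers(value, count: 2) else { return nil }
            options.click = CGPoint(x: p[0], y: p[1])
        case "--bounds":
            guard let r = numbers(value, count: 4) else { return nil }
            options.windowBounds = CGRect(x: r[0], y: r[1], width: r[2], height: r[3])
        default:
            return nil  // unknown flag
        }
        index += 2
    }
    return options
}
```

In the helper's main.swift this would be fed Array(CommandLine.arguments.dropFirst()), with a nonzero exit(_:) on nil so that the parent's terminationStatus check logs the failure.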

Why not keep the call in-process and just ignore the CPU cost?

Because macos-use targets long-lived MCP client sessions. Claude Desktop, Cursor, and Cline keep the server alive for the duration of the session, which is often hours. Every diff-producing tool call (click, type, press, scroll) produces a screenshot, and in-process capture never pays its cost back: the first call starts one ReplayKit worker, and at the thousandth call that same worker is still burning CPU and still holding framework memory. Battery-powered developer laptops would spin up their fans. Cache locality degrades. A subprocess that dies after every capture costs a process spawn (~10-30ms on Apple silicon) and nothing else. That trade is easy.

How is the right window picked before the helper is even spawned?

Via intersection scoring against the accessibility traversal's window bounds. main.swift:388-425 reads the CGWindowList via CGWindowListCopyWindowInfo with .optionOnScreenOnly plus .excludeDesktopElements, filters to windows whose kCGWindowOwnerPID matches the target PID and whose kCGWindowLayer equals 0 (real app windows, not desktop widgets), and for each candidate computes score = intersection(traversalWindowBounds, windowBounds).area. Highest score wins. If no traversal bounds are available the fallback is largest visible window. The chosen CGWindowID is printed to stderr with its score, e.g. 'log: captureWindowScreenshot: selected window 31704 (score=2073600)', so you can reconstruct which window the helper captured.

What happens if the helper hangs or is missing?

Three layered failures, all logged. One: main.swift:440-443 checks FileManager.default.fileExists(atPath: helperPath) before Process().run() and bails with 'screenshot-helper not found at <path>' if the binary did not ship alongside the server. Two: main.swift:485-489 wraps waitUntilExit() in a DispatchGroup with a 5-second deadline; if wait() returns .timedOut the parent calls process.terminate() and returns nil. Three: main.swift:497-500 checks process.terminationStatus == 0 and logs 'exited with status N' if the helper crashed after launching. In all three cases captureWindowScreenshot returns nil to its caller; the MCP response still includes the traversal and the diff, just no screenshot path.

Does the helper handle Retina scaling and click-point annotation correctly?

Yes. ScreenshotHelper/main.swift:55-58 computes scaleX = imageWidth / windowRect.width and scaleY = imageHeight / windowRect.height, which accounts for any backing-scale difference between the window in screen coordinates and the CGImage the CGWindowListCreateImage call returned. The click point is translated with (clickPoint.x - windowRect.origin.x) * scaleX for horizontal and then Y-flipped at ScreenshotHelper/main.swift:67-68 because CoreGraphics drawing has origin bottom-left while screen coordinates have origin top-left. The crosshair arms are 15 points, the ring radius is 10 points, both scaled by max(scaleX, scaleY) so the annotation looks the same whether the window was captured at 1x or 2x.
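The coordinate transform reads more easily as code. This is a sketch under assumptions: crosshairGeometry(click:windowRect:imageSize:) is a hypothetical function, and expressing the Y-flip as image height minus the scaled distance from the window's top edge is one straightforward way to implement the flip described above, not necessarily the repo's exact expression.

```swift
import Foundation

/// Hypothetical result type: where and how large to draw the annotation.
struct CrosshairGeometry {
    let center: CGPoint      // CoreGraphics drawing coordinates (origin bottom-left)
    let armLength: CGFloat   // pixels
    let ringRadius: CGFloat  // pixels
}

/// `click` and `windowRect` are in screen coordinates (origin top-left);
/// `imageSize` is the pixel size of the captured image.
func crosshairGeometry(click: CGPoint, windowRect: CGRect, imageSize: CGSize) -> CrosshairGeometry {
    // Backing scale between points on screen and pixels in the image.
    let scaleX = imageSize.width / windowRect.width
    let scaleY = imageSize.height / windowRect.height

    let x = (click.x - windowRect.origin.x) * scaleX
    let yFromTop = (click.y - windowRect.origin.y) * scaleY
    let y = imageSize.height - yFromTop  // flip to bottom-left origin

    // 15-point arms and a 10-point ring, scaled so they look the same at 1x and 2x.
    let scale = max(scaleX, scaleY)
    return CrosshairGeometry(center: CGPoint(x: x, y: y),
                             armLength: 15 * scale,
                             ringRadius: 10 * scale)
}
```

At 2x the same click lands twice as far from the image origin and the arms grow from 15 to 30 pixels, so the annotation covers the same apparent area at either backing scale.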

How do the two binaries find each other on disk?

The main server computes the helper path at main.swift:436-438: let myPath = CommandLine.arguments[0]; let myDir = (myPath as NSString).deletingLastPathComponent; let helperPath = (myDir as NSString).appendingPathComponent("screenshot-helper"). That is, the helper must be in the same directory as the main executable. swift build places both in .build/<config>/ automatically because Package.swift declares both as .executableTarget. npm packaging and Homebrew bottling both preserve that sibling layout. If you hand-copy the main binary to a custom location and forget the helper, you get the 'screenshot-helper not found' warning and screenshots silently disappear from the response.

How does this differ from how other macOS MCP servers handle screenshots?

steipete/macos-automator-mcp is AppleScript-oriented and does not ship window capture at all; screenshot is a separate concern. ashwwwin/automation-mcp uses a Node addon and calls screencapture via child_process; screencapture is a separate CLI that already spawns a fresh process per call, so the same isolation is accidental rather than architectural. CursorTouch/MacOS-MCP and mb-dev/macos-ui-automation-mcp both use in-process APIs (NSImage+CGWindowList paths or similar) from a long-lived server and do not mention the ReplayKit side effect. macos-use is the only one I found that declares a dedicated executableTarget whose entire reason for existing is to be thrown away after one image.

Can I inspect the subprocess while it is running?

Yes. Before each invocation the main server logs 'launching screenshot-helper for window <ID>' to stderr at main.swift:456. During the roughly 200-500ms the helper is alive you can ps -ef | grep screenshot-helper and see its full argv including the --click and --bounds flags. After exit you can tail the helper's stderr from the main server's log (forwarded verbatim at main.swift:492-495), and you can read the PNG at the output path that came back via stdout. The screenshot file itself is named <timestamp>_<toolname>.png under /tmp/macos-use/, so you can correlate each helper invocation with the tool call that spawned it.
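For correlating files with tool calls, the naming scheme can be sketched as a one-line builder. The page specifies <timestamp>_<toolname>.png under /tmp/macos-use/; using integer milliseconds since 1970 as the timestamp, and the screenshotOutputPath(toolName:at:directory:) name, are assumptions of this sketch.

```swift
import Foundation

/// Hypothetical builder for the helper's output path:
/// <directory>/<milliseconds since 1970>_<toolName>.png
func screenshotOutputPath(toolName: String,
                          at date: Date = Date(),
                          directory: String = "/tmp/macos-use") -> String {
    let millis = Int64((date.timeIntervalSince1970 * 1000).rounded())
    return "\(directory)/\(millis)_\(toolName).png"
}
```

Taking the Date as a parameter keeps the function deterministic, which makes it easy to test and easy to match against the timestamps of the surrounding log lines.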

Is there any shared state between the parent and the helper?

Almost none, and that is the point. The parent passes a window ID and an output path via argv, not shared memory. The helper does not call back into the parent, does not read the MCP stream, and does not consume any TCC permissions the parent did not already consume. The only shared resource is the filesystem: the helper writes the PNG to the path the parent chose. Because there is no IPC beyond argv plus stdout, a helper crash cannot corrupt parent state, and a parent crash does not leave the helper hanging: the helper exits on its own after its single capture, and if the parent dies first, the orphaned helper is re-parented to launchd, which reaps it once it exits.

What is the shortest way to reproduce the ReplayKit leak without macos-use?

Write a tiny Swift script that calls CGWindowListCreateImage(.null, .optionOnScreenOnly, kCGNullWindowID, [.bestResolution]) once and then runs RunLoop.main.run(). Launch it, wait a few seconds, and run top -pid <pid>. The script is doing nothing, yet top shows it using CPU; sampling the process (for example with Activity Monitor's Sample Process) shows the busy background thread inside ReplayKit. Kill the process and the CPU falls to zero. Now re-run the same script but make the capture inside a child Process() that exits immediately; top never sees the leak. That is the experiment that motivated the two-binary architecture documented at Sources/MCPServer/main.swift:382-385.

Run macos-use in your MCP client today

One npx entry point works in Claude Code, Cursor, Claude Desktop, VS Code, and Windsurf. The two-binary architecture above keeps it cheap to leave running across a long agent session. MIT-licensed Swift; every line number on this page refers to HEAD at the time of publication.


Read the rest of the source

The two-binary architecture is one of several macOS-specific design choices baked into this server. Browse Sources/MCPServer/main.swift for the rest.
