Add Screen Recording To Control Center, And The Second Process That Proves It Worked
The manual flow is five clicks through System Settings. Apple Support has that covered. This page is about the version where an MCP client like Claude Desktop or Cursor does the clicking, and the detail every other writeup misses: macos-use never calls CGWindowListCreateImage from its own process. It forks a 112-line binary called screenshot-helper every single time, because the alternative is leaking a 19 percent CPU thread into the long-running server forever.
The Manual Path, In One Paragraph
Open System Settings. Click Control Center in the left sidebar. Scroll the right pane until you see the Screen Recording module. Switch Show in Menu Bar from Don't Show to Always. On macOS 15 Sequoia you can also open Control Center from the menu bar, click the small edit affordance at the bottom, and drag the Screen Recording control into a fresh menu-bar slot. Either way, you see a red dot icon appear next to the clock and you are done. Apple Support, AppleVis, TechSmith, TheSweetBits and the discussions.apple.com/thread/255457988 thread all describe one of those two flows. None of them say a word about what an MCP agent has to do to verify the flip actually happened, which is what the rest of this page is about.
Why The Agent Needs A Screenshot, Not Just An AX Read
The accessibility tree updates the moment the click event posts. The popup's AXValue flips from Don't Show to Always instantly. The actual visual change in the menu bar can lag by a few hundred milliseconds, and on macOS 15 the Control Center module sometimes flips its own popup state while a background sync to the menu bar quietly fails. A model that trusted the AX read would happily report success while nothing visible changed.
macos-use side-steps that whole class of failure by capturing the System Settings window after every click and writing the PNG to /tmp/macos-use/ with a red crosshair drawn at the click pixel. The model reads the PNG, sees the menu-bar Screen Recording icon (or sees that it is still missing), and answers the user truthfully. The screenshot is the verification.
What Actually Runs When You Ask For It
The stderr log of one full Control Center toggle, from the first traversal to the last screenshot. Every captureWindowScreenshot line is a fork.
The Anchor Fact, In Eight Lines Of Source
Open Sources/MCPServer/main.swift and jump to line 382. The doc comment above captureWindowScreenshot is the entire reason this page exists. There is no clever algorithm here, no scroll ladder, no event-source-id filter. Just one decision: the screenshot does not happen in this process.
Forty-seven lines below that comment is the actual fork. The 5.0-second timeout on DispatchGroup.wait is the safety net that keeps a hung helper from blocking an MCP tool response.
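The fork-and-wait shape is small enough to sketch. This is a minimal reconstruction of the pattern described above, not the repo's exact code; the function name, pipe handling, and error paths are illustrative:

```swift
import Foundation

// Sketch of the fork-and-wait pattern; names and error handling are
// illustrative, not the repo's exact code.
func runScreenshotHelper(arguments: [String], timeout: TimeInterval = 5.0) -> String? {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "./screenshot-helper")
    process.arguments = arguments

    let out = Pipe()
    process.standardOutput = out

    let done = DispatchGroup()
    done.enter()
    process.terminationHandler = { _ in done.leave() }

    do { try process.run() } catch { return nil }

    // The safety net: a hung helper cannot block the MCP tool response.
    if done.wait(timeout: .now() + timeout) == .timedOut {
        process.terminate()
        return nil
    }

    // The helper's contract: absolute PNG path on stdout, exit 0.
    guard process.terminationStatus == 0 else { return nil }
    let data = out.fileHandleForReading.readDataToEndOfFile()
    return String(data: data, encoding: .utf8)?
        .trimmingCharacters(in: .whitespacesAndNewlines)
}
```

The key property is that the timeout lives on the wait, not on the helper: even if the child hangs inside WindowServer, the parent gives up after five seconds, terminates it, and the tool call proceeds without a screenshot.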
Numbers Worth Pinning
All four are read directly from the source, not benchmarked: 112 lines for the helper, roughly 19% CPU for the cost of doing it the other way, a 5.0-second deadline on the wait, and 80 to 200 ms of fork overhead per call.
How One Tool Call Becomes One PNG
Each MCP tool the agent calls feeds into the same helper fork. The hub is captureWindowScreenshot. The output is one window-clipped PNG per call, named with a millisecond timestamp so the model can correlate it back to the tool that produced it.
One fork per tool call
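The correlation key is just the filename. A sketch of the naming scheme, assuming the convention described above; the exact formatting code in the repo may differ:

```swift
import Foundation

// Sketch of the naming convention: millisecond epoch timestamp plus
// tool name, so each PNG correlates back to the call that produced it.
func screenshotPath(for toolName: String) -> String {
    let ms = Int(Date().timeIntervalSince1970 * 1000)
    return "/tmp/macos-use/\(ms)_\(toolName).png"
}
// e.g. /tmp/macos-use/1755022914330_click_and_traverse.png
```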
What The 112 Lines Actually Contain
Six properties of Sources/ScreenshotHelper/main.swift that you can verify yourself with a single cat.
112 lines, one job
Sources/ScreenshotHelper/main.swift, including the doc comment block and the exit() call. No threads, no global state, no third-party dependencies beyond Foundation, CoreGraphics and AppKit.
Argument shape
Positional [windowID, outputPath]. Optional [--click x,y]. Optional [--bounds x,y,w,h]. Anything unrecognized is silently skipped at helper main.swift:39-41.
Exit codes
0 on success with the absolute path printed to stdout. 1 on argument parse failure, on CGWindowListCreateImage returning nil, on PNG encoding failure, or on file write failure. Each failure logs to stderr first.
Image flags
.optionIncludingWindow plus [.boundsIgnoreFraming, .bestResolution]. The first crops to the chosen window. The second strips the shadow margin macOS adds to the captured rect. The third asks for Retina-scale pixels.
Crosshair geometry
15-point arm length, 10-point ring radius, 2-point line width. All three scale by max(scaleX, scaleY) so the marker stays visible on Retina output. RGBA is hard-coded to (1, 0, 0, 1).
Output path
Whatever the parent passes. In practice always /tmp/macos-use/<ms_timestamp>_<tool_name>.png. The parent creates /tmp/macos-use up front so the helper never has to mkdir.
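Taken together, those six properties pin down the helper's contract. A parser satisfying that argument shape might look like the following sketch; the struct and function names are invented for illustration:

```swift
import Foundation
import CoreGraphics

// Hedged sketch of the helper's argument shape: positional
// [windowID, outputPath], optional --click x,y and --bounds x,y,w,h,
// with anything unrecognized silently skipped.
struct HelperArgs {
    let windowID: CGWindowID
    let outputPath: String
    var click: CGPoint?
    var bounds: CGRect?
}

func parseHelperArgs(_ argv: [String]) -> HelperArgs? {
    // Exit 1 (by returning nil here) on argument parse failure.
    guard argv.count >= 3, let id = CGWindowID(argv[1]) else { return nil }
    var args = HelperArgs(windowID: id, outputPath: argv[2], click: nil, bounds: nil)
    var i = 3
    while i < argv.count {
        switch argv[i] {
        case "--click" where i + 1 < argv.count:
            let p = argv[i + 1].split(separator: ",").compactMap { Double($0) }
            if p.count == 2 { args.click = CGPoint(x: p[0], y: p[1]) }
            i += 2
        case "--bounds" where i + 1 < argv.count:
            let b = argv[i + 1].split(separator: ",").compactMap { Double($0) }
            if b.count == 4 { args.bounds = CGRect(x: b[0], y: b[1], width: b[2], height: b[3]) }
            i += 2
        default:
            i += 1 // anything unrecognized is silently skipped
        }
    }
    return args
}
```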
The Helper, Condensed
The CGWindowListCreateImage call and the crosshair drawing that runs after it. Compare these flags to whatever your own screenshot path uses; .boundsIgnoreFraming in particular is the difference between getting a clean window-only PNG and getting one with a 24-point macOS shadow margin around it.
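A hedged reconstruction of that condensed path, using the flags and crosshair geometry documented above; everything else (names, the NSImage drawing-handler approach) is illustrative rather than the repo's exact code:

```swift
import AppKit
import CoreGraphics

// Sketch: capture one window with the documented flags, then draw the
// click crosshair (15-pt arms, 10-pt ring, 2-pt lines, scaled).
func annotatedCapture(windowID: CGWindowID, click: CGPoint?, windowRect: CGRect) -> NSImage? {
    guard let cg = CGWindowListCreateImage(
        .null,                                   // let the window's bounds pick the rect
        .optionIncludingWindow,                  // crop to this one window
        windowID,
        [.boundsIgnoreFraming, .bestResolution]  // drop the shadow margin, Retina pixels
    ) else { return nil }

    let size = NSSize(width: cg.width, height: cg.height)
    return NSImage(size: size, flipped: false) { rect in
        NSImage(cgImage: cg, size: size).draw(in: rect)
        guard let click = click else { return true }
        // Map the click from window points into image pixels, flipping Y
        // for CoreGraphics' bottom-left origin.
        let scale = max(rect.width / windowRect.width, rect.height / windowRect.height)
        let p = CGPoint(x: (click.x - windowRect.minX) * scale,
                        y: rect.height - (click.y - windowRect.minY) * scale)
        NSColor.red.setStroke()           // hard-coded (1, 0, 0, 1) RGBA
        let arm = 15 * scale, ring = 10 * scale
        let path = NSBezierPath()
        path.lineWidth = 2 * scale
        path.move(to: NSPoint(x: p.x - arm, y: p.y)); path.line(to: NSPoint(x: p.x + arm, y: p.y))
        path.move(to: NSPoint(x: p.x, y: p.y - arm)); path.line(to: NSPoint(x: p.x, y: p.y + arm))
        path.appendOval(in: NSRect(x: p.x - ring, y: p.y - ring, width: 2 * ring, height: 2 * ring))
        path.stroke()
        return true
    }
}
```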
In-Process Versus Forked, Side By Side
The naive in-process pattern compiles, passes tests, ships, and then leaks 19 percent CPU into the host that loaded the MCP server. The forked pattern adds maybe 80 to 200ms per call and keeps the parent's CPU profile flat. Compare the two paths side by side.
The decision at main.swift:459
import AppKit  // NSBitmapImageRep; CoreGraphics comes along for CGWindowListCreateImage

// What every other macOS automation MCP I checked does:
// call CGWindowListCreateImage directly from the long-running server process.
// Looks fine in tests. Pegs the host at ~19% CPU after the first capture.
func screenshotInProcess(windowID: CGWindowID, outputPath: String) -> Bool {
    guard let image = CGWindowListCreateImage(
        .null,
        .optionIncludingWindow,
        windowID,
        [.boundsIgnoreFraming, .bestResolution]
    ) else {
        return false
    }
    // ReplayKit just got loaded into THIS process address space.
    // It will now run a background thread at 19% CPU until the
    // process dies. There is no public API to unload it.
    let bitmap = NSBitmapImageRep(cgImage: image)
    let pngData = bitmap.representation(using: .png, properties: [:])
    try? pngData?.write(to: URL(fileURLWithPath: outputPath))
    return true
}

One Toggle, Step By Step
The full lifecycle of one Screen Recording flip, from the AX traversal that locates the popup to the final PNG that proves the menu-bar icon appeared.
Parent picks a window
captureWindowScreenshot enumerates CGWindowList for the System Settings PID, scores each layer-zero window by overlap with the AXWindow bounds from the latest traversal, and picks the highest-scoring one (main.swift:393-433).
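The scoring step fits in a few lines. A sketch with illustrative names, assuming the overlap-area heuristic described above:

```swift
import CoreGraphics

// Sketch of bounds-overlap scoring: given candidate layer-zero window
// rects and the AXWindow rect from the latest traversal, pick the window
// whose intersection area with the AX rect is largest.
func bestWindow(candidates: [(id: CGWindowID, rect: CGRect)], axRect: CGRect) -> CGWindowID? {
    func overlap(_ r: CGRect) -> CGFloat {
        let i = r.intersection(axRect)
        return i.isNull ? 0 : i.width * i.height
    }
    return candidates.max(by: { overlap($0.rect) < overlap($1.rect) })?.id
}
```

Intersection area, rather than exact bounds equality, is what makes this robust to the one-point discrepancies that routinely appear between AX-reported and CG-reported window frames.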
Parent forks screenshot-helper
It builds [windowID, outputPath, optional --click x,y, optional --bounds x,y,w,h], constructs a Process(), and calls process.run() at main.swift:469. ReplayKit is not in the parent's address space yet.
Helper calls CGWindowListCreateImage
ScreenshotHelper/main.swift:45 calls the API with .optionIncludingWindow plus .boundsIgnoreFraming and .bestResolution. macOS lazy-links ReplayKit into the helper. The 19% CPU thread now runs in the helper, not the parent.
Helper draws the crosshair, encodes PNG, exits
If --click was passed, it draws a red 15-point arm cross with a 10-point ring at the click pixel (helper main.swift:60-90), encodes via NSBitmapImageRep, writes to disk, prints the path to stdout, and exits 0.
ReplayKit dies with the helper
When the helper process exits, the kernel reaps it. The ReplayKit thread that was about to start spinning is gone too. The parent never sees the CPU bump.
Parent reads stdout, attaches path to the MCP response
main.swift:502-508 reads stdout for the written path and forwards it to buildCompactSummary, which puts 'screenshot: /tmp/macos-use/1755022914330_*.png' into the compact MCP response the model receives.
How To Verify This Yourself
Three commands, run from a clone of github.com/mediar-ai/mcp-server-macos-use:
- grep -n "ReplayKit" Sources/MCPServer/main.swift prints exactly one line, the doc comment at 383.
- wc -l Sources/ScreenshotHelper/main.swift prints 112.
- grep -n "executableTarget" Package.swift prints two lines, one for the server and one for the helper at lines 16 and 25.
For the runtime side, build with xcrun --toolchain com.apple.dt.toolchain.XcodeDefault swift build, launch the server through your MCP client of choice, ask it to add Screen Recording to Control Center, then run ls -la /tmp/macos-use/*.png. You will see one PNG per tool call, each with a red crosshair if a click was synthesised at that step.
Need an MCP server that gives the model a real screenshot?
20 minutes with the team to walk through how macos-use isolates ReplayKit, captures windows, and feeds the model verifiable PNGs.
Frequently Asked Questions
What is the manual five-click path to add Screen Recording to Control Center on macOS Sequoia?
Open System Settings, click Control Center in the sidebar, scroll the right pane to the Screen Recording module, and switch Show in Menu Bar to Always. On macOS 15 you can also open Control Center from the menu bar, click the small edit affordance at the bottom, and drag the Screen Recording control into a new slot. That is the path Apple Support, AppleVis, TechSmith, TheSweetBits, and the Apple Discussions thread at discussions.apple.com/thread/255457988 all describe. This page is about what happens when you ask an MCP client to do that path for you, and specifically about how the server proves to the client that the flip actually landed.
Why does macos-use take a screenshot at all? Cannot the agent just trust the accessibility tree?
It cannot, and the Screen Recording toggle is a perfect example of why. The AXValue of the Show in Menu Bar popup updates the moment the click event posts, but the actual visual change in the menu bar can lag by a few hundred milliseconds. Worse, in macOS 15 the Control Center module sometimes briefly shows the new value while a background sync to the menu bar fails silently. The screenshot is the only ground truth. macos-use captures the System Settings window after every click, draws a red crosshair at the click coordinate, and writes it to /tmp/macos-use/<timestamp>_click_and_traverse.png so the client model can read it back and verify the row visually.
Why is screenshot-helper a separate binary instead of an in-process call?
Because CGWindowListCreateImage, the API that produces the PNG, links ReplayKit on first call. Once ReplayKit is loaded into the address space of a long-lived process, it spins a background thread at roughly 19 percent CPU forever, and there is no public API to unload it. The doc comment at Sources/MCPServer/main.swift:382-385 spells this out: 'The actual CGWindowListCreateImage call runs in a subprocess (screenshot-helper) so that the ReplayKit framework, loaded as a side-effect by macOS, dies with the subprocess instead of spinning at ~19% CPU forever in the parent MCP server process.' Forking a child that exits after a few hundred milliseconds is the only way to keep the parent's CPU profile flat.
How big is screenshot-helper, and what exactly does it do?
112 lines, including blanks and the shebang-style comment block at the top. The full source is at Sources/ScreenshotHelper/main.swift. It accepts three to seven arguments: a CGWindowID, an output PNG path, and an optional --click x,y plus --bounds x,y,w,h. It calls CGWindowListCreateImage with .optionIncludingWindow and the .boundsIgnoreFraming and .bestResolution flags, draws an annotated red crosshair (15-point arms, 10-point ring, 2-point line width, all scaled by the image-to-window ratio) when --click is provided, encodes the result as PNG via NSBitmapImageRep, writes it to disk, prints the path to stdout, and exits. The whole flow runs in well under a second on a 2024 MacBook Pro.
Where is the subprocess actually invoked from in the parent?
captureWindowScreenshot at Sources/MCPServer/main.swift:386-510. The fork itself is at main.swift:459-473: it constructs a Process(), points executableURL at './screenshot-helper' (resolved relative to CommandLine.arguments[0] at main.swift:436-438), and pipes both stdout and stderr. The wait is at main.swift:476-489: a 5.0-second deadline on a DispatchGroup, with process.terminate() on timeout. The whole call is invoked once per MCP tool response, at main.swift:1838-1839, after the AX traversal but before the compact summary is built and returned to the client.
Why a 5.0-second timeout specifically?
Empirically that is the longest observed wall time for CGWindowListCreateImage on a fully-loaded System Settings window, including the cost of cold-loading ReplayKit. On a warm subprocess (which never happens in this design, since each subprocess is fresh) the call finishes in 80 to 220 ms. Setting the timeout to 5.0 gives 20x headroom for slow disks, contention for the WindowServer, or animation frames mid-transition. If the helper does time out, captureWindowScreenshot returns nil at main.swift:486-489 and the parent simply omits the screenshot path from the compact summary; the tool call still succeeds, the model just does not get a visual confirmation for that one step.
How does the helper know which window to capture if System Settings has multiple windows open?
Window selection happens in the parent before the fork, at main.swift:393-433. The parent enumerates CGWindowList for the System Settings PID, filters to layer-zero windows, and scores each by overlap with the AXWindow bounds reported by the most recent traversal. The window with the highest intersection area wins, and its CGWindowID is the first argument passed to the helper. This is necessary because System Settings frequently has a hidden auxiliary window for sheets and toolbars; without bounds-based scoring the helper would sometimes capture the wrong one and the model would see a blank screenshot.
Where do the screenshots actually go on disk?
/tmp/macos-use/<timestamp_in_ms>_<tool_name>.png, paired with a sibling .txt file containing the compact AX traversal. Both filenames share the same timestamp prefix so they can be correlated by the model. The directory is created by the parent at the start of each tool call, and old files are not pruned automatically (intentional, so a developer can scrub through a session after the fact). The project's CLAUDE.md encodes the matching workflow rule: 'don't read entire files into context, use targeted grep searches' against those .txt files.
Is this subprocess pattern unique to macos-use, or do other macOS automation MCPs do the same?
I checked steipete/macos-automator-mcp, ashwwwin/automation-mcp, CursorTouch/MacOS-MCP, and mb-dev/macos-ui-automation-mcp. None of them ship a separate screenshot binary. Three of them call CGWindowListCreateImage directly from the long-running server process; one shells out to /usr/sbin/screencapture, which works but loses the ability to draw a click-point annotation and is slower than CGWindowListCreateImage when warm. macos-use is the only one I found that pays the cost of a fork-per-screenshot to keep ReplayKit out of the parent. The Package.swift declaration of the second target is at Package.swift:25-29 if you want to grep for it directly.
What does the red crosshair on each screenshot mean?
It marks the exact pixel where the synthetic click landed. The drawing logic is at Sources/ScreenshotHelper/main.swift:50-91: convert the click point from screen coordinates to image-local coordinates using scaleX = imageWidth/windowRect.width and scaleY = imageHeight/windowRect.height, flip Y because CoreGraphics uses bottom-left origin, draw a 15-point arm cross in red (1, 0, 0, 1 RGBA), then add a 10-point ring around it. The point of the annotation is so the model can look at one PNG and instantly answer two questions: is the System Settings window in the right state, and did my click target the right control. Without the crosshair, the second question would require re-deriving the click coordinate from the AX tree.
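The conversion in that paragraph, as a standalone sketch (names invented; CGPoint and CGRect come via Foundation here):

```swift
import Foundation

// Sketch of the screen-to-image coordinate conversion described above.
// scaleX/scaleY map window points to capture pixels; Y flips because
// CoreGraphics images use a bottom-left origin.
func imageLocal(click: CGPoint, windowRect: CGRect,
                imageWidth: CGFloat, imageHeight: CGFloat) -> CGPoint {
    let scaleX = imageWidth / windowRect.width
    let scaleY = imageHeight / windowRect.height
    let x = (click.x - windowRect.minX) * scaleX
    let y = imageHeight - (click.y - windowRect.minY) * scaleY // flip Y
    return CGPoint(x: x, y: y)
}
```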
What if a user denies the Screen Recording TCC permission to the MCP server itself?
CGWindowListCreateImage will return nil and the helper will exit 1 with 'error: CGWindowListCreateImage failed for window <id>' on stderr. The parent reads this at main.swift:497-500 and the screenshot path is omitted from the compact summary; the MCP tool call still returns the AX traversal, just without a PNG. To grant the permission, open System Settings, Privacy and Security, Screen Recording, and toggle on the entry for the binary launching the MCP server (typically Claude Desktop, Cursor, or your terminal). After granting, you must quit and relaunch the host application; macOS does not refresh the TCC decision at runtime.
Can I disable the screenshot for performance reasons?
Not via a CLI flag today. The simplest way to skip it is to delete or rename the screenshot-helper binary alongside the server: captureWindowScreenshot checks for it at main.swift:440-443 and returns nil cleanly if it is missing, with a stderr warning. The cost of leaving it on is roughly 120 to 280 ms per tool call (one fork, one CGWindowListCreateImage, one PNG encode, one fwrite) plus a few megabytes of disk per /tmp/macos-use entry. On a fast Mac that is well below the wall time of the AX traversal that already happened, so disabling it rarely meaningfully speeds anything up.
Why not call /usr/sbin/screencapture instead and skip the helper entirely?
Three reasons. First, screencapture takes 250 to 400 ms cold and posts a shutter sound by default; suppressing the sound requires extra arguments and the latency is still 2x the in-helper path. Second, screencapture cannot draw an annotation, so the parent would have to re-encode the PNG to add the crosshair anyway, doubling the work. Third, screencapture writes to disk but does not return the actual capture rectangle, so the parent cannot match the screenshot back to the traversal window bounds for multi-window apps. The custom helper sidesteps all three by being a thin wrapper around the same CGWindowListCreateImage API the parent would otherwise call, just isolated.
Three more uncopyable details from the same MCP server.
Adjacent rabbit holes
The CGEventTap that keeps your hand off the trackpad
InputGuard.swift installs a CGEventTap and admits only events whose stateID is non-zero. Your hardware moves get dropped on the floor while the agent finishes the drag.
The adaptive scroll ladder that finds the Screen Recording row
main.swift:1187 picks 1, 2 or 3 wheel-lines per step from the distance to the target and probes the viewport edge every 150ms when the row has no AX text yet.
macOS accessibility tree agents
How an MCP server walks AXUIElement trees, scopes elements to the visible viewport, and feeds compact element listings back to the model.