How to add Screen Mirroring to Control Center, and the one function that lets an AI do it without scrolling by hand

Screen Mirroring is already inside Control Center on modern macOS. What people usually mean when they ask to add it is: pin it to the menu bar so a single click opens the picker. That setting lives in System Settings, Control Center, and the row sits below the fold. A naive MCP click at the AX coordinate misses by half a window. The fix is a 125-line function at Sources/MCPServer/main.swift:1159-1285 called scrollIntoViewIfNeeded. Adaptive step size. Text-match verification. 30-step cap. It runs before every click.

Matthew Diakonov
  • scrollIntoViewIfNeeded runs before every click: wired in at main.swift:1588
  • Adaptive step size: 1 line if <80px, 2 if <250px, 3 otherwise (main.swift:1187)
  • 30-step hard cap so a flaky selector never scrolls forever (main.swift:1189)
  • Probes the viewport edge at a 60pt inset when the target has no AX text (main.swift:1231-1233)

First, the manual path. It is shorter than you think.

On macOS Sonoma and Sequoia, Screen Mirroring is already pinned to the Control Center flyout that drops down from the two-stacked-toggles icon in the menu bar. You do not need to add it there. The thing most people mean is pinning a dedicated Screen Mirroring icon into the menu bar as its own top-level item, so one click opens the picker instead of two.

To do that: open System Settings, click Control Center in the sidebar, scroll down to the Screen Mirroring row, and switch Show in Menu Bar to Always or When Active. On a 13-inch MacBook that row sits below the initial viewport of the default System Settings window, so you will scroll past the Wi-Fi, Bluetooth, AirDrop, Focus, and Stage Manager rows to reach it.

That is the path every SERP result covers. The interesting version is what happens when you ask Claude Desktop to do it for you and the model picks up the macos-use MCP tools. The naive click posts a CGEvent at the coordinate the AX tree reported, which is below the window frame. The click lands on nothing. The rest of this page is about the function that solves this.

What goes in, what the scroll helper does with it, what comes out

Every click routed through click_and_traverse hits this path. The inputs on the left come from the AX traversal the model already read. The hub is the function at main.swift:1159. The outputs on the right are what the click layer actually posts a CGEvent against.

Routing off-viewport AX coordinates through scrollIntoViewIfNeeded

Raw AX coordinate from traversal (often off-viewport)
Current window bounds (getWindowContainingPoint)
Target text resolved from findAXElementAtPoint
User hand/trackpad (independent channel)
scrollIntoViewIfNeeded at main.swift:1159-1285
Painted center of the element, ready for CGEvent.post
Stderr log: 'found X at (x,y) after N steps'
Fallback: original point if 30-step cap is hit
Click posted at main.swift:1588-1594
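The routing above can be sketched as pure geometry. `classify` and `ScrollOutcome` are invented names for illustration, not the project's API; the real logic lives in scrollIntoViewIfNeeded at Sources/MCPServer/main.swift:1159-1285.

```swift
import Foundation

// Hypothetical sketch of the hub's first decision: is the raw AX
// coordinate inside the painted window, or does the scroll loop run?
enum ScrollOutcome {
    case alreadyVisible(CGPoint)          // inside the painted viewport: click as-is
    case needsScroll(distance: CGFloat)   // off-viewport: enter the scroll loop
}

func classify(rawPoint: CGPoint, windowBounds: CGRect) -> ScrollOutcome {
    if windowBounds.contains(rawPoint) {
        return .alreadyVisible(rawPoint)
    }
    // Vertical distance from the nearest painted edge to the target.
    let distance = rawPoint.y > windowBounds.maxY
        ? rawPoint.y - windowBounds.maxY
        : windowBounds.minY - rawPoint.y
    return .needsScroll(distance: distance)
}
```

For the Screen Mirroring case described below, a target at y=1082 against a viewport ending at y=562 classifies as needsScroll with distance 520.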

What happens between the prompt and the menu-bar icon

Seven steps from the moment Claude Desktop hands off the prompt to the MCP server until the Screen Mirroring icon appears in your menu bar. Step 4 is the one every other macOS-automation project handles badly or not at all.

1

Claude Desktop calls macos-use_open_application_and_traverse

Identifier com.apple.systempreferences. AppOpener at .build/checkouts/MacosUseSDK activates the window via NSRunningApplication.activate, waits for the frontmost PID to flip, and returns the app's full AX tree as a flat .txt under /tmp/macos-use/.

2

Model greps the traversal file for 'Control Center'

The tree is one element per line: role, text, x, y, width, height. The model finds 'Control Center' [AXStaticText] at (120, 238) and feeds that back as the element argument to click_and_traverse.

3

click_and_traverse activates the app and computes the raw click point

At main.swift:1582 the server calls runningApp.activate and sleeps 200ms. Then it centers the match: rawPoint = (matchX + matchW/2, matchY + matchH/2). Still no event posted.

4

scrollIntoViewIfNeeded decides whether to scroll

At main.swift:1588 the raw point passes through scrollIntoViewIfNeeded. If the point is inside the current window bounds the function returns it untouched. For the Screen Mirroring row it is not: distance is around 520pt on a 13-inch MacBook, lines/step becomes 3.

5

The loop posts CGEvent scrolls one step at a time

Each iteration posts one CGEvent scroll wheel event with wheelCount=1 and wheel1=scrollDirection at the midY of the window. It sleeps 100ms, then calls findElementByText with the target string and the current window bounds. As soon as the element's center lands inside the 15pt-inset viewport rect, the function returns the new center.
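A minimal sketch of that loop, with the CGEvent post and the findElementByText lookup replaced by injected closures. `scrollUntilVisible`, `postScrollLines`, and `locateTarget` are stand-in names, not the server's code.

```swift
import Foundation

// Hedged sketch of the scroll-and-verify loop: tick, re-locate the
// target, stop as soon as its center is painted inside the safe area.
func scrollUntilVisible(
    linesPerStep: Int,
    maxSteps: Int = 30,
    viewport: CGRect,
    postScrollLines: (Int) -> Void,
    locateTarget: () -> CGPoint?
) -> CGPoint? {
    let safeArea = viewport.insetBy(dx: 0, dy: 15)  // 15pt top/bottom clearance
    for _ in 0..<maxSteps {
        postScrollLines(linesPerStep)  // one scroll tick
        // (the real loop sleeps ~100ms here so the UI can repaint)
        if let center = locateTarget(), safeArea.contains(center) {
            return center              // stop the moment the row is painted
        }
    }
    return nil  // 30-step cap hit: caller falls back to the original point
}
```

With a target 520 points below a 540-point-tall viewport and roughly 30px per scroll line, this converges in a handful of ticks, consistent with the 5-to-9-step range the article quotes.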

6

click_and_traverse posts the click at the adjusted point

adjustedPoint replaces rawPoint at main.swift:1589. CGEvent.post at .cghidEventTap fires. macOS sees a left mouse down, a left mouse up, and the 'Screen Mirroring' row highlights. The diff against the previous traversal is written to /tmp/macos-use/ with + / - / ~ prefixed lines.

7

Two more clicks finish the flow

Click the 'Show in Menu Bar' popup button next to the row, click the 'Always' menu item, done. A .png screenshot of the window is saved alongside the diff so the model can verify. Your menu bar now has a Screen Mirroring icon.

Anchor code 1 of 3

The adaptive step size at main.swift:1187

The fastest possible scroll is one giant jump. The safest is one line at a time. Neither is correct for every distance. The function picks a step size proportional to how far off-screen the target is: 1 line for close misses, 2 for medium, 3 for anything past 250 points. On a standard trackpad each scroll line is roughly 20 to 40 pixels.
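A reconstruction of the three-tier formula under the thresholds quoted above (80px and 250px); the real version is at main.swift:1187 and this sketch only mirrors the article's description.

```swift
import Foundation

// Hedged sketch of the adaptive step size: lines per scroll tick
// scale with how far off-screen the target is.
func linesPerStep(forDistance distance: CGFloat) -> Int {
    if distance < 80 { return 1 }    // close miss: creep one line at a time
    if distance < 250 { return 2 }   // medium: two lines per tick
    return 3                         // far, e.g. the 520pt Screen Mirroring row
}
```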

Anchor code 2 of 3

Text-match verification after every scroll step

The AX tree does not move when the viewport does. Elements keep their AXPosition, and the painted rectangle is implicit in the window frame. After each scroll tick the loop re-runs findElementByText across the window subtree, and the first match whose center falls inside the 15-point-inset viewport rect is returned as the click target. This is what lets the function stop the moment the row is painted, without overshooting.
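A stand-in for that check: scan a flat list of (text, frame) pairs, like the one-element-per-line traversal dump, and return the first match whose center is inside the inset viewport. `AXNode` and `firstVisibleMatch` are hypothetical names.

```swift
import Foundation

// Hedged sketch: text-match verification against a flat element list.
struct AXNode { let text: String; let frame: CGRect }

func firstVisibleMatch(_ target: String, in nodes: [AXNode], viewport: CGRect) -> CGPoint? {
    let visible = viewport.insetBy(dx: 0, dy: 15)  // 15pt top/bottom clearance
    for node in nodes where node.text == target {
        let center = CGPoint(x: node.frame.midX, y: node.frame.midY)
        if visible.contains(center) { return center }  // painted, clear of the edges
    }
    return nil  // match exists in the tree but is still off-viewport
}
```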

Anchor code 3 of 3

The probe-edge fallback for unlabeled rows

Some rows in Control Center have no associated AX text, like the popup button that sits at the end of each module for Show in Menu Bar. There is nothing to match against. The fallback at main.swift:1220 scrolls one step at a time and probes a point 60 points inside the viewport edge where new content is appearing, logging whatever element sits there. When the original target point finally comes back into range, the function latches onto its text and uses findElementByText for the final landing.
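The probe-point choice can be sketched as follows. `probePoint` and the `scrollingDown` parameter are assumptions for illustration; the real probe lives at main.swift:1231-1233.

```swift
import Foundation

// Hedged sketch: when the target has no AX text, probe 60pt inside
// the viewport edge where new content is scrolling in.
func probePoint(viewport: CGRect, scrollingDown: Bool) -> CGPoint {
    let probeY = scrollingDown
        ? viewport.maxY - 60   // content enters from the bottom edge
        : viewport.minY + 60   // content enters from the top edge
    return CGPoint(x: viewport.midX, y: probeY)
}
```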


What the stderr log says for a real Screen Mirroring run

The server writes one line per decision. The distance=520px and lines/step=3 tell you which branch of the adaptive formula fired. The per-step element frame= lines are how you verify the row is actually moving up into the viewport instead of being stuck off-screen because the window was scrolled up already.

macos-use stderr during scrollIntoViewIfNeeded targeting Screen Mirroring
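An illustrative excerpt, not captured from a real run: the two line formats match what the server logs for this function, but the concrete values here are assumed.

```
log: scrollIntoViewIfNeeded: target text="Screen Mirroring" distance=520px, lines/step=3
log: scrollIntoViewIfNeeded: found "Screen Mirroring" at (612, 542) after 6 steps
```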
3 tiers: scale lines per step to distance, 1 line for tiny offsets, up to 3 for large ones. Each scroll line ≈ 20-40px, so 1 line is enough when distance < 80px (doc comment at Sources/MCPServer/main.swift:1185-1186).

Numbers you can reproduce from the current commit

Every number below comes straight from Sources/MCPServer/main.swift at HEAD. Clone, open, grep.

  • 30: max scroll steps before giving up
  • 125: lines in scrollIntoViewIfNeeded
  • 60pt: probe inset from the viewport edge
  • 15pt: viewport safety inset for findElementByText
  • 100ms: sleep between scroll ticks in text mode (main.swift:1199)
  • sleep per tick in probe-edge mode: see main.swift:1242
  • 8: nudge steps after the probe finds the target (main.swift:1264)
  • 5.0s: AXUIElement messaging timeout on the app (main.swift:1161)

Why this function exists, concretely

Every row in the grid below is a failure mode that disappears the moment scrollIntoViewIfNeeded runs. The fix is the same in every case: do not trust that the AX tree's painted rectangle matches the visible viewport.

Screen Mirroring row reported at y=1082 on a window whose viewport ends at y=562

The AX tree happily returns the logical position of every row in the Control Center pane, including rows the window never painted. A CGEvent click at that y coordinate hits the desktop wallpaper under the System Settings window. scrollIntoViewIfNeeded detects this at main.swift:1168 when windowBounds.contains(point) is false, and starts scrolling.

Adaptive lines/step picks 3 when the row is 520pt away

At distance 520, the formula at main.swift:1187 picks 3 lines per step, so each scroll covers roughly 60 to 120 pixels. The row lands in the viewport after 5 to 9 steps instead of 20 to 40. Critical for keeping total automation time under a second.

15-point viewport inset prevents edge-clipping

findElementByText at main.swift:1128 uses windowBounds.insetBy(dx: 0, dy: 15). Without that, the loop would return the moment the row's center first appears, often with the row half-clipped at the bottom edge. The inset guarantees 15 points of clearance.

Probe-edge fallback handles Control Center's unlabeled popup buttons

The 'Show in Menu Bar' popup next to each module does not carry an AX text attribute in its AXStaticText descendants, so findElementByText returns nothing. The branch at main.swift:1220-1284 switches to scrolling and probing the 60-point-inset edge for whatever element appears there, then latches onto it once the target re-enters view.

30-step cap bounds the worst-case latency

If a selector points at something that is not scrollable into view (a hidden sidebar, a collapsed section), the loop would scroll forever. The hard cap at main.swift:1189 returns the original point, the click fails with a loud error, and the model gets a response it can reason about instead of hanging.

Control Center rows that commonly sit below the fold in System Settings

Wi-Fi, Bluetooth, AirDrop, Focus, Stage Manager, Screen Mirroring, Display, Sound, Now Playing, Accessibility Shortcuts, Battery, Hearing, Fast User Switching, Keyboard Brightness, Music Recognition, Screen Recording, Screen Distance

The click with and without the auto-scroll

Compare the two runs below. Same MCP tool call, same AX tree, same target coordinate. The only difference is whether scrollIntoViewIfNeeded runs before the CGEvent.post.

What the naive click does without it. Claude Desktop prompt: 'Set Screen Mirroring to Always Show in Menu Bar.'

Model greps the AX tree, finds 'Screen Mirroring' at (612, 1082), calls click_and_traverse. The server posts a CGEvent mouse-down at that coordinate. The click lands 520 points below the System Settings window frame, on the desktop. Nothing highlights. The AX diff shows zero changes. The model retries and gets the same result forever.

  • Click posted on the desktop, not on the window
  • AX diff is empty, model has no way to know why
  • Retries will never succeed because the coordinate is off-viewport
  • No log line explaining the failure, just silence

What scrollIntoViewIfNeeded guarantees for a Screen Mirroring menu-bar setup

  • Raw AX coordinates outside the painted viewport are detected before any CGEvent is posted
  • Scroll step size adapts to distance: 1 line under 80px, 2 under 250px, 3 otherwise
  • Every step re-runs findElementByText across the window subtree and stops when the element is visible
  • If the target has no AX text, the probe-edge branch latches onto whatever appears at the 60pt inset
  • 30-step hard cap prevents infinite scroll if the selector points at something unscrollable
  • 15pt viewport inset guarantees the element is not clipped at the top or bottom edge
  • AXUIElementSetMessagingTimeout of 5s keeps a wedged System Settings from stalling the scroll

Try it yourself

Build the server, grant Accessibility permission, point Claude Desktop at the binary, and send the prompt. Tail stderr through the Claude Desktop MCP log viewer. The scrollIntoViewIfNeeded log lines are how you verify the auto-scroll fired, and the step count tells you whether your default System Settings size made the row one or two screens below the fold.

git clone https://github.com/mediar-ai/mcp-server-macos-use
cd mcp-server-macos-use
xcrun --toolchain com.apple.dt.toolchain.XcodeDefault swift build -c release

# Grant Accessibility permission to .build/release/mcp-server-macos-use
# System Settings, Privacy & Security, Accessibility, +

# Point Claude Desktop at .build/release/mcp-server-macos-use in
# claude_desktop_config.json under mcpServers, then restart.
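The claude_desktop_config.json entry can look like the following. The "macos-use" key name is arbitrary, and the path must be the absolute path to your clone; adjust both to taste.

```json
{
  "mcpServers": {
    "macos-use": {
      "command": "/absolute/path/to/mcp-server-macos-use/.build/release/mcp-server-macos-use"
    }
  }
}
```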

# Prompt: "Open System Settings, go to Control Center, and set
# Screen Mirroring to Always Show in Menu Bar."


Frequently asked questions

Is Screen Mirroring already in Control Center on macOS, or do I actually need to add it?

On macOS Sonoma and Sequoia, the Screen Mirroring tile is already in Control Center by default. The thing people usually mean when they type 'how to add Screen Mirror to Control Center' is either (a) making the Screen Mirroring icon show up in the menu bar as its own icon (System Settings, Control Center, scroll to Screen Mirroring, set Show in Menu Bar to Always or When Active), or (b) adding a specific Control Center module they removed in the past. Both paths live in the same pane of System Settings. The interesting part of this page is that the Screen Mirroring row in that pane is below the fold on any 13-inch MacBook at the default System Settings window size, which means a naive accessibility-driven click at raw AX coordinates would post a click below the visible window and hit nothing.

Why does scrollIntoViewIfNeeded exist at all if the AX tree already reports every element?

The AX tree at Sources/MCPServer/main.swift traverses every descendant of the window via AXUIElementCopyAttributeValue, so it happily returns rows that are logically in the document but visually below the scroll viewport. The position is real; the pixel is just not painted. Posting a CGEvent mouseDown at that coordinate lands on the area below the window frame. The fix at main.swift:1151-1285 is to detect the off-viewport condition (windowBounds.contains(point) returns false at main.swift:1168) and scroll the containing window incrementally until the target element's text reappears inside the viewport. The function returns the element's actual painted center, which is then fed to the click post at main.swift:1588-1594.

How many scroll steps does it take to reach the Screen Mirroring row in a fresh System Settings window?

On a default 13-inch MacBook (1440x900 logical points) with System Settings opened at its default size and Control Center selected in the sidebar, the Screen Mirroring row sits roughly 520 logical points below the viewport, past the Wi-Fi, Bluetooth, AirDrop, Focus, and Stage Manager rows. scrollIntoViewIfNeeded classifies that as distance > 250 and picks linesPerStep = 3 at main.swift:1187, which at roughly 20-40 pixels per line (the doc comment at main.swift:1186) resolves in 5 to 9 scroll steps. The hard cap is 30 steps at main.swift:1189. If it takes more than that, the function gives up and returns the original point, which then fails at the click layer with a visible error instead of a silent miss.
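The step-count range in that answer is straightforward arithmetic: at 3 lines per step and 20-40px per line, one tick covers 60-120px, so a 520px distance resolves in five to nine ticks.

```swift
import Foundation

// Worked check of the 5-to-9-step claim: ceil(520/120) through ceil(520/60).
let distance = 520.0
let linesPerStep = 3.0
let bestCase  = (distance / (linesPerStep * 40)).rounded(.up)  // fast 40px lines
let worstCase = (distance / (linesPerStep * 20)).rounded(.up)  // slow 20px lines
```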

What happens when the target element does not have AX text, like an unlabeled toggle?

That is the case-2 branch at main.swift:1220-1284. Instead of searching the tree for the target by string match, the function scrolls one step at a time and probes a point 60 logical points inside the viewport edge (probeY at main.swift:1231-1233). Each time an AX element with a non-empty text attribute appears at the probe position, the function logs 'edge element after N steps' and records its text. The step counter continues up to 30 steps. When the original target point finally comes back into range (main.swift:1254), the function switches to text-based tracking, scrolls up to 8 more nudge steps (main.swift:1264), and returns the element's painted center. This matters for Control Center-style panes because the 'Show in Menu Bar' popup buttons sometimes have no associated text until you hover.

Can I test this flow from Claude Desktop without writing any code?

Yes. Clone the repo, run xcrun --toolchain com.apple.dt.toolchain.XcodeDefault swift build -c release, grant Accessibility permission to .build/release/mcp-server-macos-use in System Settings, Privacy and Security, Accessibility, then add the binary to claude_desktop_config.json under mcpServers. Restart Claude Desktop. Type the prompt: 'Open System Settings, go to Control Center, and set Screen Mirroring to Always Show in Menu Bar.' Watch the MCP log viewer. You will see the click_and_traverse tool call, followed by the 'log: scrollIntoViewIfNeeded' stderr lines as the server scrolls the Control Center pane. The final click lands on the visible row. No manual scrolling from you.

Why not teleport to the element by setting AXPosition directly instead of scrolling?

AXUIElement frames are read-only for position and size through AXUIElementCopyAttributeValue. You can read AXPosition (getAXElementFrame at main.swift:1113-1122) but you cannot write it on an arbitrary window that you do not own. The only supported way to bring an off-viewport element into the visible rectangle is to scroll its parent scroll container. That is why the function uses CGEvent(scrollWheelEvent2Source:) at main.swift:1196, which posts a real scroll event to cghidEventTap the same way a trackpad two-finger drag would. The AX tree is for reading; input is how you change the viewport.

What stops the scroll from overshooting and blowing past the Screen Mirroring row?

The re-check after every step at main.swift:1208: findElementByText searches the AX tree for the original target text, and if the element's center lands inside windowBounds.insetBy(dx: 0, dy: 15) (main.swift:1128), the function returns that center immediately. The 15-point vertical inset guarantees the element is not clipped at the very top or bottom edge. The worst-case overshoot is one scroll line, roughly 20 to 40 pixels, and the next iteration's findElementByText would still match because the row remains in the tree.

How does this compare to menu-bar automation approaches that skip System Settings entirely?

Two other common approaches: (1) use the defaults write com.apple.controlcenter shell command to flip a preference key, or (2) AppleScript the menu bar extra item through System Events. The defaults approach only works for a small set of pre-defined keys, not every Control Center module, and requires a cfprefsd reload or a logout/login to take effect. AppleScript on macOS 15 Sequoia hits the TCC prompts every time and is brittle because Apple renames the menu extra item identifiers across minor versions. The accessibility-driven path in this MCP server works for every row in System Settings, Control Center because it just clicks the same UI a human would. scrollIntoViewIfNeeded is why that click lands on the right pixel.

What if the user resizes the System Settings window mid-automation?

scrollIntoViewIfNeeded reads the window bounds fresh at the start of the call (main.swift:1163) via getWindowContainingPoint. If the user resizes the window between the AX traversal and the click, the next call to click_and_traverse will pull a fresh traversal with the new bounds. The function does not cache the viewport between calls; every click revalidates. That is also why the server sets AXUIElementSetMessagingTimeout to 5.0 seconds at main.swift:1161, so a slow or unresponsive System Settings does not stall the scroll indefinitely.

Can I watch the scroll in realtime to debug a flaky selector?

The server logs to stderr in front of every scroll step. The key lines are 'log: scrollIntoViewIfNeeded: target text="..." distance=Npx, lines/step=N' at main.swift:1193 and 'log: scrollIntoViewIfNeeded: found "..." at (x,y) after N steps' at main.swift:1209. Pipe stderr somewhere visible (Claude Desktop's MCP log viewer, or an xterm if you run the server by hand) and you will see the exact step count, the element's text, and its final viewport-local coordinate. If the counter hits 30 and returns the original point, the click will log an error and the MCP response will tell the model the selector did not land.

Is this behavior different from what humans see when they two-finger-scroll to the Screen Mirroring row?

Functionally identical. The function posts the same .line-unit scroll wheel events a trackpad would post. The difference is consistency: a human stops scrolling when their eye tracks the row into view, and may overshoot or undershoot. The auto-scroll stops exactly when findElementByText confirms the element's center is within the inset viewport, so the next click targets the actual painted center rather than somewhere nearby. This is how a Claude Desktop prompt can reliably land on a 28-point-tall System Settings row that sits 520 points below the initial viewport, every time.

macos-use: MCP server for native macOS control
© 2026 macos-use. All rights reserved.