Add Screen Record To Control Center, But The Agent Has To Scroll There First
The manual version is five clicks through System Settings and Control Center. Apple Support has that covered. This page is about the version where Claude Desktop or Cursor does it for you, and the detail every other writeup skips: before any click lands, the server has to scroll a virtualized list of Control Center modules to put the Screen Recording row into the viewport. How it picks a scroll velocity, how many steps it allows itself, and how it handles a row that has no accessibility text yet, all live in one 126-line function at Sources/MCPServer/main.swift:1159.
The Manual Version, In One Paragraph
Open System Settings. Click Control Center in the sidebar. Scroll down in the right pane until you see the Screen Recording module. Switch Show in Menu Bar to Always. If you are on macOS 15 Sequoia you can also open Control Center from the menu bar, click the edit affordance at the bottom, and drag Screen Recording into a fresh slot. Either way, it is the same five-click path that Apple Support, the Apple Community thread at discussions.apple.com/thread/255457988, TechSmith, AppleVis, and TheSweetBits all describe.
This page is not about that. This page is about the path you take when you type a sentence to Claude Desktop and the model picks up the macos-use MCP tools to do the scroll and click for you. The interesting work happens before the click, inside a helper called scrollIntoViewIfNeeded. It is 126 lines, it has exactly two retrieval strategies, and the shape of its control flow is dictated by the fact that System Settings' Control Center pane is a long virtualized list in which the Screen Recording row almost never starts inside the viewport.
Numbers From The Current Commit You Can Verify Yourself
Every value below appears verbatim in Sources/MCPServer/main.swift at HEAD. Clone the repo, jump to the line, read it with your own eyes.
The Three-Rung Ladder That Decides Scroll Velocity
The whole velocity decision fits on one line of Swift. Everything else in the helper exists to make that one line correct for the specific case of Control Center's scrolling module list. The thresholds 80 and 250 are not arbitrary: they mark the distances below which a larger jump risks overshooting the row, and above which a smaller one cannot close the gap within the step budget.
The Screen Recording row inside Control Center is typically 44 to 60 logical points tall, and a single wheel-line on macOS Sequoia Settings moves between 20 and 40 points. So 1 line per step always leaves the row partly visible on the next probe, 2 lines keeps it visible for mid-range distances, and 3 lines is the largest jump whose landing zone still overlaps the row on the next 100ms probe.
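The rung selection described above reduces to a one-line pure function. This is a standalone sketch for illustration, not the repo's code; the real expression is quoted later from main.swift:1187 and operates on distances measured from the AX tree.

```swift
// Sketch of the three-rung velocity ladder described above.
// Standalone reconstruction; names are illustrative.
func linesPerStep(forDistance distance: Double) -> Int32 {
    // < 80 pt away: 1 line, < 250 pt: 2 lines, otherwise 3 lines
    distance < 80 ? 1 : (distance < 250 ? 2 : 3)
}
```

Note the rungs are chosen by distance outside the viewport, not by absolute position, so the same ladder serves both scroll directions.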
What scrollIntoViewIfNeeded Receives And What It Returns
The helper sits between the click tool and the event-posting layer. It is allowed to assume the caller has an (x, y) point that came from an AX traversal but may not be in the current viewport. Its job is to guarantee the returned point is centered inside the window bounds, or to give up cleanly after 30 steps.
Inputs flow in, a point inside the viewport flows out
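The first thing the helper must measure is how far outside the viewport the target sits, and in which direction. A minimal sketch of that measurement, based on the expressions quoted in the FAQ ('point.y - windowBounds.maxY' below the viewport, 'windowBounds.minY - point.y' above it); the function name, tuple shape, and direction convention here are illustrative, not copied from the repo:

```swift
// How far outside the viewport the target sits, and which way to scroll.
// Illustrative reconstruction; the direction convention is an assumption.
func offscreenDistance(targetY: Double,
                       viewportMinY: Double,
                       viewportMaxY: Double) -> (distance: Double, scrollDown: Bool)? {
    if targetY > viewportMaxY {           // target is below the visible pane
        return (targetY - viewportMaxY, true)
    }
    if targetY < viewportMinY {           // target is above the visible pane
        return (viewportMinY - targetY, false)
    }
    return nil                            // already visible: no scroll needed
}
```

The nil case is the fast path: when the target is already inside the viewport, the helper can return the point without posting a single scroll event.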
One Full Scroll-And-Click, Step By Step
What the server actually does between the moment the model says "click Screen Recording" and the moment the click lands on the row. The scroll-helper is step 4; everything after it only works because the helper returned a viewport-contained point.
Open System Settings with open_application_and_traverse
The tool launches com.apple.systempreferences, waits for the window, and returns the AX traversal as a flat text file under /tmp/macos-use/. The model greps for 'Control Center' to find the sidebar row's (x, y).
Click the Control Center sidebar row
click_and_traverse fires a synthesized left-click at (x=120, y=238). The right pane re-renders with the Control Center module list, but Screen Recording is well below the viewport.
click_and_traverse for 'Screen Recording' computes its target point
The model pulls the Screen Recording row's coordinate from the traversal file. The coordinate is (580, 650), which is 412 points below the window's current maxY.
scrollIntoViewIfNeeded runs with distance=412, linesPerStep=3
distance is above the 250-point threshold at main.swift:1187, so the ladder picks 3 wheel-lines per step. The loop posts scrollWheelEvent2 events at the viewport's midY, sleeps 100ms between them, and re-reads the AX tree on every step.
Target appears in the viewport after 5 steps
findElementByText at main.swift:1208 locates the 'Screen Recording' row inside windowBounds at (580, 224). The helper returns that center point. The scroll total took roughly 550ms wall-clock.
The click lands on the row, traversal confirms the state
The caller synthesizes a left-click at the returned point. refresh_traversal re-reads the AX tree and diffs the new state against the pre-click snapshot. A '+Show in Menu Bar: Always' entry appears in the diff file.
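The 5-step figure in the walkthrough is consistent with the quoted numbers: 412 points of distance at 3 lines per step, with each wheel-line moving roughly 20 to 40 points. A back-of-envelope check, assuming a mid-range 30 points per line (the per-line figure varies by device and is not fixed in the source):

```swift
// Estimated probe steps to cover a given distance; purely illustrative.
func estimatedSteps(distance: Double, linesPerStep: Int, pointsPerLine: Double) -> Int {
    Int((distance / (Double(linesPerStep) * pointsPerLine)).rounded(.up))
}
// At 30 points per wheel-line, 412 points at 3 lines/step needs 5 steps,
// matching the walkthrough; at 20 points per line it would need 7.
```

Either way the count stays far under the 30-step cap, which is why the whole pass lands in roughly half a second of wall-clock.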
Fixed-Velocity Loop Versus The Ladder
Left: the shape you get if you write the helper the obvious way, with one scroll velocity and no fallback for text-less rows. Right: the macos-use version. Same control flow, different branching inside the loop.
Same problem, two different correctness envelopes
// What most "scroll to element" helpers look like.
// Pick one velocity, scroll until you get there, sleep between steps.
// Fails on System Settings because the row is 44-60 points tall and
// a single 3-line step (60-120 points) can skip over it entirely
// between probes. Also fails when the row has no AX text because
// there is nothing to search for.
func scrollTo(point: CGPoint) async {
    for _ in 1...50 {
        post(scrollWheel, lines: -3) // always 3
        try? await Task.sleep(nanoseconds: 100_000_000)
        if windowContains(point) { return }
    }
}

The Edge Probe For Rows That Have No AX Text Yet
Sometimes findAXElementAtPoint returns nil at the target coordinate because the row has not been hydrated by System Settings yet. In that case the helper cannot search by text, so it switches to coordinate probing: scroll, then ask the AX tree what element sits 60 logical points inside the viewport edge. When something with text finally appears there, switch back to text-tracking and nudge a few more steps to center the row.
The 60-point inset is calibrated against the height of a Control Center module row (44 to 60 points on macOS 15). Probing closer to the edge catches rows partway into the viewport; probing further in misses them until the scroll has already over-advanced.
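The probe position described above reduces to a fixed inset off whichever edge new content enters from. This sketch mirrors the FAQ's description (windowBounds.maxY - 60 when scrolling down, windowBounds.minY + 60 when scrolling up) with illustrative names:

```swift
// Where to ask the AX tree "what just scrolled in?"
// The 60-point inset is calibrated to the 44-60 point module-row height.
func probeY(scrollingDown: Bool,
            viewportMinY: Double,
            viewportMaxY: Double,
            inset: Double = 60) -> Double {
    scrollingDown ? viewportMaxY - inset : viewportMinY + inset
}
```

Keeping the inset close to one row-height means the first hydrated row to cross the edge is the first thing the probe sees.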
What The Stderr Log Says During One Successful Run
Every line below is a literal format string from the server. The lines/step=3, dir=3 suffix tells you the ladder picked the long-distance rung. after 5 steps tells you the scroll took under a second. If you ever see a warning line after 30 steps, the row was deeper than the pane could scroll and you need to restart System Settings.
“Scale lines per step to distance: 1 line for tiny offsets, up to 3 for large ones. Each scroll line ≈ 20-40px, so 1 line is enough when distance < 80px.”
doc comment at Sources/MCPServer/main.swift:1185-1186
Failure Modes The Ladder Quietly Prevents
Every item below is observable if you replace the ladder with a fixed velocity. Each disappears the moment scrollIntoViewIfNeeded is allowed to pick its own velocity from the distance.
Overshoot on the last scroll step before the row is visible
A fixed 3-line scroll that lands with the target at y=45 will overshoot on the next step and put it at y=-15. The probe loop then sees no text at the original coordinate and keeps scrolling past. The ladder drops to 1 line under 80 points and keeps the row centered.
Slow-start on long distances kills the 5s tool budget
A fixed 1-line scroll takes 30 steps at 100ms each to cover 600 points. That is the entire maxSteps cap. Any extra latency pushes the tool over the AX messaging timeout. The ladder uses 3 lines above 250 points so the long stretch is covered in 6-8 steps.
Row with no AX text never matches findElementByText
System Settings virtualizes rows that have never been on-screen. The Screen Recording row may return nil from getAXElementText until it has been rendered once. The edge-probe case handles this by scrolling first, asking what hydrated, and switching strategies on the fly.
Wheel-pixel scrolls drift inside the Settings scroll view
macOS Settings uses a custom scroll view with snap-to-row behavior on .line units and smooth drift on .pixel units. The helper always posts units: .line to ride the snap, which keeps the row's center aligned with a predictable (x, y) after each step.
Want to drive System Settings from your own MCP client?
macos-use is open source. Clone it, run the Swift build, point your MCP client at the binary, and the scroll ladder is live on every click_and_traverse call that targets a coordinate outside the viewport.
See mcp-server-macos-use on GitHub →

Frequently asked questions
What is the shortest manual path to add Screen Recording to Control Center on macOS Sequoia?
Open System Settings, click Control Center in the sidebar, scroll to the Screen Recording module, and switch Show In Menu Bar to Always. Or on macOS 15, open Control Center, enter edit mode, and drag Screen Recording into a new slot. Apple Support, the Apple Community thread at discussions.apple.com/thread/255457988, AppleVis, TechSmith, and TheSweetBits all describe some variant of those two paths. This page is not about those. This page is about what happens when you tell an MCP client to do the same thing and the server has to scroll past a virtualized list of modules to even find the Screen Recording row.
Why does the agent path need a scroll helper at all? Can it not just click the coordinates from the traversal?
The traversal reports coordinates in the window's logical point space, but any point with a y greater than windowBounds.maxY or less than windowBounds.minY is outside the visible viewport. A raw CGEvent.post at those coordinates lands on whatever happens to be rendered at that pixel right now, which is usually the wrong control or nothing at all. The fix is scrollIntoViewIfNeeded at Sources/MCPServer/main.swift:1159-1285. It measures how far the target sits outside the viewport, picks a per-step scroll velocity from a three-rung ladder, scrolls the pane in discrete wheel-line events, and re-reads the AX tree between steps to confirm the element is actually in view before any click fires.
What exactly are the three rungs of the scroll ladder, and why those numbers?
At main.swift:1187 the ladder is: 1 line per step when the target is less than 80 points outside the viewport, 2 lines per step when it is less than 250 points out, and 3 lines per step otherwise. Each macOS scroll line is roughly 20-40 logical points, so 1 line is enough to center a row that is already almost visible, while 3 lines is the upper bound that still lets the probe logic catch the row on the way past. Going higher overshoots the row inside System Settings' smooth-scroll animation and the probe loop loses it.
What happens if the target row has no accessibility text, which is common for just-scrolled-in sidebar items?
The helper falls into CASE 2 at main.swift:1220-1284. It picks a probe point 60 logical points inside the edge where new content is scrolling in (windowBounds.maxY - 60 for scroll-down, windowBounds.minY + 60 for scroll-up). After each scroll event it sleeps 150ms, then calls findAXElementAtPoint at that probe position to see whether a text-bearing element has appeared. The moment one does, the helper knows the target's text, switches to findElementByText, and nudges up to 8 more steps to center the element inside the viewport before returning its midpoint.
How does the ladder compare to a fixed-velocity scroll, say a constant 3 lines per step?
A fixed 3-line step works for the first few hundred points of distance but repeatedly misses near-target rows: three lines is typically 60-120 logical points and the Screen Recording row is only 44-60 points tall. A fixed 1-line step works near the target but takes 15-20 steps to travel the module list, each of which sleeps 100 or 150 milliseconds, and the total walk exceeds the 5-second AXUIElementSetMessagingTimeout set at main.swift:1161. The ladder exists because no single velocity works across the distance range System Settings actually presents when Control Center is the destination.
Why 100ms for the text-tracking case but 150ms for the edge-probe case?
Those are the sleeps at main.swift:1199 and main.swift:1242 respectively. When the helper already knows the target's text (CASE 1), findElementByText is a fast AX walk and 100ms is enough for the System Settings scroll animation to paint the next frame. When the helper has to probe a specific coordinate (CASE 2), the AX tree inside System Settings has to re-hydrate the off-screen cells before findAXElementAtPoint returns anything, and empirically that takes 150ms on macOS Sequoia. Dropping it below 120ms produced intermittent misses during testing.
What is the total upper bound on how long a scroll-and-find can run?
maxSteps is 30 at main.swift:1189, each step is 100-150ms, and the nudge loop after the target first appears is 8 more steps at 150ms each. The hard ceiling is 30 * 150ms + 8 * 150ms + two 100ms final settles, which is 5.9 seconds. The AX messaging timeout at main.swift:1161 is 5.0 seconds per individual attribute read, not per scroll-and-find pass, so the 5.9s worst case stays under the MCP tool-call budget without special handling. In practice Screen Recording is typically 4-8 steps from the top of the Control Center pane and lands in well under a second.
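That ceiling is simple arithmetic over the quoted constants:

```swift
// Worst-case wall-clock for one scroll-and-find pass, in milliseconds,
// per the numbers quoted above: 30 probe steps at 150ms, 8 nudge steps
// at 150ms, and two 100ms final settles.
let probeSteps = 30, nudgeSteps = 8
let worstCaseMs = probeSteps * 150 + nudgeSteps * 150 + 2 * 100
// 5900 ms: the 5.9 s ceiling cited above.
```

The typical case is far cheaper because the ladder covers most of the distance at 3 lines per step and the nudge loop rarely runs to its cap.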
Does the helper scroll by wheel-line, wheel-pixel, or page?
Wheel-line. At main.swift:1196 the event is built with CGEvent(scrollWheelEvent2Source: nil, units: .line, wheelCount: 1, wheel1: scrollDirection, wheel2: 0, wheel3: 0). Units of .line let macOS Settings apply its normal scroll acceleration curve instead of a hard jump, which is important because Settings uses a custom NSScrollView that snaps between rows on line-unit scrolls but drifts on pixel-unit scrolls. The server posts to .cghidEventTap at main.swift:1198 so the scroll appears to come from the trackpad, not from a programmatic source, which bypasses some hover and focus shortcuts Settings has for programmatic input.
What happens if a human scrolls the trackpad while the server is mid-scroll?
The hardware scroll lands in the same event stream as the server's synthetic scroll, so the pane moves twice per user gesture. The companion page 'How To Add Screen Recording To Control Center' on this site covers the InputGuard.swift CGEventTap that blocks hardware input entirely during drag and scroll sequences. This page focuses on the scroll-search helper that runs before the drag. The two fit together: InputGuard keeps hardware out of the event stream, scrollIntoViewIfNeeded decides what synthetic scrolls to post in the first place.
How can I reproduce the ladder numbers from the source?
Clone github.com/mediar-ai/mcp-server-macos-use, open Sources/MCPServer/main.swift at line 1187, and you will see 'let linesPerStep: Int32 = distance < 80 ? 1 : (distance < 250 ? 2 : 3)' verbatim. The distance calculation two lines above is 'point.y - windowBounds.maxY' when the target is below the viewport and 'windowBounds.minY - point.y' when it is above. You can instrument the behavior in a fresh terminal with 'xcrun --toolchain com.apple.dt.toolchain.XcodeDefault swift build && ./.build/debug/mcp-server-macos-use' and watch the stderr log lines prefixed with 'log: scrollIntoViewIfNeeded:' during a real System Settings traversal.
Adjacent problems the same server solves
Related guides on macos-use
How to add screen recording to Control Center, without your hand on the trackpad
The companion piece. This page is about scroll-to-find; that page is about the CGEventTap in InputGuard.swift that blocks your hand from corrupting the drag that lands the module in its new slot.
What is a macOS MCP server?
A primer on MCP servers that drive macOS apps through the accessibility API, including how macos-use fits into the wider MCP ecosystem.
macos-use overview
Why macos-use uses the native AX API instead of a screenshot-and-OCR loop, and how it complements Windows-focused servers like Terminator.