MCP Write Tools vs Read: 8 Fused, 1 Pure
MCP itself has no separate primitive for read vs write. Both are Tools, distinguished only by readOnlyHint and friends, and even those are advisory. So the design is yours. The macOS-use server fuses every write with a read of the post-action state and returns a diff. Eight tools work that way. One is a pure read. The reason is below, with the Swift that proves it.
Split read from write when the read is independent of the write. Fuse them when the write changes a surface the LLM has to keep modeling. Filesystem and database servers split (read_file vs write_file, SELECT vs INSERT). UI-driving servers fuse, because every action changes the screen and the LLM's next decision depends on the new screen. The macOS-use MCP server fuses 8 of its 9 tools and returns a diff after every mutation. The diff is the read.
- Fused tools (8): every name ends in _and_traverse. The prefixes are open_application, click, type, press_key, scroll, set_value, press_ax, and set_selected.
- Pure read (1): refresh_traversal. Used when the LLM wants the tree without doing anything.
- Tradeoff: fused tools cannot honestly carry readOnlyHint: true. Clients that prompt on writes will prompt on every fused call. Acceptable for UI driving; would be wrong for a database server.
What MCP Does (And Does Not) Say About Reads vs Writes
The MCP spec gives you Tools, Resources, and Prompts. Resources are read-only by design (URIs you list and fetch). Prompts are user-controlled. Tools are model-controlled and cover everything else. There is no separate read-tool and write-tool primitive. The spec adds annotation hints (readOnlyHint, destructiveHint, idempotentHint, openWorldHint) and is explicit that they are advisory. Clients use them for confirmation dialogs and audit logs, not for sandboxing.
So the read-vs-write decision is yours. The reference servers (filesystem, github, postgres) split: distinct read_file and write_file, query and execute. That is the right call for transactional surfaces. The LLM rarely needs the whole table after an INSERT, and a separate query tool keeps annotations clean (readOnlyHint: true on query).
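As a rough illustration of how the annotation hints sit on a split pair, here is a minimal Swift sketch. The `ToolAnnotations` and `ToolDescriptor` types are hypothetical (they are not taken from any SDK or from the macOS-use source), the tool descriptions are invented, and the `inputSchema` a real tool definition carries is omitted for brevity. Only the four hint names come from the spec.

```swift
import Foundation

// Hypothetical types for illustration; the field names mirror the MCP annotation hints.
struct ToolAnnotations: Codable {
    var readOnlyHint: Bool?
    var destructiveHint: Bool?
    var idempotentHint: Bool?
    var openWorldHint: Bool?
}

struct ToolDescriptor: Codable {
    let name: String
    let description: String
    let annotations: ToolAnnotations
}

// Split pattern: the read tool can honestly claim to be read-only.
let queryTool = ToolDescriptor(
    name: "query",
    description: "Run a read-only SQL query.",
    annotations: ToolAnnotations(readOnlyHint: true)
)

// The write tool says so explicitly. Hints are advisory, not enforced.
let executeTool = ToolDescriptor(
    name: "execute",
    description: "Run a statement that modifies data.",
    annotations: ToolAnnotations(readOnlyHint: false, destructiveHint: true)
)

// Encodes a descriptor as compact JSON with sorted keys; unset hints are omitted.
func encode(_ tool: ToolDescriptor) -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]
    guard let data = try? encoder.encode(tool) else { return "{}" }
    return String(decoding: data, as: UTF8.self)
}

let queryJSON = encode(queryTool)
let executeJSON = encode(executeTool)
print(queryJSON)
print(executeJSON)
```

A client that auto-allows read-only tools would let `query` through and prompt on `execute`. Nothing in the protocol stops a server from mislabeling either one, which is why the hints cannot be used for sandboxing.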
UI driving is not a transactional surface. The screen is live state the LLM has to model, and every action changes it. Splitting click from read forces the LLM to remember to refresh after every click. The macOS-use server made the opposite call: fuse the read into the write, return a diff, and keep one pure read for the case where the LLM wants the tree without acting.
The Two Patterns Compared
One row per design dimension. The point of the table is not to crown a winner. Splitting wins on transactional surfaces. Fusion wins on live-state surfaces. The table is there to help you avoid the wrong call on either side.
| Feature | Split read/write tools | macOS-use (fused) |
|---|---|---|
| Round trips per logical action (click, observe new state) | two: one for write, one for read | one: write tool returns the new state inline as a diff |
| Class of bug: LLM forgets to refresh after a write | common; bug surface scales with action count | structurally impossible; refresh is part of the tool's return |
| readOnlyHint annotation cleanliness | clean: read tools mark true, write tools mark false | fused tools cannot mark readOnlyHint true; only refresh_traversal can |
| What the tool returns after a mutation | an opaque success/failure; LLM must call read next | added/removed/modified diff plus text-change samples |
| Best fit | transactional surfaces (database, filesystem, queue, API) | live-state surfaces (UI tree, simulator, board game, document) |
| Permission UX in clients that prompt per tool | read tools auto-allow, write tools prompt; minimal noise | every fused tool prompts; correct for UI driving, noisy elsewhere |
| Latency on a 400-element app, write then observe | roughly 400ms (200ms write + 200ms read, plus encode hop) | roughly 280ms (write piggybacks on the same traversal) |
| Stale-state failure mode | LLM acts on cached tree from a previous read | the tree is fresh after every write by construction |
Database, filesystem, queue, and pure-API MCP servers should keep splitting. The fused pattern is correct for UI trees, simulators, board games, chat threads, and other surfaces the LLM has to keep modeling between actions.
One Logical Action, Two Wire Patterns
The same agent goal: click a button and observe what changed. The split pattern issues two MCP tool calls and routes through two JSON-RPC round trips. The fused pattern issues one. The diff is small (typical clicks change 1 to 20 elements out of 400), so shipping it inline costs little.
Split: click then read_screen
Fused: click_and_traverse
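The two wire patterns can be sketched as the JSON-RPC requests a client would send. This is a minimal sketch, not a capture from a real session: `tools/call` is the MCP method for invoking a tool, while the split tool names (`click`, `read_screen`) and every argument name and value (`pid`, `x`, `y`) are assumptions made for illustration.

```swift
import Foundation

// Builds one MCP "tools/call" JSON-RPC request as a dictionary.
func toolCall(id: Int, name: String, arguments: [String: Any]) -> [String: Any] {
    return [
        "jsonrpc": "2.0",
        "id": id,
        "method": "tools/call",
        "params": ["name": name, "arguments": arguments] as [String: Any],
    ]
}

// Split pattern: hypothetical `click` and `read_screen` tools, two requests.
let splitRequests = [
    toolCall(id: 1, name: "click", arguments: ["x": 132, "y": 280]),
    toolCall(id: 2, name: "read_screen", arguments: [:]),
]

// Fused pattern: one request; the response is expected to carry the diff.
// The argument names here are assumed, not taken from the server's schema.
let fusedRequests = [
    toolCall(id: 1, name: "click_and_traverse", arguments: ["pid": 4242, "x": 132, "y": 280]),
]

for request in splitRequests + fusedRequests {
    if let data = try? JSONSerialization.data(withJSONObject: request, options: [.sortedKeys]) {
        print(String(decoding: data, as: UTF8.self))
    }
}
print("split round trips: \(splitRequests.count), fused round trips: \(fusedRequests.count)")
```

Each request waits on its own response, so the split path pays for two encode, transport, and decode cycles where the fused path pays for one.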
Where Stale State Bites An Agent
The LLM-forgets-to-refresh class of bug is the practical reason the macOS-use server fuses. Same target, same dialog, two paths. Watch what the agent has to remember in each case.
Dismissing a dialog that spawns a confirmation modal
An MCP client calls click_button to dismiss a dialog. The server returns success. The LLM moves on, calls type_text into what it thinks is the now-focused field underneath. But the dialog actually spawned a second confirmation modal that grabbed focus. The type goes into the modal's text field instead of the intended one. The agent had a stale model of the screen because the click tool only returned a boolean. To avoid this, the LLM has to call read_screen after every click, every time. In practice agents forget. The bug surface scales linearly with action count.
- click returns success only, no information about new modal
- LLM has to remember to call read_screen after every click
- If forgotten, next action targets a stale tree
- Bug surface scales with action count
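The failure above can be reproduced with a toy model. The sketch below is a simulation under stated assumptions, not macOS-use code: `Screen`, its element names, and `clickAndObserve` are all hypothetical, and focus is reduced to a single string.

```swift
// Toy screen model: tracks which element has keyboard focus and what was typed where.
struct Screen {
    var focused = "dialog"
    var typedInto: [String: String] = [:]

    // Dismissing the dialog unexpectedly spawns a confirmation modal that grabs focus.
    mutating func clickDismiss() { focused = "confirmation_modal" }

    // Typing always lands in whatever currently has focus.
    mutating func type(_ text: String) { typedInto[focused, default: ""] += text }
}

// Split pattern: the click reports success only, so the agent keeps its old guess.
var splitScreen = Screen()
splitScreen.clickDismiss()
let splitBelief = "main_field"          // the agent's stale assumption, never refreshed
splitScreen.type("hello")
let splitLandedIn = splitScreen.focused // where the text actually went

// Fused pattern: the click returns the post-action focus as part of its result.
func clickAndObserve(_ screen: inout Screen) -> String {
    screen.clickDismiss()
    return screen.focused
}

var fusedScreen = Screen()
let observed = clickAndObserve(&fusedScreen)
if observed == "main_field" {
    fusedScreen.type("hello")
}
// Otherwise the agent sees the modal in the result and handles it before typing.

print("split: believed \(splitBelief), typed into \(splitLandedIn)")
print("fused: observed \(observed), typed \(fusedScreen.typedInto.count) fields")
```

In the split run the text ends up in the modal. In the fused run nothing is typed until the agent has dealt with what the click actually produced.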
The 9 Tools, Named
The aggregate array at Sources/MCPServer/main.swift:1482 lists every tool the server registers. Eight names end in _and_traverse. One does not. The naming is the contract: if you see the suffix, you know the tool returns the post-action tree.
Fused write+read (8)
- open_application_and_traverse
- click_and_traverse
- type_and_traverse
- press_key_and_traverse
- scroll_and_traverse
- set_value_and_traverse
- press_ax_and_traverse
- set_selected_and_traverse
Pure read (1)
- refresh_traversal
Tool description verbatim: "Useful for getting the current UI state without performing an action."
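The naming contract is easy to check mechanically. The sketch below re-types the nine names listed above into a local array (it is not a copy of the array in main.swift) and partitions them by suffix.

```swift
import Foundation

// The nine tool names as listed in this article, re-typed here for illustration.
let toolNames = [
    "open_application_and_traverse",
    "click_and_traverse",
    "type_and_traverse",
    "press_key_and_traverse",
    "scroll_and_traverse",
    "set_value_and_traverse",
    "press_ax_and_traverse",
    "set_selected_and_traverse",
    "refresh_traversal",
]

// The suffix is the contract: fused tools return the post-action tree.
let fused = toolNames.filter { $0.hasSuffix("_and_traverse") }
let pureReads = toolNames.filter { !$0.hasSuffix("_and_traverse") }

print("fused: \(fused.count), pure reads: \(pureReads.count)")
```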
What The Diff Actually Looks Like
After every fused call the server walks the new accessibility tree, compares it to the pre-action snapshot, and serializes a four-part diff. Each fused tool case in the dispatch block invokes buildDiffSummary inline (search the file: 6 hits in the dispatch case block). The summary line is what the LLM reads first; the full traversal is written to a side file the LLM can grep when it needs detail.
Summary returned by click_and_traverse
Clicked at (132, 280). 1 added, 0 removed, 2 modified.
text_changes (up to 3, truncated to 60 chars per side)
'Are you sure?' -> ''
'OK' -> 'OK (disabled)'
file
/tmp/macos-use/1746028923_click_and_traverse.txt
Source: main.swift lines 791 (buildDiffSummary call inside click case), 856 (lines.append summary line), 858-878 (text_changes block).
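To make the shape of that summary concrete, here is a minimal sketch of a diff-and-summarize step. It is not the server's buildDiffSummary: the `Element` type, the assumption that each node has a stable `id` to match on, and the function names are all hypothetical. The limits (three text changes, 60 characters per side) follow the sample above.

```swift
// Hypothetical element model; `id` is an assumed stable identity used to match nodes.
struct Element: Equatable {
    let id: String
    let role: String
    let text: String
}

struct Diff {
    var added: [Element] = []
    var removed: [Element] = []
    var modified: [(before: Element, after: Element)] = []
}

// Compares two snapshots by id. Assumes ids are unique within each snapshot.
func computeDiff(before: [Element], after: [Element]) -> Diff {
    let oldByID = Dictionary(uniqueKeysWithValues: before.map { ($0.id, $0) })
    let newByID = Dictionary(uniqueKeysWithValues: after.map { ($0.id, $0) })
    var diff = Diff()
    for element in after {
        if let previous = oldByID[element.id] {
            if previous != element { diff.modified.append((before: previous, after: element)) }
        } else {
            diff.added.append(element)
        }
    }
    diff.removed = before.filter { newByID[$0.id] == nil }
    return diff
}

// Cuts a string to at most `limit` characters.
func truncate(_ s: String, to limit: Int = 60) -> String {
    return s.count > limit ? String(s.prefix(limit)) : s
}

// One summary line, then up to three text changes.
func summarize(action: String, diff: Diff) -> [String] {
    var lines = ["\(action). \(diff.added.count) added, \(diff.removed.count) removed, \(diff.modified.count) modified."]
    let textChanges = diff.modified.filter { $0.before.text != $0.after.text }.prefix(3)
    for change in textChanges {
        lines.append("'\(truncate(change.before.text))' -> '\(truncate(change.after.text))'")
    }
    return lines
}

let before = [
    Element(id: "label", role: "AXStaticText", text: "Are you sure?"),
    Element(id: "ok", role: "AXButton", text: "OK"),
]
let after = [
    Element(id: "label", role: "AXStaticText", text: ""),
    Element(id: "ok", role: "AXButton", text: "OK (disabled)"),
    Element(id: "sheet", role: "AXSheet", text: ""),
]
let summary = summarize(action: "Clicked at (132, 280)", diff: computeDiff(before: before, after: after))
summary.forEach { print($0) }
```

With these two toy snapshots the sketch produces the same summary line and text changes as the sample response. Real accessibility nodes do not come with a ready-made stable id, so a real implementation has to choose its own matching key.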
When You Should Split, When You Should Fuse
The decision is about the surface, not the LLM. Ask: after a successful write, does the LLM still need to read the surface? If yes, fuse. If no, split.
Fuse when the surface is a UI tree, a simulator state, a board position, a chat thread the LLM is moderating, a live document the LLM is editing, or any other state the LLM has to keep modeling between actions. The post-write read is going to happen anyway; making it part of the write eliminates a class of bugs and a round trip.
Split when the read is independent: a database (you query different tables than the one you wrote to), a filesystem (writing one file does not change what you read from another), an API gateway (POST and GET hit different endpoints), a queue (enqueue and dequeue are independent operations). Splitting preserves clean readOnlyHint annotations and lets clients auto-allow the cheap read tools.
Edge case: if the write affects what the read returns but only sometimes (for example, INSERT into one table and sometimes the LLM wants to SELECT from the joined view), keep them split and let the LLM choose to call the read. Fusing under that rule would force a read the LLM did not need.
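The rule, including the edge case, fits in a few lines. This is a sketch of the decision as stated in this section, with hypothetical type and function names; it is a thinking aid, not something the macOS-use server contains.

```swift
// How often the LLM needs to read the surface after a successful write.
enum PostWriteRead { case always, sometimes, never }

enum ToolShape { case fused, split }

// Fuse only when the post-write read would happen anyway.
// "Sometimes" stays split so the LLM can opt in to the read.
func recommend(_ need: PostWriteRead) -> ToolShape {
    switch need {
    case .always: return .fused
    case .sometimes, .never: return .split
    }
}

print(recommend(.always))     // UI tree, simulator, board position
print(recommend(.sometimes))  // INSERT, then occasionally SELECT from a joined view
print(recommend(.never))      // enqueue on a queue
```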
What Fusion Costs
Two real downsides. First, the readOnlyHint annotation is forfeit: a fused tool mutates state, so it cannot mark itself read-only. Clients that auto-allow read-only tools and prompt on writes will prompt on every fused call. For UI driving that is correct (the user wants to know when the agent is going to click). For a database server the same rule would be too noisy and the right call would be to split.
Second, the inline diff inflates response size. Typical macOS-use traversals are 100 to 800 elements; a click usually mutates 1 to 20. The diff stays small because it is only the changed nodes, but the full tree is also written to a file the agent can grep. If your surface has thousands of elements that mutate often, fusion will push more bytes than splitting plus an opt-in read.
Neither is a dealbreaker for live-state surfaces. Both are real for transactional surfaces. The decision tree is short and the answer is usually obvious; the mistake is doing the same thing on every server because you saw it once in the reference implementation.
Verify The Tool Counts Yourself
None of the numbers above have to be taken on trust. The whole server is one Swift file (2056 lines as of this writing) plus a small input-guard module. The eight steps below are all reproducible from the public source.
Reproduce the 8-fused / 1-pure split from the source
- Clone github.com/mediar-ai/mcp-server-macos-use
- Open Sources/MCPServer/main.swift in your editor
- Search for _and_traverse: 8 hits in the tool name strings
- Search for refresh_traversal: 1 hit, the only pure read
- Read line 1482: the aggregate array literal lists all 9 tools by name
- Read line 791 (clickTool case): buildDiffSummary is invoked inline after the click
- Run the server in stdio mode, click into a small AppKit app, observe the diff in the response JSON
- Compare against an MCP server that splits write and read; count the round trips
Frequently asked questions
What is the practical difference between an MCP read tool and an MCP write tool?
MCP itself does not have a separate primitive for the two. Both are Tools. What it has are annotations: readOnlyHint, idempotentHint, destructiveHint, and openWorldHint. A read tool is a Tool whose readOnlyHint is true and that does not mutate any external state. A write tool is the inverse. Annotations are hints, not enforcement. The MCP spec says the same thing: clients use them for UX (do I prompt before invoking?), not for sandboxing. So the read-vs-write distinction lives entirely in the server's design choices.
Why does the macOS-use MCP server fuse write and read instead of exposing them as separate tools?
Because the surface is a live UI. After clicking a button, the only useful next thing for the LLM to know is what changed in the accessibility tree. Splitting click and refresh into two tools means every click is followed by a refresh, every time, with no exceptions. That is two round trips for one logical operation, and it introduces a class of bugs where the LLM forgets to refresh and reasons against stale tree state. Fusing them into click_and_traverse eliminates the second round trip and makes the post-action state structurally part of the tool's return value. The server returns a diff (added, removed, modified, attribute changes) so the LLM sees exactly what the click moved, not just the new state.
How many fused tools does macOS-use ship and how many pure read tools?
Nine tools total. Eight are fused write+read: open_application_and_traverse, click_and_traverse, type_and_traverse, press_key_and_traverse, scroll_and_traverse, set_value_and_traverse, press_ax_and_traverse, set_selected_and_traverse. One is pure read: refresh_traversal. The aggregate list lives at Sources/MCPServer/main.swift line 1482, and the dispatch case block runs from line 777 to line 853. Every one of the eight fused tools ends in _and_traverse in its tool name string, on purpose. The naming is the contract.
When should I split read from write in my own MCP server?
When the read is independent of the write and the LLM does not need it after every write. A database MCP server is the canonical case: the LLM rarely needs the entire table after an INSERT, and it can issue a SELECT when it does. The filesystem reference server splits read_file from write_file for the same reason. A vector-search MCP server splits search_documents from index_document because indexing is async and search is independent. Fuse when the write changes a stateful surface the LLM has to keep modeling: a UI tree, a board game, a simulator, a chat thread, a live document. The test is: after the write, does the LLM still need the surface? If yes, fuse.
What is in the diff that fused tools return?
Four arrays: added, removed, modified, and attribute changes. After click_and_traverse, the macOS-use server walks the accessibility tree, compares it to the pre-action snapshot, and returns the structural diff plus a small text-changes block (up to three modified text or AXValue fields, truncated to 60 characters per side). The diff is what tells the LLM whether the click did anything. If the diff is empty, the click most likely had no effect on the UI. If the diff shows a new modal window appeared, the LLM knows to interact with the modal next. The summary line is built at main.swift line 856, and the text-changes block at lines 858 to 878.
Does fusing write and read make the tools harder to annotate?
Yes. A fused tool cannot honestly set readOnlyHint to true because it mutates state, and it cannot reasonably set readOnlyHint to false alone because the bulk of its return value is observation. The macOS-use server simply does not lie in the annotation: every fused tool is implicitly a write tool from the spec's perspective. The pure read (refresh_traversal) is the one tool that could carry readOnlyHint true. This is a real downside of fusion: clients that auto-allow read-only tools but prompt for writes will prompt on every fused call. For UI driving that is correct behavior. For other surfaces it would be too noisy.
Could I implement the same diff pattern with separate read and write tools?
You can, and several MCP servers do. The pattern is: the write tool returns a snapshot ID, then a separate read_diff tool takes a before-snapshot ID and an after-snapshot ID and returns the diff. It works, and it preserves clean readOnlyHint annotations. The cost is one extra round trip and a stateful snapshot store on the server. The macOS-use server skipped that path because the diff is small (typical AX traversals are 100 to 800 elements, and a click usually mutates 1 to 20 of them) and shipping it inline costs little. The choice is taste and surface, not correctness.
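A minimal sketch of that snapshot-ID variant, assuming the server keeps snapshots in memory and models each tree as a set of element labels. `SnapshotStore`, `save`, and `readDiff` are hypothetical names, and a real store would also need eviction.

```swift
// Hypothetical in-memory snapshot store for the split variant.
struct SnapshotStore {
    var snapshots: [Int: Set<String>] = [:]
    var nextID = 1

    // Called by write tools: stores a tree and hands back its ID.
    mutating func save(_ elements: Set<String>) -> Int {
        let id = nextID
        snapshots[id] = elements
        nextID += 1
        return id
    }

    // Backs a read-only read_diff tool: compares two stored snapshots.
    // Returns nil if either ID is unknown.
    func readDiff(before: Int, after: Int) -> (added: Set<String>, removed: Set<String>)? {
        guard let old = snapshots[before], let new = snapshots[after] else { return nil }
        return (added: new.subtracting(old), removed: old.subtracting(new))
    }
}

var store = SnapshotStore()
let beforeID = store.save(["OK button", "Cancel button"])
// ... the write tool performs the click, then saves the post-action tree ...
let afterID = store.save(["OK button", "Confirm sheet"])

if let diff = store.readDiff(before: beforeID, after: afterID) {
    print("added: \(diff.added), removed: \(diff.removed)")
}
```

`readDiff` never mutates the store, so the tool built on it can carry readOnlyHint: true. The price is the second call and the memory held by stored snapshots.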
Do other MCP servers fuse write and read this way?
Some. Browser-use MCP servers tend to fuse navigate_and_screenshot or click_and_screenshot for the same reason: the LLM driving a page needs the new page after every action. The Playwright MCP server has a click action whose return includes a snapshot reference. Database servers usually do not fuse: queries and mutations are kept distinct. Filesystem servers do not fuse: read_file and write_file are separate, and that is the right call because writing a file does not implicitly change what the LLM should read next. Fusion is correct for live-state surfaces, splitting is correct for transactional surfaces.
What does the dispatch code look like for a fused tool?
In main.swift around line 787, the case for click_and_traverse picks the click handler, runs it, then runs the traversal in the same call frame, then builds a summary that includes a buildDiffSummary call (line 791). The summary line ends with the diff. There is no second tool dispatch, no second JSON-RPC round trip, and no second permission prompt. The traversal cost is amortized across the click cost because both go through the same AXUIElement timeout and the same response serializer.
What happens if I want only the read on a tool that is fused?
Use refresh_traversal. It is the dedicated pure read at main.swift line 1383 and its description literally says "Useful for getting the current UI state without performing an action." It takes a PID and returns the same structured tree the fused tools return. The reason to keep it separate is exactly this case: the LLM sometimes needs a fresh tree without doing anything (the user reopened the app, time passed, an external event likely changed the UI). Splitting refresh from the eight write tools is the right call. Splitting click from refresh would not be.
Is the read on a fused write tool free, or does it cost as much as a separate read?
It costs slightly less than a separate read. Same AXUIElementCreateApplication call, same tree walk, same serialization. The savings come from skipping the JSON-RPC encode/decode cycle, the MCP transport hop, the permission check on the client (if it prompts on every tool call), and the LLM context budget for an extra tool call message. On macOS 14 the entire fused click_and_traverse on a 400-element app finishes in 220 to 380ms end to end, of which roughly 180 to 320ms is the traversal. A separate refresh would add another 200ms on average. At the per-action figures in the comparison table (roughly 400ms split, 280ms fused), three writes plus three reads in sequence take about 1.2 seconds; three fused calls take about 840ms.
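The totals are plain multiplication. The sketch below uses the rounded per-action figures from the comparison table (200ms write, 200ms read, 280ms fused); actual totals shift depending on where each call lands in the measured 220 to 380ms range.

```swift
// Approximate per-action latencies from the comparison table, in milliseconds.
let splitWriteMs = 200
let splitReadMs = 200
let fusedMs = 280
let actions = 3

let splitTotal = actions * (splitWriteMs + splitReadMs)  // 1200
let fusedTotal = actions * fusedMs                       // 840
let saved = splitTotal - fusedTotal                      // 360

print("split: \(splitTotal)ms, fused: \(fusedTotal)ms, saved: \(saved)ms")
```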
Does fusion confuse the LLM about what the tool actually did?
It can if the diff is presented poorly. The macOS-use server addresses this by writing the diff into a structured summary at the top of the response (see main.swift lines 856 to 878) and writing the full traversal to a side file the LLM only reads if it needs to. So the LLM gets a one-line summary like "Clicked at (132, 280). 1 added, 0 removed, 2 modified." plus a file path it can grep. The LLM never has to wade through an 800-element tree to find what changed. The fused contract becomes legible at the model's context budget rather than overwhelming it.
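The summary-plus-side-file layout could look roughly like this. It is a sketch under assumptions: `respond` is a hypothetical function, the file-name pattern (timestamp, underscore, tool name) is modeled on the sample path shown earlier, and the demo writes into the system temporary directory rather than the server's real output location.

```swift
import Foundation

// Writes the full traversal to a side file and returns a short response for the model.
func respond(summary: String, fullTraversal: String, tool: String, directory: URL) throws -> String {
    try FileManager.default.createDirectory(at: directory, withIntermediateDirectories: true)
    let timestamp = Int(Date().timeIntervalSince1970)
    let file = directory.appendingPathComponent("\(timestamp)_\(tool).txt")
    try fullTraversal.write(to: file, atomically: true, encoding: .utf8)
    // Only the summary and the path go back in the tool result.
    return "\(summary)\nfile: \(file.path)"
}

let demoDirectory = FileManager.default.temporaryDirectory.appendingPathComponent("macos-use-demo")
let response = (try? respond(
    summary: "Clicked at (132, 280). 1 added, 0 removed, 2 modified.",
    fullTraversal: "AXWindow Main\nAXButton OK (disabled)\nAXSheet Confirm",
    tool: "click_and_traverse",
    directory: demoDirectory
)) ?? "write failed"
print(response)
```

The model's context receives two short lines regardless of tree size, and the full tree is one file read away when the summary is not enough.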