MCP Action Server vs Docs Server: Install Both, For Different Jobs

Action MCP servers (macos-use, Playwright, Filesystem, GitHub write) mutate the world via Tools. Docs MCP servers (Context7, OpenAI Docs MCP, Ref) feed your agent reference material via Resources plus a few read-wrapping Tools. They are not substitutes. Docs servers fix “I didn’t know.” Action servers fix “I couldn’t do.” The macos-use server is pure action: 9 Tools, 0 Resources. Here is how to think about the split, and the Swift line that proves macos-use is on one side of it.

Direct answer (verified 2026-05-21 against the MCP spec and main.swift)

Install both, for different jobs. An action MCP server gives your agent hands (Tools that mutate state: click a button, write a file, open a PR, drive a Mac app). A docs MCP server gives your agent memory (Resources and read Tools that expose reference material: library docs, API specs, internal docs). The two are structurally different MCP primitives, they have different permission UX, and they fail in different ways.

You need an action server when the agent has to change something the user cares about (a file, a UI, a repo, a window).
You need a docs server when the agent has to reason against current reference material it does not have in its training set.
You need both when the agent is doing real work. Most coding-agent stacks install 1 docs server + 2 to 4 action servers.
macos-use is pure action: 9 Tools, 0 Resources, ListResources returns an empty array. Source: main.swift:1527.

macos-use: 9 Tools, 0 Resources (main.swift:1482, ListResources returns [] at line 1527)

Action and docs MCP servers are not substitutes; most real stacks install one of each

Verbs are the giveaway: imperative -> action, interrogative -> docs

Pure action servers cannot honestly mark readOnlyHint true; that is fine for UI driving

The Spec Has Three Primitives. The Ecosystem Has Two Server Shapes.

The Model Context Protocol gives a server three primitives to expose: Tools (model-controlled functions the agent can invoke), Resources (URI-addressable read-only data the agent can list and fetch), and Prompts (user-controlled templates). What you call an action server or a docs server is which primitive carries the weight.

An action server's weight is in Tools. The Tools mutate state (a file, a UI, a row in a database, a window on a screen). Resources are usually empty or just a sentinel. The macos-use server is a clean example: it registers nine Tools, and the ListResources handler at main.swift:1524-1528 returns ListResources.Result(resources: []). Empty array. There are no Resources at all.

A docs server's weight is in Resources. Each indexed library, doc page, or API surface is a Resource with a URI the client can list, cache, and dereference. The Tools the docs server publishes are read-wrappers: search the index, fetch by version, list available libraries. A docs server that mutated state would be off its purpose. The two shapes are different categories, not points on a spectrum.
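To make the "which primitive carries the weight" test concrete, here is a minimal sketch. It assumes a hypothetical `ServerSurface` record that holds counts a client could gather from a server's list calls; the record, the function name, and the labels are illustrative and not part of the MCP spec.

```python
from dataclasses import dataclass


@dataclass
class ServerSurface:
    """Hypothetical summary of what a server advertises (not an MCP type)."""
    mutating_tools: int  # Tools that change state
    read_tools: int      # Tools that only wrap reads
    resources: int       # URI-addressable Resources


def server_shape(surface: ServerSurface) -> str:
    """Label a server by which primitive carries the weight."""
    if surface.mutating_tools > 0 and surface.resources == 0:
        return "action"
    if surface.mutating_tools == 0 and (surface.resources > 0 or surface.read_tools > 0):
        return "docs"
    if surface.mutating_tools > 0 and surface.resources > 0:
        return "mixed"
    return "empty"


# Counts for macos-use as stated in this article: 8 mutate Tools, 1 read Tool, 0 Resources.
macos_use = ServerSurface(mutating_tools=8, read_tools=1, resources=0)
# A made-up docs-style server for contrast; the numbers are invented.
docs_like = ServerSurface(mutating_tools=0, read_tools=2, resources=40)

print(server_shape(macos_use))  # action
print(server_shape(docs_like))  # docs
```

The "mixed" label covers servers that ship both mutating Tools and Resources; the sketch only names that case and does not judge it.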

How To Tell Them Apart In 10 Seconds

Action server

Tool verbs are imperative: click, type, open, create, merge, write, send, press, scroll.
Tool descriptions say what changes after the call.
Resources are absent or empty; the live state cannot be addressed by a stable URI.
readOnlyHint is false on the mutate Tools; clients prompt on call.
Permissions required at install time (Accessibility, Disk, GitHub PAT, etc.).
Failure mode: state drift. Action succeeds, agent acts on stale model of the world.

Docs server

Tool verbs are interrogative: search, list, get, fetch, lookup, resolve.
Tool descriptions say what is returned, not what is changed.
Resources are the main surface; each library or doc page is a URI.
readOnlyHint is true; clients auto-allow lookups.
Permissions are usually just network access to the docs source.
Failure mode: stale or wrong-version content. Agent codes against an old API.
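The verb check from the two lists above can be sketched in a few lines. This is a rough heuristic, assuming snake_case Tool names that start with their verb; names with a server prefix or an unlisted verb fall through to "unknown". The `lookup_tools` names are hypothetical.

```python
# Verb lists taken from the two checklists above.
IMPERATIVE = {"click", "type", "open", "create", "merge", "write", "send", "press", "scroll"}
INTERROGATIVE = {"search", "list", "get", "fetch", "lookup", "resolve"}


def classify_tool(tool_name: str) -> str:
    """Classify one Tool by its leading verb (assumes snake_case, verb first)."""
    verb = tool_name.lower().split("_")[0]
    if verb in IMPERATIVE:
        return "action"
    if verb in INTERROGATIVE:
        return "docs"
    return "unknown"


def classify_server(tool_names: list[str]) -> str:
    """Majority vote over the Tools; a tie is reported as 'unclear'."""
    votes = [classify_tool(name) for name in tool_names]
    action, docs = votes.count("action"), votes.count("docs")
    if action > docs:
        return "action"
    if docs > action:
        return "docs"
    return "unclear"


ui_tools = ["open_application", "click", "type", "press_key", "scroll"]
lookup_tools = ["search_docs", "get_page", "list_versions"]  # hypothetical names
github_tools = ["search_repositories", "get_pull_request", "create_issue", "merge_pull_request"]

print(classify_server(ui_tools))      # action
print(classify_server(lookup_tools))  # docs
print(classify_server(github_tools))  # unclear
```

A tie, as with the GitHub-style list, means the verbs alone do not settle it and the Tool descriptions need a look.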

macos-use, In Numbers

The structural fingerprint of a pure action server, drawn from macos-use's own source. None of these are projections; they are counts you can reproduce with grep on a clean clone.

9 Tools registered

0 Resources registered

8 Tools that mutate state

1 pure-read Tool (refresh_traversal)

Sources: allTools array at Sources/MCPServer/main.swift:1482; ListResources handler at main.swift:1524-1528; ListPrompts at main.swift:1531-1535.

Examples Of Each Category

Not exhaustive. The point is which side each server falls on, not which to choose within a side.

Action MCP servers

Tools that mutate state. Permissions required.

macos-use: native macOS GUI via accessibility tree; 8 mutate tools
Playwright MCP: browser automation (click, navigate, fill, screenshot)
Filesystem MCP: read and write files in a sandboxed directory tree
GitHub MCP (write): create issues, open PRs, merge, comment, push

Docs MCP servers

Resources plus read Tools. Frictionless to install.

Context7: versioned library docs as Resources plus a search Tool
OpenAI Docs MCP: read-only feed of OpenAI API reference into the agent
Ref / Nia: indexed third-party docs, search and fetch by URL
AWS Docs MCP: AWS service reference and recommendations, no API calls

How They Compose In A Real Stack

One user request, two server categories, several Tool calls. The agent uses the docs server to know what to do, and the action servers to do it. Neither could complete the request alone.

Example request: rename a failing test file and re-run it

User asks agent: 'rename the failing test file and re-run it'
Docs server: looks up the test framework's CLI flags so the rerun is correct
Filesystem action: renames the file on disk via a write Tool
macos-use action: drives the GUI test runner if no CLI exists for the app
Agent observes: reads diff and stdout, decides next step

The Hidden Tradeoff: Permission UX

Docs servers feel free because they are read-only by spec. Most clients (Claude Desktop, Cursor, Claude Code) auto-allow Resource reads and tools annotated readOnlyHint: true. You install a docs server and the agent just uses it, no prompts.

Action servers cost a permission prompt per call (in clients that prompt) or per install (in clients that hand the keys over once). That is correct behavior: the user wants to know when an agent is about to click in their browser, push to their main branch, or rewrite a file. The macos-use Tools each fuse a write with an accessibility tree read, so they cannot honestly mark readOnlyHint: true. For UI driving, that is the right answer. For a database server, the same design choice would be too noisy and the better answer would be to split reads and writes into separate Tools.

So the install calculus is: docs servers are nearly free (network access, maybe a token), action servers cost a permission grant per surface. Plan the stack with that in mind.
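The prompting behavior described above can be sketched as a tiny client-side policy. This is an assumption about how a client might act on Tool annotations, not the implementation of any particular client; `permission_decision` is a hypothetical name. The one spec detail it leans on is that `readOnlyHint` defaults to false when a Tool does not set it.

```python
def permission_decision(annotations: dict) -> str:
    """Hypothetical client policy keyed on MCP Tool annotations."""
    # An unannotated Tool is treated as not read-only, so it prompts.
    if annotations.get("readOnlyHint", False):
        return "auto-allow"
    return "prompt"


lookup_tool = {"readOnlyHint": True}   # e.g. a docs search Tool
click_tool = {"readOnlyHint": False}   # e.g. a UI-driving Tool
unannotated_tool = {}                  # server sent no annotations

print(permission_decision(lookup_tool))       # auto-allow
print(permission_decision(click_tool))        # prompt
print(permission_decision(unannotated_tool))  # prompt
```

Annotations are hints supplied by the server, so a real client would likely also weigh how much it trusts the server before skipping a prompt.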

When You Genuinely Only Need One

Only a docs server: you are writing code in an editor and just want the agent to stop hallucinating library APIs. Install Context7 or OpenAI Docs MCP, point it at the versions of the libraries you use, and you are done. No action server needed; you are the one running the code.

Only an action server: you are driving a closed surface where the agent already knows the API (a personal automation on your own Mac, a CI script). The agent just needs hands. macos-use alone is plenty.

Both: the agent has to reason about external material and act in the world. This is most real agent stacks. Pick a docs server for current reference, pick action servers per surface you care about (filesystem, GitHub, macOS, browser), and the agent composes them.

What macos-use Will Not Do

macos-use does not index docs. It does not know what an NSWindowController is. It cannot tell you which keyboard shortcut saves the front window of Sketch. If you ask the agent “what is the AppleScript verb for selecting a row in a Catalyst table,” macos-use will be silent. That is a docs job. macos-use will, however, actually select the row by calling AXUIElementSetAttributeValue with kAXSelectedAttribute, because that is in the action category.

The honest framing: macos-use is a hand, not a brain. The brain (Claude or whatever you point at it) supplies intent. The docs server supplies reference. macos-use closes the loop by changing the screen.

“System Events is just a wrapper around public [Obj]C system APIs, so you could bypass AppleScript and call those APIs directly.”

Apple Developer Documentation

Technical Q&A QA1888, referenced because the action surface macos-use exposes is exactly those direct APIs

Verify The Action-Server Fingerprint Yourself

Nothing on this page asks you to trust the count. The whole macos-use server is two Swift files; the counts are reproducible from the public source.

Reproduce 9 Tools / 0 Resources in five minutes

Clone github.com/mediar-ai/mcp-server-macos-use and open Sources/MCPServer/main.swift
Search for 'Resource(' in main.swift: zero hits (no Resource is ever constructed)
Read line 1524-1528: ListResources handler returns ListResources.Result(resources: []), literally an empty array
Read line 1531-1535: ListPrompts handler returns an empty prompts array too
Read line 1482: allTools registers 9 Tools, 8 of which end in _and_traverse
Compare with Context7's repo: every indexed library is exposed as a Resource with a URI, not as a Tool
Decision check: if you needed 'click the Save button in Sketch,' a docs server would tell you Sketch's keyboard shortcuts; only an action server actually presses it
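If you prefer a script to an editor search, the two text checks in the steps above can be sketched as follows. The `SAMPLE` string is a one-line stand-in based on the handler return described in this article, not a copy of the repository file; to run the check for real, read `Sources/MCPServer/main.swift` from your clone instead.

```python
def count_hits(source: str, needle: str) -> int:
    """Count literal occurrences of needle across all lines of source."""
    return sum(line.count(needle) for line in source.splitlines())


# Stand-in for the real file contents (assumption, for illustration only).
# With a clone on disk you could instead use:
#   from pathlib import Path
#   source = Path("Sources/MCPServer/main.swift").read_text()
SAMPLE = "    return ListResources.Result(resources: [])\n"

print(count_hits(SAMPLE, "Resource("))      # 0: no Resource is constructed
print(count_hits(SAMPLE, "resources: []"))  # 1: the empty-array return
```

Note that the search string is `Resource(` with the parenthesis, so `ListResources.Result(` does not count as a hit.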

Frequently asked questions

Is 'action MCP server' an official category in the spec?

No. The MCP spec has three primitives: Tools (model-controlled functions), Resources (URI-addressable read-only data), and Prompts (user-controlled templates). 'Action server' and 'docs server' are how the ecosystem actually sorts servers by what they do with those primitives. An action server overwhelmingly uses Tools that mutate something. A docs server overwhelmingly uses Resources plus a small number of Tools that wrap reads (search, fetch, list versions). It is a taxonomy by intent, not by spec. The MCP architecture page (modelcontextprotocol.io/docs/learn/architecture) lays out the primitives; the action-vs-docs split is the lived practice on top of them.

Can one MCP server be both?

Yes, and several try. The GitHub MCP server ships search_repositories and get_pull_request (docs-ish) alongside create_issue and merge_pull_request (action). The Postgres MCP servers expose schema and table info (docs) plus query and execute (action, when execute is wired). But the design pressure is real: docs-style reads tend to want readOnlyHint: true and zero confirmation, while writes need permission UX. Most production stacks split them so the read-only server can be auto-allowed and the action server can prompt the user. The macos-use server picked the other extreme: it is pure action with no docs surface at all, because the surface it cares about (a live UI tree) is too volatile to expose as a Resource.

What does a pure action server look like in source?

Open Sources/MCPServer/main.swift in the macos-use repo. Line 1482 is the aggregate tools array: nine entries, all mutate state except refresh_traversal. Line 1524 registers a ListResources handler that returns ListResources.Result(resources: []). Empty array. The server publishes zero Resources. Line 1531 does the same for ListPrompts. The whole server is Tools. That structural shape, 'all Tools and no Resources,' is what a pure action server looks like. The opposite shape is a Context7-style server that publishes a Resource per indexed library plus a couple of search tools.

Why would a docs server use Resources instead of just Tools?

Resources are URI-addressable and cacheable. A client can list them, store them, and re-fetch by URI. They are designed for the read pattern. Resources also surface in client UI as something the user can browse, which makes the agent's reference material discoverable. Tools cannot do this; a Tool call is not a URI you can dereference later. For library docs (npm package X version Y, AWS service Z), Resources are the right primitive. For 'click this button,' a Tool is the right primitive because there is no stable URI for 'the click I did at 3:14pm on a window that no longer exists.'
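A small sketch of why a stable URI matters: a client can cache by URI and skip the second fetch. `ResourceCache`, the `fake_fetch` function, and the `docs://` URI are all hypothetical, standing in for whatever read call and URI scheme a real client and server would use.

```python
class ResourceCache:
    """Hypothetical client-side cache keyed by Resource URI."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable that reads a Resource by URI
        self._store = {}      # uri -> cached contents
        self.fetch_count = 0  # how many times we actually hit the server

    def read(self, uri: str) -> str:
        if uri not in self._store:
            self._store[uri] = self._fetch(uri)
            self.fetch_count += 1
        return self._store[uri]


def fake_fetch(uri: str) -> str:
    """Stand-in for a real Resource read; returns canned text."""
    return f"contents of {uri}"


cache = ResourceCache(fake_fetch)
uri = "docs://example-lib/1.2.0/readme"  # made-up URI
first = cache.read(uri)
second = cache.read(uri)

print(first == second)    # True
print(cache.fetch_count)  # 1
```

A Tool call like a click has no equivalent key: there is nothing stable to look up the second time, which is the point the answer above makes.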

If I am building a coding agent, do I install action servers, docs servers, or both?

Both, in practice. A coding agent needs to know things (current library APIs, internal docs, recent commits) and do things (write code, run scripts, drive a Mac app, click through a browser flow). Context7 or OpenAI Docs MCP for docs. Filesystem MCP for code edits. GitHub MCP for repo writes. macos-use for native GUI work outside the browser. Playwright MCP for browser flows. Pick one server per category. The agent's tool list ends up at 30 to 60 tools. The MCP clients (Claude Code, Cursor, etc.) handle the dispatch.

Does Resource vs Tool change the permission model?

Yes. Resources are advertised as read-only by spec convention. Most MCP clients display them in a separate sidebar and do not prompt before reading. Tools route through the client's tool-execution path and may prompt depending on annotations (readOnlyHint, destructiveHint) and the client's policy. So a docs server that ships Resources gets a frictionless UX. An action server that ships mutate Tools should prompt, and the macos-use Tools each fuse a write with a tree read, so they cannot honestly set readOnlyHint: true. That is the right answer for UI driving (the user wants to see when the agent is about to click) and the wrong answer for a docs server (where prompting every reference lookup is noise).

What is the failure mode of a docs server I should know about?

Stale or wrong-version docs. A docs MCP server pulls reference material from somewhere (GitHub, npm, official docs site) and serves it to the agent. If the index is older than the version the user is on, the agent codes against the older API and produces calls that compile and then fail at runtime. The good docs servers expose a version parameter and let the agent pin (Context7 does this). The bad ones serve whatever they indexed last and pretend it is current. Always check whether a docs server can pin to your installed version before you trust it.
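The version check suggested above can be sketched as a comparison between the version a docs index reports and the version you have installed. This assumes plain dotted numeric versions and treats a matching major.minor as close enough; both choices are illustrative, and the function names are hypothetical.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Parse a dotted numeric version such as '4.1.7' into (4, 1, 7)."""
    return tuple(int(part) for part in text.split("."))


def docs_match_install(indexed: str, installed: str) -> bool:
    """True when the docs index and the install share the same major.minor."""
    return parse_version(indexed)[:2] == parse_version(installed)[:2]


print(docs_match_install("4.1.0", "4.1.7"))  # True
print(docs_match_install("3.9.2", "4.1.7"))  # False
```

When the check fails, the safer move is to pin the docs server to the installed version if it supports pinning, rather than let the agent code against the mismatch.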

And the failure mode of an action server?

Drift between what the action server thinks the world looks like and what the world actually looks like. A click_and_traverse call returns the new accessibility tree, but if the agent does not look at the diff, it acts on the previous state. A filesystem write_file succeeds, but if a build process is watching the file, the next read sees the post-build version. A GitHub create_pull_request succeeds, but a status check could be running. Action servers are real-world side effects, so they have to return enough information about the post-action state for the agent to make the next decision. The macos-use server addresses this by returning a structured diff (added, removed, modified) on every Tool call.
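To illustrate the added / removed / modified idea, here is a minimal sketch of diffing two snapshots. It assumes each snapshot is a dict from a hypothetical element id to that element's attributes; it is not the format macos-use actually returns, only the general technique.

```python
def diff_snapshots(before: dict, after: dict) -> dict:
    """Compare two snapshots keyed by element id."""
    added = sorted(key for key in after if key not in before)
    removed = sorted(key for key in before if key not in after)
    modified = sorted(
        key for key in after if key in before and before[key] != after[key]
    )
    return {"added": added, "removed": removed, "modified": modified}


# Made-up snapshots of a window before and after a click.
before = {
    "button:save": {"enabled": True},
    "field:name": {"value": "draft"},
    "sheet:confirm": {"visible": True},
}
after = {
    "button:save": {"enabled": False},
    "field:name": {"value": "draft"},
    "label:saved": {"text": "Saved"},
}

print(diff_snapshots(before, after))
# {'added': ['label:saved'], 'removed': ['sheet:confirm'], 'modified': ['button:save']}
```

An agent that reads this result before its next call is working from the post-action state rather than the stale one.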

Are there docs servers I can swap in if macos-use is overkill?

If you only need macOS reference material (AppleScript dictionaries, framework headers, Apple developer docs), a docs server is not really what you want either, because Apple's material is unevenly indexed by the public docs servers. The honest answer: search the Apple developer site, or read an app's scripting dictionary directly with Script Editor (File > Open Dictionary) or sdef /Applications/X.app. macos-use is for driving Mac apps, not learning about them. If you want to drive but do not have macOS, look at Terminator for Windows. If you want a docs index for your own codebase, look at Context7 or Outline MCP. Different jobs.

Can I tell which type a server is from its name or its README?

Usually. Look for the verbs. Action servers describe what they do: 'click,' 'press,' 'create,' 'send,' 'merge,' 'open,' 'type,' 'navigate.' Docs servers describe what they expose: 'docs,' 'reference,' 'docs index,' 'API explorer,' 'search.' The MCP marketplace listings are getting better about flagging this, but the fast check is the verb. If the verbs are imperative, it is action. If the verbs are interrogative or noun-y ('list,' 'get,' 'search'), it is docs. The macos-use Tool names are unambiguously action: open_application, click, type, press_key, scroll, set_value, press_ax, set_selected.

Does the macos-use server have any docs-server features at all?

One, and only one: refresh_traversal. It walks the accessibility tree of a target application and returns a structured summary plus a file path. It does not mutate the world; it just inspects. It is the closest thing to a Resource that the server publishes, but it is still a Tool, not a Resource, because the tree is dynamic per-PID and has no stable URI. Everything else is mutate-and-observe. So 8 of 9 Tools are action and 1 is read, and there are 0 Resources. The shape is unambiguously action server.