Blog

Notes on MCP server testing

Long-form posts on the contract-drift problem, regression CI for MCP, and the patterns behind MCPReplay. Written for the people shipping public MCP servers: solo authors, small teams, and anyone preparing for the MCP Registry launch.

Posts

2026-06-26 · ~14 min read

Non-deterministic MCP responses: the complete --ignore-path guide

Timestamps advance, UUIDs change per call, CDN tokens expire, carrier rates reprice weekly — these four classes of non-determinism are why developers try fixture replay and abandon it before it catches a real regression. This is the complete guide: how --ignore-path works mechanically (RFC 6901 JSON Pointer, exact match, subtree suppression), five concrete examples covering shipping labels, helpdesk tickets, blockchain responses, metadata blocks, and live DEX prices, and the per-API inventory for seven categories (CRM, shipping, payments, comms, cloud storage, DevOps, Web3). Also covers the masking trap — the path granularity that turns a passing test into a silent false positive — and how --ignore-path composes with --update for accepting intentional contract changes.

2026-06-04 · ~12 min read

From console.log to CI: a real MCP server's first week with mcpreplay

Most MCP servers ship with no test coverage and a debugging workflow that is just console.log and manual invocations in Claude Desktop. This is the week-one story: from that workflow to a committed fixture baseline with a passing CI gate, day by day. Record on Monday, --watch loop on Tuesday, error fixtures on Wednesday, mask non-determinism on Thursday, wire CI on Friday — and on the sixth day CI catches a real regression you would have shipped without noticing.

2026-06-03 · ~11 min read

Snapshot testing for MCP servers: how --update works and when NOT to use it

The --update flag rubber-stamps drifted MCP fixtures the way jest --updateSnapshot stamps changed snapshots. It is the right tool when a change is intentional and the diff is reviewed before merging. It is a silent production bug waiting to happen when teams run it reflexively on every failed CI run without reading what changed — the suite permanently passes while the server's contract drifts away from every downstream client. This post covers both sides: when it is safe, the anti-pattern that voids your regression safety net, and the team ritual that keeps it from turning into fixture churn.

2026-06-03 · ~11 min read

mcpreplay vs. Snyk Agent Scan vs. Enkrypt: choosing the right MCP testing framework

Three MCP testing tools, three non-overlapping lanes. Snyk Agent Scan and Enkrypt catch security threats — injection payloads, exfiltration patterns, tool poisoning. mcpreplay catches behavioral contract drift — renamed params, shifted response shapes, dropped enums. This is the honest comparison: what each tool catches, what each misses, and how to compose all three in CI without adding build-time debt.

2026-06-03 · ~10 min read

Q4 2026 MCP Registry: what the compliance bar means for you

The MCP Registry arrives in Q4 2026 and the compliance bar is not a schema check — it is behavioral. This post breaks down the three compliance layers (static schema, behavioral runtime, ongoing CI gate), explains what a valid behavioral artifact looks like, and shows the three-command path to producing one before the Registry's GA deadline.

2026-05-30 · ~9 min read

Anthropic MCP server best practices: what to test, what to skip

The official Anthropic documentation tells you how to build an MCP server. It doesn't tell you what to test. This post is the gap-filler: a prioritized list of which primitives need contract coverage, what to deliberately skip, and the minimal three-command harness that makes the difference between a server that silently drifts and one that fails loudly in CI.

2026-05-01 · ~10 min read

MCP contract testing: a hands-on guide for server authors

If you ship an MCP server, you have a contract whether you wrote one down or not. Every tool name, every parameter, every return shape, every error envelope is a promise to the LLM clients downstream of you. This is a hands-on guide to actually testing that contract — what to record, what to commit, what to diff, and how to handle the inevitable day you intentionally change it.

2026-04-30 · ~9 min read

Why your MCP server breaks silently — and the fix

A renamed parameter. A return field that grew. An enum value that quietly went away. Your tests pass, the dev loop is green — and three weeks later a user reports the agent "keeps saying it found nothing." This post is about why that happens, why every MCP server author is exposed to it, and the contract-testing pattern that actually fixes it.