mcpreplay

Blog

Notes on MCP server testing

Long-form posts on the contract-drift problem, regression CI for MCP, and the patterns behind MCPReplay. Written for the people shipping public MCP servers — solo authors, small teams, and the registry-day crowd.

Posts

  • From console.log to CI: a real MCP server's first week with mcpreplay

    Most MCP servers ship with no test coverage and a debugging workflow that is just console.log and manual invocations in Claude Desktop. This is the week-one story: from that workflow to a committed fixture baseline with a passing CI gate, day by day. Record on Monday, --watch loop on Tuesday, error fixtures on Wednesday, mask non-determinism on Thursday, wire CI on Friday — and on the sixth day CI catches a real regression you would have shipped without noticing.

  • Snapshot testing for MCP servers: how --update works and when NOT to use it

    The --update flag rubber-stamps drifted MCP fixtures the way jest --updateSnapshot stamps changed snapshots. It is the right tool when a change is intentional and the diff is reviewed before merging. It is a silent production bug waiting to happen when teams run it reflexively on every failed CI run without reading what changed: the suite keeps passing while the server's contract drifts away from every downstream client. This post covers both sides: when it is safe, the anti-pattern that voids your regression safety net, and the team ritual that keeps it from turning into fixture churn.

  • mcpreplay vs. Snyk Agent Scan vs. Enkrypt: choosing the right MCP testing framework

    Three MCP testing tools, three non-overlapping lanes. Snyk Agent Scan and Enkrypt catch security threats — injection payloads, exfiltration patterns, tool poisoning. mcpreplay catches behavioral contract drift — renamed params, shifted response shapes, dropped enums. This is the honest comparison: what each tool catches, what each misses, and how to compose all three in CI without adding build-time debt.

  • Q4 2026 MCP Registry: what the compliance bar means for you

    The MCP Registry arrives in Q4 2026 and the compliance bar is not a schema check — it is behavioral. This post breaks down the three compliance layers (static schema, behavioral runtime, ongoing CI gate), explains what a valid behavioral artifact looks like, and shows the three-command path to producing one before the Registry's GA deadline.

  • Anthropic MCP server best practices: what to test, what to skip

    The official Anthropic documentation tells you how to build an MCP server. It doesn't tell you what to test. This post is the gap-filler: a prioritized list of which primitives need contract coverage, what to deliberately skip, and the minimal three-command harness that makes the difference between a server that silently drifts and one that fails loudly in CI.

  • MCP contract testing: a hands-on guide for server authors

    If you ship an MCP server, you have a contract whether you wrote one down or not. Every tool name, every parameter, every return shape, every error envelope is a promise to the LLM clients downstream of you. This is a hands-on guide to actually testing that contract — what to record, what to commit, what to diff, and how to handle the inevitable day you intentionally change it.

  • Why your MCP server breaks silently — and the fix

    A renamed parameter. A return field that grew. An enum value that quietly went away. Your tests pass, the dev loop is green — and three weeks later a user reports the agent "keeps saying it found nothing." This post is about why that happens, why every MCP server author is exposed to it, and the contract-testing pattern that actually fixes it.
