Blog
Notes on MCP server testing
Long-form posts on the contract-drift problem, regression CI for MCP, and the patterns behind MCPReplay. Written for the people shipping public MCP servers — solo authors, small teams, and the registry-day crowd.
Posts
-
From console.log to CI: a real MCP server's first week with mcpreplay
Most MCP servers ship with no test coverage and a debugging workflow that is just console.log and manual invocations in Claude Desktop. This is the week-one story: from that workflow to a committed fixture baseline with a passing CI gate, day by day. Record on Monday, --watch loop on Tuesday, error fixtures on Wednesday, mask non-determinism on Thursday, wire CI on Friday, and on the sixth day CI catches a real regression you would have shipped without noticing.
-
Snapshot testing for MCP servers: how --update works and when NOT to use it
The --update flag rubber-stamps drifted MCP fixtures the way jest --updateSnapshot stamps changed snapshots. It is the right tool when a change is intentional and the diff is reviewed before merging. It is a silent production bug waiting to happen when teams run it reflexively on every failed CI run without reading what changed: the suite permanently passes while the server's contract drifts away from every downstream client. This post covers both sides: when it is safe, the anti-pattern that voids your regression safety net, and the team ritual that keeps it from turning into fixture churn.
-
mcpreplay vs. Snyk Agent Scan vs. Enkrypt: choosing the right MCP testing framework
Three MCP testing tools, three non-overlapping lanes. Snyk Agent Scan and Enkrypt catch security threats — injection payloads, exfiltration patterns, tool poisoning. mcpreplay catches behavioral contract drift — renamed params, shifted response shapes, dropped enums. This is the honest comparison: what each tool catches, what each misses, and how to compose all three in CI without adding build-time debt.
-
Q4 2026 MCP Registry: what the compliance bar means for you
The MCP Registry arrives in Q4 2026 and the compliance bar is not a schema check — it is behavioral. This post breaks down the three compliance layers (static schema, behavioral runtime, ongoing CI gate), explains what a valid behavioral artifact looks like, and shows the three-command path to producing one before the Registry's GA deadline.
-
Anthropic MCP server best practices: what to test, what to skip
The official Anthropic documentation tells you how to build an MCP server. It doesn't tell you what to test. This post is the gap-filler: a prioritized list of which primitives need contract coverage, what to deliberately skip, and the minimal three-command harness that makes the difference between a server that silently drifts and one that fails loudly in CI.
-
MCP contract testing: a hands-on guide for server authors
If you ship an MCP server, you have a contract whether you wrote one down or not. Every tool name, every parameter, every return shape, every error envelope is a promise to the LLM clients downstream of you. This is a hands-on guide to actually testing that contract — what to record, what to commit, what to diff, and how to handle the inevitable day you intentionally change it.
-
Why your MCP server breaks silently — and the fix
A renamed parameter. A return field that grew. An enum value that quietly went away. Your tests pass, the dev loop is green — and three weeks later a user reports the agent "keeps saying it found nothing." This post is about why that happens, why every MCP server author is exposed to it, and the contract-testing pattern that actually fixes it.