MCP beyond hello world: production patterns for tool servers

mcp tooling infra
TL;DR
  • Three MCP failure modes: context overflow, auth cascade, schema drift
  • Build a ToolResultReducer that summarizes results before they hit the context
  • Health-check endpoints that validate all credentials on startup
  • Semantic versioning for tool schemas is mandatory, not optional

MCP’s promise is beautiful: a universal protocol for connecting LLMs to tools. The spec is clean. The reality of running MCP servers in production is messier.

The three failure modes

After six months of running MCP servers in production, every failure I’ve seen falls into one of three buckets:

1. Context overflow. Your tool returns 50KB of JSON. The model chokes. You need to build a ToolResultReducer that summarizes results before they hit the context window.

2. Auth cascade. The MCP server needs OAuth tokens, API keys, and session cookies. One expired token and the whole chain fails silently. Build a health-check endpoint that validates all credentials on startup.

3. Schema drift. You update a tool’s parameters but forget to version the schema. The model hallucinates old parameter names. Semantic versioning for tool schemas is not optional.

# ToolResultReducer pattern (token_count and summarize shown as
# simple stand-ins; swap in your tokenizer and summarizer of choice)
class ToolResultReducer:
    def reduce(self, result, max_tokens=2000):
        if self.token_count(result) > max_tokens:
            return self.summarize(result, max_tokens)
        return result

    def token_count(self, text):
        return len(text) // 4  # rough heuristic: ~4 chars per token

    def summarize(self, text, max_tokens):
        return text[: max_tokens * 4]  # simplest fallback: truncate
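The auth cascade (failure mode 2) yields to the same treatment. Here is a minimal sketch of a startup credential check; the validator callables and the `check_credentials`/`assert_healthy` names are my own, not part of any MCP SDK:

```python
# Sketch of a startup credential health check (hypothetical helpers,
# not part of any MCP SDK). Each validator should raise on an
# expired or invalid credential; each result is (name, ok, detail).
def check_credentials(validators):
    """Run every credential validator and record failures explicitly."""
    results = []
    for name, validate in validators.items():
        try:
            validate()
            results.append((name, True, "valid"))
        except Exception as exc:
            results.append((name, False, str(exc)))
    return results

def assert_healthy(results):
    """Fail loudly at startup instead of silently mid-chain."""
    failed = [name for name, ok, _ in results if not ok]
    if failed:
        raise RuntimeError(f"credential check failed: {', '.join(failed)}")
```

Wire this into the health endpoint so a single expired OAuth token surfaces as a named startup error rather than a silent mid-request failure.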

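Schema drift (failure mode 3) is easier to catch when the version travels with the schema and gets checked on every call. A sketch, assuming a plain semver string on each tool definition; the compatibility rule here (same major, server not behind client) is one reasonable policy, not something the MCP spec mandates:

```python
# Sketch: reject tool calls whose declared schema version is
# incompatible with what the server currently serves. The semver
# policy here is an assumption, not part of the MCP spec.
def parse_semver(version):
    major, minor, patch = (int(x) for x in version.split("."))
    return major, minor, patch

def compatible(client_version, server_version):
    """Same major version, and the server is not behind the client."""
    c = parse_semver(client_version)
    s = parse_semver(server_version)
    return c[0] == s[0] and s >= c
```

A client pinned to 2.3.0 still works against a 2.3.1 server, but a 1.x client against a 2.x server gets a hard error instead of hallucinated parameter names.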
The architecture that works

Run your MCP server as a sidecar container. Give it its own health endpoint. Version every schema. Reduce every result. Log every tool call with latency and token counts. It’s unglamorous, but it’s what keeps agents alive in prod.
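“Log every tool call with latency and token counts” can be as small as one wrapper. A sketch, with a naive characters-per-token estimate standing in for a real tokenizer:

```python
import json
import logging
import time

logger = logging.getLogger("mcp.calls")

def logged_tool_call(tool, call_fn, **params):
    """Wrap a tool call, logging latency and an estimated token count."""
    start = time.monotonic()
    result = call_fn(**params)
    latency_ms = (time.monotonic() - start) * 1000
    # Rough estimate (~4 chars per token); swap in your tokenizer.
    est_tokens = len(json.dumps(result)) // 4
    logger.info("tool=%s latency_ms=%.0f est_tokens=%d",
                tool, latency_ms, est_tokens)
    return result
```

Those two numbers per call are what make the context-overflow and latency problems visible before they page you.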

Agent Trace

agent trace · post #5 · 7 steps · 1.5s
THINK Need to call external API. Checking MCP server health first. 1ms
TOOL mcp_health_check(server='flights-v2') 52ms
OBS status=healthy, auth=valid, schema=v2.3.1 2ms
TOOL mcp_call('flights-v2', 'search', {dest:'NRT'}) 1.2s
OBS Result: 48KB JSON. Exceeds 2000 token limit. 3ms
TOOL ToolResultReducer.reduce(result, max_tokens=2000) 210ms
ACT Reduced to 1.8K tokens. Returning summary to agent. 1ms