MCP’s promise is beautiful: a universal protocol for connecting LLMs to tools. The spec is clean. The reality of running MCP servers in production is messier.
## The three failure modes
After six months of running MCP servers in production, every failure I've seen falls into one of three buckets:
1. Context overflow. Your tool returns 50KB of JSON. The model chokes. You need to build a ToolResultReducer that summarizes results before they hit the context window.
2. Auth cascade. The MCP server needs OAuth tokens, API keys, session cookies. One expired token and the whole chain fails silently. Build a health-check endpoint that validates all credentials on startup.
3. Schema drift. You update a tool’s parameters but forget to version the schema. The model hallucinates old parameter names. Semantic versioning for tool schemas is not optional.
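The startup credential check from failure mode 2 can be sketched as follows. This is a minimal illustration, not a real MCP API: `Credential` and `check_credentials` are hypothetical names, and each credential is assumed to expose a cheap validation call (e.g. a "whoami"-style request).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Credential:
    name: str
    validate: Callable[[], bool]  # hypothetical: a lightweight liveness probe


def check_credentials(creds: List[Credential]) -> List[str]:
    """Return the names of credentials that fail validation.

    Run this at server startup (and periodically) so an expired token
    surfaces as a loud health-check failure instead of a silent one
    halfway through a tool chain.
    """
    failures = []
    for cred in creds:
        try:
            ok = cred.validate()
        except Exception:
            ok = False  # a probe that throws counts as a failure
        if not ok:
            failures.append(cred.name)
    return failures
```

Wire the returned list into the server's health endpoint: any non-empty result means the server reports unhealthy before the model ever calls a tool.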
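For failure mode 3, one way to make versioning enforceable is to reject calls whose schema major version no longer matches the server's. A minimal sketch, assuming semver strings and an illustrative registry (`TOOL_SCHEMAS` and `assert_compatible` are hypothetical names, not part of the MCP spec):

```python
# Hypothetical schema registry keyed by tool name.
TOOL_SCHEMAS = {
    "search": {
        "version": "2.0.0",
        "parameters": {"query": "string", "limit": "integer"},
    },
}


def assert_compatible(tool: str, client_version: str) -> None:
    """Raise if the client's schema major version differs from the server's.

    A major-version mismatch means parameter names or shapes changed,
    which is exactly when the model starts hallucinating old parameters.
    """
    server_version = TOOL_SCHEMAS[tool]["version"]
    if server_version.split(".")[0] != client_version.split(".")[0]:
        raise ValueError(
            f"{tool}: client schema {client_version} is incompatible "
            f"with server schema {server_version}"
        )
```

Failing fast here turns silent hallucinated-parameter errors into an explicit, debuggable rejection.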
```python
# ToolResultReducer pattern
class ToolResultReducer:
    def reduce(self, result, max_tokens=2000):
        if self.token_count(result) > max_tokens:
            return self.summarize(result, max_tokens)
        return result

    def token_count(self, text):
        return len(text) // 4  # rough heuristic: ~4 chars per token

    def summarize(self, text, max_tokens):
        return text[: max_tokens * 4]  # placeholder: truncate; use an LLM summary in prod
```
## The architecture that works
Run your MCP server as a sidecar container. Give it its own health endpoint. Version every schema. Reduce every result. Log every tool call with latency and token counts. It’s unglamorous, but it’s what keeps agents alive in prod.
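The "log every tool call with latency and token counts" advice can be sketched as a decorator around tool handlers. This is an illustrative pattern, not MCP SDK API; `logged_tool` is a hypothetical name, and the token count reuses the rough 4-chars-per-token heuristic from above.

```python
import functools
import logging
import time

log = logging.getLogger("mcp.tools")


def logged_tool(fn):
    """Wrap a tool handler to log its latency and result size."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        latency_ms = (time.perf_counter() - start) * 1000
        tokens = len(str(result)) // 4  # rough heuristic: ~4 chars per token
        log.info(
            "tool=%s latency_ms=%.1f result_tokens=%d",
            fn.__name__, latency_ms, tokens,
        )
        return result
    return wrapper
```

Those structured log lines are what let you spot the 50KB tool result or the slow credential refresh before the agent falls over.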