Everyone is shipping agents. Most of them shouldn’t be. Here’s an honest assessment of the agentic AI landscape as of early 2026.
The good
Agent frameworks have matured. Tool-use reliability is genuinely impressive — Claude, GPT-4, and Gemini all handle structured tool calls with 95%+ schema compliance. MCP has created real interoperability. The developer experience of wiring up an agent to external services has never been better.
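Schema compliance only pays off if you actually check it before dispatch. Here is a minimal sketch of validating a structured tool call against a JSON-Schema-style parameter spec; the tool name and schema are illustrative, not any vendor's API.

```python
# Hypothetical tool schema in the JSON-Schema style most tool-calling
# APIs use. Names are illustrative.
GET_WEATHER = {
    "name": "get_weather",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def validate_call(call: dict, schema: dict) -> bool:
    """Minimal check: right tool, required args present, basic types match."""
    if call.get("name") != schema["name"]:
        return False
    params = schema["parameters"]
    args = call.get("arguments", {})
    for req in params.get("required", []):
        if req not in args:
            return False
    type_map = {"string": str, "number": (int, float), "boolean": bool}
    for key, spec in params.get("properties", {}).items():
        expected = type_map.get(spec["type"], object)
        if key in args and not isinstance(args[key], expected):
            return False
    return True

print(validate_call({"name": "get_weather", "arguments": {"city": "Oslo"}}, GET_WEATHER))  # True
print(validate_call({"name": "get_weather", "arguments": {}}, GET_WEATHER))  # False
```

In practice you would use a real schema validator, but the point stands: reject a malformed call before it touches a tool, don't discover it after.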
Code agents in particular have crossed a threshold. Claude Code, Cursor, and similar tools are writing production code that ships. Not perfect, but the productivity multiplier is real and measurable.
The bad
Most “agent” products are glorified prompt chains with a retry loop. Calling llm.chat() three times in a row is not an agent — it’s a script. The industry has a naming problem, and it’s eroding trust.
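The distinction is concrete. A chain runs a fixed sequence of model calls; an agent loops, feeds tool results back to the model, and lets the model decide when it is done. A sketch with a stubbed model (`fake_model` and the `lookup` tool are stand-ins, not a real API):

```python
# A chain would be three model calls in a row. An agent is a bounded loop
# where the model picks the next action based on prior tool results.

def fake_model(history):
    # Stand-in for an LLM: ask for a tool once, then finish.
    if any(msg["role"] == "tool" for msg in history):
        return {"done": True, "answer": "42"}
    return {"done": False, "tool": "lookup", "args": {"key": "answer"}}

TOOLS = {"lookup": lambda key: {"answer": "42"}[key]}

def run_agent(task, max_steps=5):
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):  # step budget, not a fixed script
        action = fake_model(history)
        if action["done"]:
            return action["answer"]
        result = TOOLS[action["tool"]](**action["args"])
        history.append({"role": "tool", "content": result})
    raise RuntimeError("step budget exhausted")

print(run_agent("What is the answer?"))  # 42
```

If your control flow never branches on what the model or tools returned, you shipped a script with extra latency.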
Evaluation is still primitive. Teams ship agents with zero systematic testing. The eval tooling exists (SkillsBench, Braintrust, custom harnesses) but adoption is embarrassingly low.
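The floor for systematic testing is low. Even a harness this simple, run on every change, beats shipping blind; the `agent` function and cases below are toy stand-ins for illustration.

```python
# Minimal eval harness sketch: run the agent over a case suite and report
# a pass rate. Real suites would also log transcripts and failure modes.

def agent(question: str) -> str:
    # Toy stand-in for the agent under test.
    return {"2+2": "4", "capital of France": "Paris"}.get(question, "unknown")

CASES = [
    ("2+2", "4"),
    ("capital of France", "Paris"),
    ("capital of Spain", "Madrid"),  # expected failure for the toy agent
]

def run_evals(cases):
    results = [(q, expected, agent(q) == expected) for q, expected in cases]
    passed = sum(ok for _, _, ok in results)
    for q, expected, ok in results:
        print(f"{'PASS' if ok else 'FAIL'}  {q!r} (expected {expected!r})")
    print(f"{passed}/{len(results)} passed")
    return passed / len(results)

run_evals(CASES)  # prints per-case results, then "2/3 passed"
```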
The ugly
Autonomous agents with write access to production systems and no human-in-the-loop review. This is happening. Companies are giving agents database write permissions, API keys to payment processors, and access to customer data — with no circuit breakers. One hallucinated tool call away from a production incident. Build guardrails before you build agents.
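One shape a guardrail can take, sketched below: route every tool call through a dispatcher that requires human approval for write actions and trips a circuit breaker after repeated failures. All names here (`update_db`, `issue_refund`, the `approve` hook) are hypothetical.

```python
# Guarded tool dispatch: writes need approval, hallucinated or failing
# calls count toward a circuit breaker that halts the agent.

WRITE_TOOLS = {"update_db", "issue_refund"}

class CircuitOpen(Exception):
    pass

class GuardedDispatcher:
    def __init__(self, tools, approve, max_failures=3):
        self.tools = tools            # name -> callable
        self.approve = approve        # human-in-the-loop hook
        self.max_failures = max_failures
        self.failures = 0

    def call(self, name, **args):
        if self.failures >= self.max_failures:
            raise CircuitOpen("too many failed calls; halting the agent")
        if name not in self.tools:    # e.g. a hallucinated tool name
            self.failures += 1
            raise KeyError(f"unknown tool: {name}")
        if name in WRITE_TOOLS and not self.approve(name, args):
            raise PermissionError(f"{name} denied by reviewer")
        return self.tools[name](**args)

dispatcher = GuardedDispatcher(
    tools={"read_orders": lambda: ["order-1"], "update_db": lambda sql: "ok"},
    approve=lambda name, args: False,  # deny all writes in this demo
)
print(dispatcher.call("read_orders"))  # ['order-1']
```

Reads pass through; writes block on a reviewer; a misbehaving agent runs out of failure budget instead of running wild in production.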