Claude Code cache confusion as Anthropic tweaks defaults, but quotas still drain

April 13, 2026

aibusinesslawstreaming

Close-up of a hand on a laptop with an hourglass, symbolizing time management and productivity. — Photo by Atlantic Ambience on Pexels

What changed — and why people noticed

It has been reported that Anthropic quietly moved the Claude Code prompt-cache TTL from one hour back to five minutes for many requests, and users soon felt the pain in their quotas. Developer Sean Swanson posted a bug report alleging the one-hour cache was added around February 1 and reverted around March 7; he says the five-minute TTL is "disproportionately punishing" for long, high‑context sessions. The mechanics matter: writing to the five‑minute cache reportedly costs about 25% more in tokens, writing to the one‑hour cache about 100% more, while a cache read is roughly 10% of base price. Short version: misses and rewrites cost real money.

The vendor reply — and the pushback

Anthropic engineers (including Bun creator Jarred Sumner, now at Anthropic) push back, saying the change can be cheaper in practice because many Claude Code calls are one‑shot and never revisit cached context; it has been reported that the client auto‑chooses TTL and there’s no single global switch planned. Swanson revised some of his math, conceding subagent-driven sessions can benefit from shorter TTLs because their caches are hit quickly. Still — he says he’s a long‑time heavy user who started hitting quotas only after March. Frustration is palpable. When a paid subscriber with a $200 monthly plan notices a formerly reliable service turning stingy, that’s not just an accounting quirk. It’s an experience hit.

Context windows, cache misses, and the hidden bill

Bigger context windows make this worse. Claude Opus/Sonnet models offer a one‑million‑token window on paid plans, and it has been reported that cache misses at that scale are extremely expensive — Boris Cherny, creator of Claude Code, says leaving a session idle for an hour can cause a full cache miss. Anthropic is allegedly investigating a 400k default with the option to go to 1M, and there’s already a setting to change it. Add reported bugs in the caching code, and some developers suspect the whole five‑minutes vs one‑hour argument is moot until the underlying issues are fixed.

So what now?

Users are left asking: is Anthropic tuning for the many or the few? Short TTL helps one‑shot traffic but squeezes power users who run long sessions, many agents, or large skill bundles — the exact workloads driving modern developer tooling. Fixing cache bugs and clarifying default context limits would go a long way. For now, the debate is both technical and human: people who built workflows around Claude Code feel the rug pulled out from under them. Anthropic says it’s investigating. Stay tuned — this is one of those backend skirmishes that actually decides whether a product feels fast, generous, and trustworthy or stingy and surprising.

Sources: The Register

Claude Code cache confusion as Anthropic tweaks defaults, but quotas still drain

What changed — and why people noticed

The vendor reply — and the pushback

Context windows, cache misses, and the hidden bill

So what now?

Comments