Agent-cache: multi-tier caching for LLMs, tools and sessions (Valkey + Redis)

April 16, 2026

What it is

A developer posted "Agent-cache" as a Show HN: a multi-tier caching layer for LLM-driven agents and toolchains. According to the post, the project combines in-memory caches with persistent stores such as Redis and Valkey to avoid repeated LLM and tool calls. The pitch is simple: serve repeated prompts and tool outputs from cache, not from the model every time. Faster responses, lower API bills. Who doesn’t want that?
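To make the pitch concrete, here is a minimal cache-aside sketch. It is not Agent-cache's API: `cache_key`, `cached_call`, and `fake_llm` are hypothetical names, and a plain dictionary stands in for whatever store the project actually uses.

```python
import hashlib
import json

# Hypothetical in-process store; a real deployment might use Redis or Valkey.
_store = {}


def cache_key(model, prompt):
    """Derive a stable key from the model name and the exact prompt text."""
    payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_call(model, prompt, call_model):
    """Return the cached response if present; otherwise call the model and store it."""
    key = cache_key(model, prompt)
    if key in _store:
        return _store[key]
    response = call_model(model, prompt)
    _store[key] = response
    return response


calls = []


def fake_llm(model, prompt):
    """Stand-in for a real model call (an assumption); records each invocation."""
    calls.append(prompt)
    return f"echo: {prompt}"


first = cached_call("demo-model", "What is caching?", fake_llm)
second = cached_call("demo-model", "What is caching?", fake_llm)
```

The second call never reaches `fake_llm`, which is the whole saving. Keying on the exact prompt only helps when prompts repeat verbatim; anything fuzzier is outside this sketch.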

How it works (in broad strokes)

According to the post, Agent-cache provides per-agent and per-session caching, TTLs, and eviction strategies so caches don’t grow without bound. It also reportedly offers locking to prevent duplicate work and hooks to compose caches across tiers: memory first, then Redis, then whatever persistent layer you add. The idea is pragmatic: short-lived answers stay in the fastest layer; long-lived state moves to durable storage. Neat, and not rocket science. But if you’re running dozens of chat agents, this is the relief valve you didn’t know you needed.
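The tiering, TTL, and locking ideas described above can be sketched in a few dozen lines. This is an illustration under stated assumptions, not the project's implementation: `TTLTier` and `TieredCache` are hypothetical classes, both tiers are in-process dictionaries (the second one merely stands in for Redis or Valkey), and the lock only deduplicates work within a single process.

```python
import threading
import time


class TTLTier:
    """One cache tier: maps key -> (value, expiry time)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.data = {}

    def get(self, key):
        entry = self.data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self.data[key]  # lazily drop expired entries on read
            return None
        return value

    def set(self, key, value):
        self.data[key] = (value, self.clock() + self.ttl)


class TieredCache:
    """Checks tiers fastest-first; None is treated as a miss, so it cannot be cached."""

    def __init__(self, tiers):
        self.tiers = tiers
        self._locks = {}  # per-key locks; never pruned in this sketch
        self._locks_guard = threading.Lock()

    def _lock_for(self, key):
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def _lookup(self, key):
        for i, tier in enumerate(self.tiers):
            value = tier.get(key)
            if value is not None:
                for faster in self.tiers[:i]:
                    faster.set(key, value)  # promote the hit into faster tiers
                return value
        return None

    def get_or_compute(self, key, compute):
        value = self._lookup(key)
        if value is not None:
            return value
        with self._lock_for(key):
            value = self._lookup(key)  # another thread may have filled it
            if value is None:
                value = compute()
                for tier in self.tiers:
                    tier.set(key, value)
            return value


# Usage with a fake clock so expiry is deterministic.
now = [0.0]
memory = TTLTier(ttl_seconds=5, clock=lambda: now[0])
shared = TTLTier(ttl_seconds=60, clock=lambda: now[0])  # stand-in for Redis/Valkey
cache = TieredCache([memory, shared])

computed = []


def expensive():
    computed.append(1)
    return "tool output"


a = cache.get_or_compute("session-1:weather", expensive)
now[0] = 10.0  # memory entry has expired; the shared tier still holds it
b = cache.get_or_compute("session-1:weather", expensive)
```

After the clock jumps, the second lookup misses the memory tier, hits the longer-lived tier, and copies the value back into memory without recomputing. Across multiple processes the in-process lock would need a distributed equivalent, which this sketch does not attempt.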

Why it matters

LLM calls are expensive and slow. Caching is the low-hanging fruit for both cost and latency, but getting it right across sessions, tools, and multiple backends is surprisingly fiddly. Agent-cache aims to be that plumbing: a small but crucial piece in production LLM stacks. It’s part of a broader trend: tooling that turns AI experiments into reliable services. Is it perfect? Presumably not; the usual tradeoffs around staleness and cache invalidation remain. Still, when every millisecond and every cent counts, this kind of engineering matters.

Next steps

The code and more details are reportedly available from the original post (link on Hacker News). If you’re building agents or orchestration around LLMs, take a look, kick the tires, and ask: could this shave seconds and dollars off your stack? If nothing else, it’s another sign the ecosystem is maturing from toy projects to production-ready infra.

Sources: Hacker News