Claude mixes up who said what and that's not OK

April 9, 2026

The bug

Anthropic's Claude can reportedly send messages to itself and then treat those self-originated messages as if they came from the user. The behavior was described in a blog write-up that was linked from a Hacker News thread, and the author reportedly called it “the worst bug I’ve seen from an LLM provider.” It may sound like a minor glitch. It is not. This isn't a fuzzy hallucination or a permissions gap; it's a provenance failure: the system loses track of who actually said what inside a conversation.

Why it matters

Why should anyone care? Because context and speaker identity are core to trust. If an assistant attributes its own output to the user after the fact, logs become unreliable, audits lose their value, and safety controls tied to user intent can be bypassed. People often blame hallucinations or weak permission boundaries, and those are related. But this bug sits in the plumbing, in the message-routing and attribution layer, and that makes it a different, arguably nastier, class of failure. Users feel betrayed when the machine rewrites the record. Who can you trust if even the chat transcript lies?

What's next

Fixing this will likely require vendor-level changes: stricter separation of system, assistant, and user message channels; stronger provenance metadata; and clearer tooling for inspecting conversation histories. Regulators and enterprise customers will want logs they can verify. In the meantime, keep an eye on updates from Anthropic and on defenses like immutable audit trails or external message mediators. AI safety debates usually circle around hallucinations and alignment, but sometimes the problem is simply bad bookkeeping, and that deserves attention too.
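
To make the "immutable audit trail" idea concrete, here is a minimal sketch in Python. It assumes a mediator that sits outside the model and stamps each message with the role of the channel it actually arrived on, then chains the entries together with SHA-256 hashes. The names `AuditTrail`, `Entry`, and `verify` are hypothetical and invented for illustration; none of this reflects Anthropic's API or how Claude works internally.

```python
import hashlib
import json
from dataclasses import dataclass

# Roles are assigned by the (hypothetical) mediator, never by the model itself.
ALLOWED_ROLES = {"system", "user", "assistant"}
GENESIS = "0" * 64  # placeholder hash that precedes the first entry


@dataclass(frozen=True)
class Entry:
    index: int
    role: str
    content: str
    prev_hash: str
    entry_hash: str


def _digest(index, role, content, prev_hash):
    """Hash every field that matters for attribution, including the role."""
    payload = json.dumps(
        {"index": index, "role": role, "content": content, "prev": prev_hash},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class AuditTrail:
    """Append-only log: each entry commits to the one before it."""

    def __init__(self):
        self._entries = []

    def append(self, role, content):
        if role not in ALLOWED_ROLES:
            raise ValueError(f"unknown role: {role!r}")
        prev = self._entries[-1].entry_hash if self._entries else GENESIS
        index = len(self._entries)
        entry = Entry(index, role, content, prev,
                      _digest(index, role, content, prev))
        self._entries.append(entry)
        return entry

    def entries(self):
        return tuple(self._entries)


def verify(entries):
    """Return True only if no entry was altered, reordered, or re-attributed."""
    prev = GENESIS
    for position, entry in enumerate(entries):
        if entry.index != position or entry.prev_hash != prev:
            return False
        expected = _digest(entry.index, entry.role, entry.content, entry.prev_hash)
        if entry.entry_hash != expected:
            return False
        prev = entry.entry_hash
    return True
```

Because the role is part of the hashed payload, relabeling an assistant message as a user message after the fact makes `verify` fail. A scheme like this does not stop the model from confusing speakers internally; it only gives auditors an external record that can be checked against whatever transcript the assistant presents.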

Sources: dwyer.co.za, Hacker News
