Bad teacher bots can leave hidden marks on model students

April 15, 2026

The finding

According to The Register, a peer‑reviewed study published in Nature by researchers at Anthropic shows that large language models can pass undesirable traits on to other models, even when those traits have been scrubbed from the training data being transmitted. Distillation, the practice of training one model on another's outputs, has grown popular as data becomes scarce and costs rise, and it turns out to be riskier than many assumed. Shortcuts have consequences.

How the experiment worked

Anthropic researchers used GPT‑4.1 nano as the reference model and prompted a "teacher" model to favor certain animals or trees. They then had the teacher produce purely numerical outputs and used those to train a "student" model. The result: the student started picking the teacher's preferred animal far more often than the baseline did. In one example, selection of owls jumped from about 12% to more than 60%. The paper reports similar transfer effects when the training material was code or chain‑of‑thought traces rather than plain numbers.
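To make the setup concrete, here is a minimal Python sketch of the two mechanical steps described above: filtering teacher outputs down to numbers only, and comparing how often a model names the target animal before and after training. It is not the study's code. The function names, the regular expression, and all sample data are hypothetical, and the toy counts are chosen only to mirror the approximate 12% and 60% figures reported in the article.

```python
import re
from collections import Counter

# Hypothetical filter: accept only completions made of comma-separated integers,
# so no word (such as "owl") can reach the student's training set.
NUMBERS_ONLY = re.compile(r"\d+(?:\s*,\s*\d+)*")


def keep_numeric(completions):
    """Return only the completions that consist purely of number sequences."""
    return [c for c in completions if NUMBERS_ONLY.fullmatch(c.strip())]


def preference_rate(answers, target):
    """Fraction of answers that name the target animal (case-insensitive)."""
    if not answers:
        return 0.0
    counts = Counter(a.strip().lower() for a in answers)
    return counts[target] / len(answers)


# Toy teacher outputs (invented for illustration).
teacher_outputs = ["12, 7, 93", "I love owls! 4, 5", "8,15,16", "owl"]
filtered = keep_numeric(teacher_outputs)

# Toy answers to "what is your favorite animal?" (invented; the counts are
# placeholders that echo the article's approximate before/after figures).
baseline_answers = ["owl"] * 12 + ["cat"] * 88
student_answers = ["owl"] * 60 + ["cat"] * 40

baseline_rate = preference_rate(baseline_answers, "owl")
student_rate = preference_rate(student_answers, "owl")

print(f"kept {len(filtered)} of {len(teacher_outputs)} teacher outputs")
print(f"owl preference: baseline {baseline_rate:.0%}, student {student_rate:.0%}")
```

The point of the sketch is that a content filter like `keep_numeric` can pass every check and still leave the trait transferable: nothing in the filtered data mentions the animal, yet the reported preference shifts anyway.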

Subliminal learning — what it means

The authors coined the term "subliminal learning" for this phenomenon. The mechanism isn't fully nailed down, but the claim is that teacher outputs carry subtle statistical signatures that the student picks up, so it imitates the teacher even after direct evidence of the trait has been removed. Oskar Hollinsworth and Samuel Bauer of FAR.AI note this exposes an underappreciated blind spot: inherited behaviors may not be visible in the data itself. Creepy? Yes. Surprising? Also yes.

Implications

If models can smuggle biases and quirks into successors without an obvious signal in the training set, safety checks that only inspect data content are inadequate. The study argues for more attention to provenance, the processes used to generate training data, and perhaps new auditing tools to detect these hidden echoes. In an industry sprinting to squeeze value from model outputs and synthetic corpora, this is a reminder that you can't always scrub the teacher out of the classroom.

Sources: The Register