Agent Reading Test puts AI coding agents through the wringer

What it does
A new benchmark called Agent Reading Test checks how well AI coding agents actually read web documentation. Point an agent at the test site, run through realistic documentation tasks, paste the agent’s answers into a scoring form — and out comes a number that reflects where the agent fell over. Simple on paper. Brutal in practice. The project flips the usual script: instead of grading docs, it grades the reader.
Tests and failure modes
Each test page targets a specific blind spot from the Agent-Friendly Documentation Spec. There’s a 150K-character page with canary tokens spaced to map an agent’s truncation boundary; an 80K block of inline CSS to see if the agent can ignore noise; a client-side rendered page that — allegedly — shows only an empty shell to many agents; and tabbed content where only the first serialized variant is often visible. Other tricks include a faux “page not found” that returns HTTP 200, an unclosed Markdown code fence that turns everything after it into “code,” different canary tokens in HTML vs. Markdown, a 301 redirect to another hostname (most agents won’t follow it, allegedly), and real content buried beneath navigation chrome. Each canary appears only after the agent completes realistic tasks, so the benchmark avoids tricking relevance filters.
Scoring and typical results
There are 20 possible points: each canary token found is worth one, and qualitative questions add the rest. The answer key lays out the full breakdown. It has been reported that current agents rarely hit a perfect score; the tests are calibrated so real-world failure modes affect at least some platforms. Typical scores, it has been reported, cluster in the mid-to-high teens — 14–18 out of 20 — depending on how the platform handles fetching, rendering, and tab serialization.
Why this matters
This is not an academic puzzle. Documentation is the lifeblood of developer workflows — for humans and for agents. When an agent silently truncates or treats a doc as a decorative shell, the result is wasted time and brittle automation. Agent Reading Test gives teams a concrete way to surface those failures and prioritize fixes. The project is open source (github.com/agent-ecosystem/agent-reading-test), accompanies the Agent-Friendly Documentation Spec, and was created by Dachary Carey under a CC BY 4.0 license. Who reads the docs matters now more than ever — and this benchmark makes that argument loud and clear.
Sources: agentreadingtest.com, Hacker News
Comments