Research-Driven Agents: What Happens When Your Agent Reads Before It Codes

April 9, 2026

The experiment

Researchers reportedly added a literature-search phase to an autoresearch loop, and the results were striking. Pointed at llama.cpp with four cloud VMs, roughly three hours, and about $29 in resources, the system reportedly produced five successful optimizations (four kernel fusions and one adaptive parallelization) that made flash-attention text generation about 15% faster on x86 and 5% faster on ARM for TinyLlama 1.1B. The agent didn’t just poke at the code; it read papers, scanned competing forks and backends, and then proposed changes rooted in external domain knowledge.
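The shape of such a loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the researchers' implementation: `Hypothesis`, `run_loop`, the `research` and `benchmark` callables, and the 1% acceptance threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Hypothesis:
    """A candidate change. Both fields are hypothetical, for illustration only."""
    description: str  # e.g. "fuse passes over the QK tile"
    source: str       # where the idea came from: "paper", "fork", "code"


def run_loop(
    research: Callable[[], Iterable[Hypothesis]],
    benchmark: Callable[[Hypothesis], float],
    baseline: float,
    min_gain: float = 0.01,  # assumed threshold: ignore gains under 1% as noise
) -> List[Hypothesis]:
    """Research first, then keep only hypotheses that beat the current best.

    `benchmark` is assumed to return throughput (higher is better) with the
    candidate applied on top of everything kept so far.
    """
    best = baseline
    kept: List[Hypothesis] = []
    for hyp in research():  # the research phase supplies the candidates
        score = benchmark(hyp)
        if score >= best * (1.0 + min_gain):
            best = score
            kept.append(hyp)
    return kept
```

The only difference from a code-only loop is where `research` gets its candidates: from papers, forks, and other backends rather than from the source tree alone. The acceptance test stays purely empirical.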

Where code‑only agents stumble

We’ve seen code-only agents shine before. Karpathy’s autoresearch drove training improvements, and pi-autoresearch scaled that loop into useful engineering wins; Shopify’s Liquid case, for example, cut latency and allocations dramatically. But code shows what is written, not what’s missing. The llama.cpp run illustrated the point: an agent working from source alone chased SIMD micro-tweaks and delivered only noise. Early experiments returned tiny gains (under 1%) and one regression, and the postmortem concluded that the workload was memory-bandwidth bound, not compute bound, so trimming arithmetic could not help much. Reading changed the hypothesis space.
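The memory-bound versus compute-bound distinction can be checked with a roofline-style calculation. The sketch below is an illustration only: the function names are hypothetical, and any numbers fed to it would have to come from a real profile or spec sheet, none of which the source reports.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte moved to or from memory."""
    return flops / bytes_moved


def machine_balance(peak_flops_per_s: float, peak_bytes_per_s: float) -> float:
    """FLOPs the machine can execute in the time it takes to move one byte."""
    return peak_flops_per_s / peak_bytes_per_s


def is_memory_bound(intensity: float, balance: float) -> bool:
    """A kernel whose intensity is below the machine balance waits on memory."""
    return intensity < balance


def attainable_flops_per_s(
    intensity: float, peak_flops_per_s: float, peak_bytes_per_s: float
) -> float:
    """Roofline bound: the lower of the compute roof and the bandwidth roof."""
    return min(peak_flops_per_s, intensity * peak_bytes_per_s)
```

When `is_memory_bound` returns true, the bandwidth term is the one that sets the ceiling, which is why faster SIMD arithmetic barely moves the benchmark while cutting passes over the same data can.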

Why it matters

The payoff is simple: the agent behaved like a curious junior engineer who reads the literature before piling on micro-optimizations. Studying ik_llama.cpp and GPU backends revealed operator fusions that the CPU paths lacked; the biggest win fused three passes over the QK tile into a single AVX2 FMA loop. This isn’t just cleverness on a single repo. The team says the approach generalizes to any benchmarkable project with a test suite: a research phase steers agents toward high-value changes instead of shallow, time-wasting tweaks. Will future dev agents spend more time with arXiv than with REPLs? If this result holds up, the answer might be yes.
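To make the idea of fusing passes concrete, here is a small Python sketch. The source does not say which three passes were fused, so the scale, mask, and row-max steps below are an assumption chosen because they are common in attention kernels; `three_pass` and `fused` are hypothetical names, and plain Python stands in for the AVX2 code.

```python
import math
from typing import List, Tuple


def three_pass(
    scores: List[float], mask: List[float], scale: float
) -> Tuple[List[float], float]:
    """Unfused version: the tile is traversed three separate times."""
    scaled = [s * scale for s in scores]            # pass 1: scale
    masked = [s + m for s, m in zip(scaled, mask)]  # pass 2: add mask
    row_max = max(masked)                           # pass 3: find the maximum
    return masked, row_max


def fused(
    scores: List[float], mask: List[float], scale: float
) -> Tuple[List[float], float]:
    """Fused version: one traversal does all three steps per element."""
    out = [0.0] * len(scores)
    row_max = -math.inf
    for i, (s, m) in enumerate(zip(scores, mask)):
        v = s * scale + m  # multiply-add, the operation shape an FMA computes
        out[i] = v
        if v > row_max:
            row_max = v
    return out, row_max
```

Both functions return the same values; what changes is how many times the data is read and written. Python will not show the speedup, but on a memory-bandwidth-bound workload, touching each element once instead of three times is exactly the kind of change that pays.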

Sources: skypilot.co, Hacker News
