Nineteen Features, Zero Architecture

The experiment
A developer reportedly gave an AI complete autonomy over a real codebase: decide the next feature, write the spec, implement it, test it, and merge if the tests pass. The experiment, run with a looped Codex CLI driver, targeted a retry-policy library, a deceptively rich playground (fixed delay, exponential backoff, jitter, circuit breakers, cancellation… you name it). The model acted as architect, author, and reviewer. The human? Mostly a button-pusher trying to avoid blame.
The verdict
Allegedly, nineteen features made it into main, accompanied by a mountain of unit tests (roughly fourteen times as much test code as product code) and coverage that looked great on paper (about 80% line, nearly 75% branch). But JetBrains inspections started waving red flags: unused parameters proliferated, and one class, RetryPolicy, had become a God object with thirteen parameters and almost every feature bolted onto it. The builder? Cosmetic. Constructor overloads sprouted like weeds. A passing test suite didn't mean the design was healthy; it only meant the metrics were happy.
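To make the God-object complaint concrete, here is a minimal sketch of the kind of composable alternative the inspections were implicitly asking for. It is not the code from the experiment, which the source does not show; Python is used purely for illustration, and every name other than RetryPolicy (fixed, exponential, with_jitter, capped, run) is hypothetical. The idea is that each delay feature is a small function that wraps another, so the policy itself needs two parameters rather than thirteen.

```python
import random
import time
from dataclasses import dataclass
from typing import Any, Callable

# Assumption for this sketch: a delay strategy maps the 1-based attempt
# number to the number of seconds to wait before the next attempt.
DelayStrategy = Callable[[int], float]


def fixed(seconds: float) -> DelayStrategy:
    """Always wait the same amount of time."""
    return lambda attempt: seconds


def exponential(base: float, factor: float = 2.0) -> DelayStrategy:
    """Wait base, base*factor, base*factor^2, ... seconds."""
    return lambda attempt: base * factor ** (attempt - 1)


def with_jitter(
    inner: DelayStrategy,
    spread: float,
    rng: Callable[[], float] = random.random,
) -> DelayStrategy:
    """Add up to `spread` seconds of random delay on top of `inner`."""
    return lambda attempt: inner(attempt) + spread * rng()


def capped(inner: DelayStrategy, maximum: float) -> DelayStrategy:
    """Never wait longer than `maximum` seconds."""
    return lambda attempt: min(inner(attempt), maximum)


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical slimmed-down policy: features live in the strategy."""

    max_attempts: int
    delay: DelayStrategy

    def __post_init__(self) -> None:
        if self.max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")

    def run(
        self,
        operation: Callable[[], Any],
        sleep: Callable[[float], None] = time.sleep,
    ) -> Any:
        """Call `operation` until it succeeds or attempts run out."""
        for attempt in range(1, self.max_attempts + 1):
            try:
                return operation()
            except Exception:
                if attempt == self.max_attempts:
                    raise  # out of attempts: surface the last error
                sleep(self.delay(attempt))
```

Under these assumptions, a policy with capped exponential backoff and jitter is assembled rather than configured: RetryPolicy(5, with_jitter(capped(exponential(0.1), 2.0), 0.05)). Adding a new delay behaviour means writing one more wrapper, not adding a fourteenth constructor parameter.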
Why it matters
So what went wrong? Tests and coverage measure certain kinds of correctness well. They don't measure separation of concerns, composability, or whether future changes will be a bloodbath. The result is a collapse of confidence: green dashboard, sinking stomach. Handing off architectural judgment to LLMs may speed up feature churn, but it risks creating maintainable-looking snowmen that melt under real pressure. The lesson: AI can be a ferocious feature factory, but humans still need to be the architects who refuse to confuse a tidy CI badge with good design.
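A tiny, hypothetical example shows how a metric can be happy while the design is not. The function and test below are invented for illustration and are not taken from the experiment. The jitter parameter is accepted but never used, which is exactly the sort of thing a static inspection flags, yet the test passes and executes every line of the function.

```python
def next_delay(attempt: int, base: float, jitter: float = 0.0) -> float:
    """Hypothetical helper: `jitter` is accepted but silently ignored."""
    return base * 2 ** (attempt - 1)


def test_next_delay() -> None:
    # Both assertions pass and every line of next_delay is executed,
    # so line coverage for the function is complete. Nothing here
    # notices that the jitter argument has no effect on the result.
    assert next_delay(1, base=0.5, jitter=0.25) == 0.5
    assert next_delay(3, base=0.5, jitter=0.25) == 2.0
```

Coverage reports that the code ran. It says nothing about whether the signature makes sense.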
Sources: fffej.substack.com, Lobsters