Don't let the bot play doctor! AI gets early diagnoses wrong 80% of the time

What the study found
A Harvard-led team headed by medical student Arya Rao reportedly published a study in JAMA Network Open testing 21 off‑the‑shelf AI models on 29 standardized clinical vignettes. The headline: when asked to produce an early differential diagnosis, the messy, uncertain stage where clinicians rule things in and out, the models failed in more than eight out of ten cases. Given the full set of case information and asked for a final diagnosis, though, the same models hit about 91 percent accuracy. Strange, right? They shine at the finish line but stumble at the starting blocks.
Why this matters
The emotional punch here is obvious. Patients already spiral down the midnight WebMD rabbit hole; now imagine that rabbit hole turbocharged by an AI that talks like an expert but often gives incomplete answers. Coauthor Dr. Marc Succi warned that models “project confidence without showing robust reasoning,” which could inflame anxiety and prompt unnecessary tests or delayed care. Rao also noted that “failure” in the paper sometimes meant only a partial miss (raw per‑case accuracy ranged from roughly 63 to 78 percent), but partial correctness can still mislead.
The takeaways
The authors reportedly argue that LLMs should not be trusted for patient‑facing diagnostic reasoning without structured human oversight, and that marketing them as frontline diagnostic agents is dangerous. Bottom line: flashy conversational fluency is not the same as clinical judgment. Want a quick answer about that mole? Sure. Want to entrust triage and uncertain early reasoning to a bot? Probably not, at least not yet.
Sources: The Register