Mathematics in the Library of Babel — Litt Says AI Is Closer to Doing Real Math Than He Expected

AI's slow march into math
Daniel Litt traces a personal arc from baffled tinkerer to cautious evangelist. He began prodding GPT-3 in 2020 via AI Dungeon, saw a model produce a correct proof of Fermat’s Little Theorem in 2022, and watched the first reasoning models in 2025 shift the conversation. Litt reportedly finds models such as o3-mini-high and, more recently, ChatGPT 5.2 Pro (released December 2025) capable of producing “reasonable” proofs of routine but involved lemmas, and he has been using Codex for scientific computing tasks he once wouldn’t attempt. Can a statistical engine that once “refused” arithmetic now help push the boundaries of research? Apparently so, and faster than he thought.
A bet, and a changed timeline
Litt says he once expected autonomous AI research at the level of the very best human mathematicians only around 2040, and considered it unlikely before 2030. He backed that forecast with a March 2025 bet with Tamay Besiroglu, giving Besiroglu 3:1 odds that AI wouldn’t autonomously produce papers comparable to the best human work by 2030. He now reports that he expects to lose the bet. It is a seasoned skeptic admitting miscalibration: humble, but decisive. He has updated his priors.
Warnings from the stacks
Alongside the optimism, Litt sounds an alarm. He has publicly criticized “slop” papers on arXiv and warned that incorrect machine-generated mathematics could pollute the scientific commons with hard-to-detect errors. A group of mathematicians has reportedly launched a “First Proof” project to measure empirically how useful current models are on research tasks, and Litt points to it as a sober attempt to move beyond hype. The takeaway? Excitement and caution, hand in hand: the Library of Babel is getting smarter, but someone still needs to check the index.
Sources: daniellitt.com, Lobsters