OpenAI caught claiming AI solved math problems it just looked up


OpenAI drew sharp criticism from AI researchers and mathematicians after claims that its model solved well-known Erdős problems fell apart. According to FindArticles, a senior OpenAI executive celebrated GPT-5 for making progress on multiple open Erdős problems and finding solutions to others. The claims unraveled when mathematician Thomas Bloom, curator of the Erdős Problems site, explained that "open" on his page meant he was unaware of a solution, not that none existed. The model had retrieved existing published proofs rather than deriving new mathematics.

Retrieval Is Not Discovery

OpenAI researcher Sébastien Bubeck later acknowledged that the model had found the solutions in the existing literature, adding the caveat that this is still nontrivial because mathematical research is sprawling and fragmented. Competitors were less forgiving: senior figures at Meta and Google DeepMind publicly called the episode a self-inflicted wound, arguing that conflating literature search with the discovery of new knowledge erodes credibility.

Surfacing a known proof is helpful; producing a new one is transformative. The distinction matters because language models can be superb at retrieval, summarization, and pattern completion while remaining poor at rigorous deductive reasoning. They can also hallucinate steps that sound plausible but fail under a formal proof checker.

The Standard for Real Breakthroughs

In mathematics, a breakthrough requires a new argument that survives expert scrutiny or mechanical validation. That bar is high by design. Communities around proof assistants like Lean, Isabelle, and Coq have demonstrated how computer-checked proofs can raise standards. The Lean-driven formalization of parts of Peter Scholze's work is a famous example of humans and machines collaborating to raise rigor.
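The gap between a claimed argument and a checked one is concrete: a proof assistant accepts a derivation only if every step type-checks, with nothing left to rhetorical plausibility. As a minimal illustration (a toy lemma about natural-number addition, nothing to do with any Erdős problem), Lean 4 will accept the following proof only because each rewrite is formally justified:

```lean
-- Commutativity of addition on Nat, proved by induction.
-- Lean rejects the file outright if any step fails to check.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  induction b with
  | zero => simp                      -- a + 0 = 0 + a
  | succ b ih =>                      -- assume a + b = b + a
    rw [Nat.add_succ, ih, Nat.succ_add]
```

Deleting or altering any line of the proof makes Lean report an error rather than silently accept a gap; that mechanical strictness is the standard the paragraph above refers to.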

Competition Drives Overselling

The episode comes as OpenAI, Google DeepMind, and Meta are locked in a race to become the technical leader in reasoning. That competition can accelerate real progress, but it also deepens the incentive to oversell. On popular math benchmarks, recent models regularly exceed 90% when using chain-of-thought prompts and careful sampling. That is impressive, but it is not the same as producing a new theorem or an extended original proof that specialists would recognize.

The quickest way to reset expectations is simple: let proofs, code, and third-party verification do the talking. Until that happens, claiming victory on open problems is not so much innovation as an own goal.
