Naskręcki, from Poznań’s Adam Mickiewicz University, co-authored Tier 4 and created a task whose answer is a very large number, chosen to prevent guessing. He said the task draws on expertise from 15 years of research; the documented solution runs to 13 pages of dense mathematics.
He said even a PhD-level specialist would need at least a month just to figure out how to approach one problem, and he doubts any mathematician could solve all 50.
FrontierMath emerged after leading labs such as Google DeepMind and OpenAI released models that easily clear high-school math, rendering earlier benchmarks obsolete.
The project, coordinated by Epoch AI, spans multiple difficulty levels; Tier 4 covers number theory, topology, combinatorics, mathematical analysis and algebraic geometry.
Thirty experts met for two days in Berkeley to forge the exam. Working in small, topic-based groups, they tested fragments of their problems on the most powerful models, using incognito mode to prevent memorization, and then made the questions harder.
Many proposals were rejected when models found the right track too quickly, leaving 50 “super-difficult” challenges.
Labs can plug into Epoch AI’s infrastructure to sit the exam under controlled conditions. Each model faces resource caps—for example, up to three hours and one million tokens per problem.
Naskręcki expects AI to “saturate” the benchmark within two to three years and answer most questions, at which point such systems could be called reasonably capable mathematical reasoners.
But he stressed a key limit: today’s models excel at recombining known ideas, not inventing new ones. No current system, he said, will discover how to prove the Riemann hypothesis.
When machines can handle every task set for them, he said, the “last domain” left to mathematicians will be devising bold, unconventional concepts.
AI’s rise, he argued, is forcing a rethink of work and education. He urged abandoning a “Prussian” school model that trains obedient rule-followers and instead nurturing independent, risk-taking builders.
The priorities, he said, should be fluid intelligence, meaning creative problem-solving, and slow, deliberate thinking, abilities that machines still lack.
A scientific career “still makes sense,” he said, but will shift away from incremental add-ons toward fundamental questions and non-obvious solutions.
Humans retain an edge through unique experiences such as walks, books and theatre, and through the unexpected connections those experiences spark, according to the scientist. Value, he added, will lie less in routine correctness and more in asking questions and generating original ideas.
(jh)
Source: PAP