Polish mathematician helps craft ultra-hard exam to test AI’s limits

21.08.2025 14:50

Polish mathematician Bartosz Naskręcki helped build the toughest tier of FrontierMath, a new exam to compare AI models, and says current systems can answer only 4 of 50 problems designed to exceed any single human’s capacity.

Naskręcki, from Adam Mickiewicz University in Poznań, co-authored Tier 4 and contributed a problem whose answer is a very large number, to prevent guessing. He said the problem draws on 15 years of his research expertise; its documented solution runs to 13 pages of dense mathematics.

He said even a PhD-level specialist would need at least a month just to figure out how to approach one problem, and he doubts any mathematician could solve all 50.

FrontierMath emerged after leading labs such as Google DeepMind and OpenAI released models that easily solve high-school-level mathematics, rendering earlier benchmarks obsolete.

The project, coordinated by Epoch AI, spans multiple difficulty levels; Tier 4 covers number theory, topology, combinatorics, mathematical analysis and algebraic geometry.

Thirty experts met for two days in Berkeley to draft the exam. Working in small, topic-based groups, they tested fragments of problems on the most powerful models in incognito mode, so the problems could not be memorized, and then made the questions harder.

Many proposals were rejected because models found the right approach too quickly, leaving 50 “super-difficult” challenges.

Labs can plug into Epoch AI’s infrastructure to sit the exam under controlled conditions. Each model faces resource caps—for example, up to three hours and one million tokens per problem.

Naskręcki expects AI to “saturate” the benchmark within two to three years and answer most questions, at which point such systems could be called reasonably capable mathematical reasoners.

But he stressed a key limit: today’s models excel at recombining known ideas, not inventing new ones. No current system, he said, will discover how to prove the Riemann hypothesis.

Once machines can handle any set task, he said, the “last domain” left to mathematicians will be devising bold, unconventional concepts.

AI’s rise, he argued, is forcing a rethink of work and education. He urged abandoning a “Prussian” school model that trains obedient rule-followers and instead nurturing independent, risk-taking builders.

Schools, he said, should prioritize fluid intelligence (creative problem-solving) and slow, deliberate thinking, abilities machines still lack.

A scientific career “still makes sense,” he said, but will shift away from incremental add-ons toward fundamental questions and non-obvious solutions.

According to Naskręcki, humans retain an edge through unique experiences, such as walks, books and theatre, and the unexpected connections they spark. Value, he said, will lie less in routine correctness and more in asking questions and generating original ideas.

(jh)

Source: PAP
