Cutting Through the Hype – AI’s Mind on Math

You may have heard that OpenAI’s model achieved gold at the 2025 International Mathematical Olympiad (IMO). If you take the headlines at face value, you might conclude that artificial general intelligence (AGI) is here! Let me break down what the result actually means from a developer and practitioner's perspective.

First, what is the IMO?

The International Mathematical Olympiad is an annual competition in which contestants tackle six problems over two days, three per day. Every problem is known to be solvable, the problems get progressively harder, and each is worth a maximum of seven marks, for a total of 42.

What is impressive?

The current frontier models (Gemini 2.5, ChatGPT o3, Grok 4, etc.) score approximately 30% on these problems, which is not even worthy of bronze. This year, OpenAI and Google have had “unreleased” models score 83%, putting them within the top 8% of participants and in the gold-medal performance band. OpenAI has said it used a version of its Operator model (now named Agent), which is a generalist model. If true, this is especially impressive, as generalist models have historically been terrible at math.
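To put that 83% figure in context, here is a quick back-of-the-envelope check. This is a minimal sketch: the 35-mark score and the 35-point gold cutoff are taken from public reporting of the 2025 results, not from official data, so treat them as assumptions.

```python
# Back-of-the-envelope check on the reported IMO 2025 result.
# Assumptions (from public reporting, not verified against official data):
#   - 6 problems, each worth 7 marks -> 42 marks maximum
#   - the "unreleased" models reportedly scored 35 marks
#   - the 2025 gold-medal cutoff was reportedly 35 marks

MAX_MARKS = 6 * 7          # 42 marks available in total
REPORTED_SCORE = 35        # assumed, from press coverage
GOLD_CUTOFF = 35           # assumed, from press coverage

percentage = REPORTED_SCORE / MAX_MARKS * 100
print(f"{REPORTED_SCORE}/{MAX_MARKS} = {percentage:.1f}%")  # ~83.3%
print("Gold-level score" if REPORTED_SCORE >= GOLD_CUTOFF else "Below gold cutoff")
```

In other words, the 83% headline is simply 35 of a possible 42 marks, which, by these reported numbers, sits right on the gold threshold.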

What are the caveats?

1. The compute costs to solve these questions ran into the thousands of US dollars.

2. The models failed to score on the questions that demanded the most creativity.

3. The OpenAI Agent still performs at roughly half the accuracy of a human on spreadsheet tasks.

The bottom line: while it is impressive that generalist models are getting better at math, this is not yet the AGI tipping point many are anticipating. Compute costs keep climbing while the returns on intelligence diminish. When an AI model wins gold in a spreadsheet competition, then there will be a headline worth writing.

Measure How Much Productivity You Could Gain With Our Calculator

Our productivity calculator reveals how much Traffyk could save your business, and how much productivity you could gain, by reducing inefficient workforce communication.