On Saturday, OpenAI researcher Alexander Wei announced that a new AI language model the company is researching has achieved gold medal-level performance on the International Mathematical Olympiad (IMO), matching a standard that fewer than 9 percent of human contestants reach each year. The announcement came despite an embargo request from IMO organizers asking AI companies to wait until July 28 to share their results.

The experimental model reportedly tackled the contest’s six proof-based problems under the same constraints as human competitors: 4.5 hours per session, with no Internet access or calculators allowed. However, several sources with inside knowledge of the process say that because OpenAI graded its own IMO results, the legitimacy of the company’s claim may be in question. OpenAI plans to publish the proofs and grading rubrics for public review.

According to OpenAI, its achievement marks a departure from previous AI attempts at mathematical Olympiad problems, which relied on specialized theorem-proving systems that often exceeded human time limits. OpenAI says its model processed problems as plain text and generated natural-language proofs, operating like a standard language model rather than a purpose-built mathematical system.

The announcement follows Google’s July 2024 claim that its AlphaProof and AlphaGeometry 2 models earned the equivalent of a silver medal at the IMO. Google’s systems, however, required up to three days per problem rather than the 4.5-hour human time limit, and they needed human assistance to translate problems into formal mathematical language.

“Math is a proving ground for reasoning—structured, rigorous, and hard to fake,” the company wrote in a statement sent to Ars Technica. “This shows that scalable, general-purpose methods can now outperform hand-tuned systems in tasks long seen as out of reach.”

While the company confirmed that its next major AI model, GPT-5, is “coming soon,” it clarified that this current model is experimental. “The techniques will carry forward, but nothing with this level of capability will be released for a while,” OpenAI says. It’s likely that OpenAI needed to devote a great deal of computational resources (and therefore money) to this particular experiment, and that level of computation won’t be typical of consumer-facing AI models in the near future.