OpenAI’s performance charts in the GPT-5 launch video are such a mess you have to think GPT-5 itself probably made them, and the company’s attempted fixes raise even more questions


What to make of OpenAI’s latest GPT-5 chatbot? Let’s just say the reception from users has been sufficiently mixed to have OpenAI head honcho Sam Altman posting apologetically on X. And more than once. But one thing we can say for sure: the charts in the launch video were a bizarre mess that OpenAI has since attempted to tidy up, with mixed success.

Most obviously, the claimed SWE-bench performance of GPT-5 versus older models shown on launch day was badly botched. The chart showed accuracy figures of 74.9% for GPT-5, 69.1% for OpenAI o3 and 30.8% for GPT-4o.

Before and after. (Image credit: OpenAI)

Problem is, the bar heights for the latter two were exactly the same despite the very different figures, giving the at-a-glance impression of total dominance for GPT-5 when in fact it is only marginally superior to OpenAI o3.
