Jump to content
  • Sign Up
×
×
  • Create New...

Recommended Posts

  • Diamond Member

This is the hidden content, please

ChatGPT’s latest model may be a regression in performance

According to a new report from
This is the hidden content, please
, OpenAI’s flagship large language model for ChatGPT, GPT-4o, has significantly regressed in recent weeks, putting the state-of-the-art model’s performance on par with the far smaller, and notably less capable, GPT-4o-mini model.

This analysis comes less than 24 hours after the company announced an upgrade for the GPT-4o model. “The model’s creative writing ability has leveled up–more natural, engaging, and tailored writing to improve relevance & readability,”

This is the hidden content, please
on X. “It’s also better at working with uploaded files, providing deeper insights & more thorough responses.” Whether those claims continue to hold up is now being cast in doubt.

“We have completed running our independent evals on OpenAI’s GPT-4o release yesterday and are consistently measuring materially lower eval scores than the August release of GPT-4o,” the

This is the hidden content, please
via an X post on Thursday, noting that the model’s Artificial Analysis Quality Index decreased from 77 to 71 (and is now equal to that of GPT-4o mini).

What’s more, GPT-4o’s performance on the GPQA Diamond benchmark decreased from 51% to 39% while its MATH benchmarks decreased from 78% to 69%.

Simultaneously, the researchers discovered more than a doubling in the speed increase of the model’s responses, accelerating from around 80 output tokens per second to roughly 180 tokens/s. “We have generally observed significantly faster speeds on launch day for OpenAI models (likely due to OpenAI provisioning capacity ahead of adoption), but previously have not seen a 2x speed difference,” the researchers wrote.

Wait – is the new GPT-4o a smaller and less intelligent model?

We have completed running our independent evals on OpenAI’s GPT-4o release yesterday and are consistently measuring materially lower eval scores than the August release of GPT-4o.

GPT-4o (Nov) vs GPT-4o (Aug):
➤…

This is the hidden content, please

— Artificial Analysis (@ArtificialAnlys)

This is the hidden content, please

“Based on this data, we conclude that it is likely that OpenAI’s Nov 20th GPT-4o model is a smaller model than the August release,” they continued. “Given that OpenAI has not cut prices for the Nov 20th version, we recommend that developers do not shift workloads away from the August version without careful testing.”

GPT-4o was first released in May 2024 to surpass the existing GPT-3.5 and GPT-4 models. GPT-4o offers state-of-the-art benchmark results in voice, multilingual, and vision tasks, according to OpenAI, making it ideal for advanced applications like real-time translation and conversational AI.













This is the hidden content, please

#ChatGPTs #latest #model #regression #performance

This is the hidden content, please

This is the hidden content, please

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
  • Vote for the server

    To vote for this server you must login.

    Jim Carrey Flirting GIF

  • Recently Browsing   0 members

    • No registered users viewing this page.

Important Information

Privacy Notice: We utilize cookies to optimize your browsing experience and analyze website traffic. By consenting, you acknowledge and agree to our Cookie Policy, ensuring your privacy preferences are respected.