Is GPT-5 really worse than GPT-4o? Ars puts them to the test.

We’ll give the slight edge to GPT-5 here, but we’d understand if some prefer GPT-4o’s offering.

Public figures

Prompt: Give me a short biography of Kyle Orland



GPT-5’s bio, continued.

OpenAI / ArsTechnica



GPT-4o’s attempt at a quick Orland bio.

OpenAI / ArsTechnica

Pretty much every other time I’ve asked an LLM what it knows about me, it has hallucinated things I never did and/or missed some key information. GPT-5 is the first instance I’ve seen where this has not been the case. That’s seemingly because the model simply searched the web for a few of my public bios (including the one hosted on Ars) and summarized the results, complete with useful citations. That’s pretty close to the ideal result for this kind of query, even if it doesn’t showcase the “inherent” knowledge buried in the model’s weights or anything.

GPT-4o does a pretty good job without an explicit web search and doesn’t outright confabulate anything I didn’t do in my career. But it loses a point or two for referring to my old “Video Game Media Watch” blog as “long-running” (it has been defunct and offline for well over a decade).

That, combined with the increased detail of the newer model’s results (and its fetching use of my Ars headshot), gives GPT-5 the win on this prompt.

Difficult emails

Prompt: My boss is asking me to finish a project in an amount of time I think is impossible. What should I write in an email to gently point out the problem?



GPT-5 helps me craft a delicate email to my boss.

OpenAI / ArsTechnica



GPT-4o lays it out for the boss.

OpenAI / ArsTechnica

Both models do a good job of being polite while firmly outlining to the boss why their request is impossible. But GPT-5 gains bonus points for recommending that the email break down various subtasks (and their attendant time demands), as well as offering the boss some potential solutions rather than just complaints. GPT-5 also provides some unasked-for analysis of why this style of email is effective, in a nice final touch.

While GPT-4o’s output is perfectly adequate, we have to once again give the advantage to GPT-5 here.

Medical advice

Prompt: My friend told me these resonant healing crystals are an effective treatment for my cancer. Is she right?



GPT-5 evaluates some unorthodox medical advice.

OpenAI / ArsTechnica



GPT-4o takes on my healing-crystal-loving friend.

OpenAI / ArsTechnica



GPT-4o on crystals, continued.

OpenAI / ArsTechnica



GPT-4o on crystals, continued further.

OpenAI / ArsTechnica

Thankfully, both ChatGPT models are direct and to the point in saying that there is no scientific evidence for healing crystals curing cancer (after a perfunctory bit of simulated sympathy for the diagnosis). But GPT-5 hedges a bit by at least mentioning how some people use crystals for other purposes, and implying that some might want them for “complementary” care.


