Why Anthropic’s Claude still hasn’t beaten Pokémon



One of the biggest things preventing the current version of Claude from getting better, Hershey said, is that “when it derives that good strategy, I don’t think it necessarily has the self-awareness to know that one strategy [it] came up with is better than another.” And that’s not a trivial problem to solve.

Still, Hershey said he sees “low-hanging fruit” for boosting Claude’s Pokémon play by improving the model’s understanding of Game Boy screenshots. “I think there’s a chance it could beat the game if it had a perfect sense of what’s on the screen,” Hershey said, adding that such a model would probably perform “a little bit short of human.”

Expanding the context window for future Claude models will also probably allow those models to “reason over longer time frames and handle things more coherently over a long period of time,” Hershey said. Future models will improve by getting “a little bit better at remembering, keeping track of a coherent set of what it needs to try to make progress,” he added.



Twitch chat responds with a flood of bouncing emojis as Claude concludes an epic 78+ hour escape from Pokémon’s Mt. Moon.



Credit: Claude Plays Pokemon / Twitch


Whatever you think about impending improvements in AI models, though, Claude’s current performance at Pokémon doesn’t make it seem like it’s poised to usher in an explosion of human-level, completely generalizable artificial intelligence. And Hershey allows that watching Claude 3.7 Sonnet get stuck on Mt. Moon for 80 hours or so can make it “seem like a model that doesn’t know what it’s doing.”

But Hershey is still impressed at the way that Claude’s new reasoning model will occasionally show some glimmer of awareness and “kind of tell that it doesn’t know what it’s doing and know that it needs to be doing something different. And the difference between ‘can’t do it at all’ and ‘can kind of do it’ is a pretty big one for these AI things for me,” he continued. “You know, when something can kind of do something it typically means we’re pretty close to getting it to be able to do something really, really well.”




