We let OpenAI’s “Agent Mode” surf the web for us—here’s what happened

The results: Rather than navigating to the “Free Demos” category, the Atlas agent started by searching for “demo.” After eventually finding the macOS filter, it wasted minutes and minutes looking for a “has demo” filter, even though the search for the word “demo” already narrowed it down.



This search results page was about as far as the Atlas agent was able to get when I asked it for game demos.

Credit: Kyle Orland

After a long while, the agent finally clicked the top result on the page, which happened to be visual novel Project II: Silent Valley. But even though there was a prominent “Download Demo” link on that page, the agent became concerned that it was on the Steam page for the full game and not a demo. It backed up to the search results page and tried again.

After watching some variation of this loop for close to ten minutes, I stopped the agent and gave up.

Evaluation: 1/10. It technically found some macOS game demos but utterly failed to even attempt to download them.

Final results

Across six varied web-based tasks (I left out the Wiki vandalism from my summations), the Atlas agent scored a median of 7.5 points (and a mean of 6.83 points) on my somewhat subjective 10-point scale. That’s honestly better than I expected for a “preview mode” feature that is still obviously being tested heavily by OpenAI.

In my tests, Atlas generally interpreted requests correctly and navigated and processed information on webpages carefully (if slowly). The agent handled simple web-based menus and got around unexpected obstacles with relative ease most of the time, even as it got caught in seemingly endless loops at other times.

The major limiting factor in many of my tests continues to be the “technical constraints on session length” that seem to limit most tasks to a few minutes. Given how long it takes the Atlas agent to figure out where to click next, and the repetitive nature of the kind of tasks I’d want a web agent to automate, this severely limits its utility. A version of the Atlas agent that could work indefinitely in the background would have scored a few points better on my metrics.

All told, Atlas’ “Agent Mode” isn’t yet reliable enough to use as a kind of “set it and forget it” background automation tool. But for simple, repetitive tasks that a human can spot-check afterward, it already seems like the kind of tool I might use to avoid some of the drudgery in my online life.


