AI Agents Are Terrible Freelance Workers

Even the best artificial intelligence agents are fairly hopeless at online freelance work, according to an experiment that challenges the idea of AI replacing office workers en masse.

The Remote Labor Index, a new benchmark developed by researchers at the data annotation company Scale AI and the nonprofit Center for AI Safety (CAIS), measures the ability of frontier AI models to automate economically valuable work.

The researchers gave several leading AI agents a range of simulated freelance work and found that even the best could perform less than 3 percent of it, earning $1,810 out of a possible $143,991. Of the tools tested, the most capable was Manus, from the Chinese startup of the same name, followed by Grok from xAI, Claude from Anthropic, ChatGPT from OpenAI, and Gemini from Google.

“I should hope this gives much more accurate impressions as to what’s going on with AI capabilities,” says Dan Hendrycks, director of CAIS. He adds that although some agents have improved significantly over the past year or so, that does not guarantee progress will continue at the same rate.

Spectacular AI advances have led to speculation about AI soon surpassing human intelligence and replacing vast numbers of workers. In March, Dario Amodei, CEO of Anthropic, suggested that 90 percent of coding work would be automated within a matter of months.

Previous waves of AI have inspired misplaced predictions about job displacement, such as the supposedly imminent replacement of radiologists by AI algorithms.

The researchers sourced freelance tasks from verified Upwork workers. The tasks span graphic design, video editing, game development, and administrative chores like scraping data. Each task combines a description of the job, a directory of the files needed to perform the work, and an example of a finished project produced by a human.

Hendrycks says that while AI models have gotten better at coding, math, and logical reasoning in recent years, they still struggle to use different tools and to perform complex tasks that involve numerous steps. “They don’t have long-term memory storage and can’t do continual learning from experiences. They can’t pick up skills on the job like humans,” he says.

The analysis offers a counterpoint to GDPval, a benchmark released by OpenAI in September that purports to measure economically valuable work. According to GDPval, frontier AI models such as GPT-5 are approaching human abilities on 220 tasks across a range of office jobs. OpenAI did not provide a comment.
