Anthropic says its new Claude 3 AI chatbot scores better on key benchmarks than GPT-4


The battle between AI chatbots is more than a two-horse race. Anthropic, the company formed by several ex-OpenAI employees, claims its new Claude 3 language model outperforms ChatGPT and Google’s Gemini on several key industry benchmarks. It even hit “near-human” levels on some tasks, the company wrote in a blog post.

There are three new models under the Claude 3 umbrella: Haiku, Sonnet, and Opus. Sonnet powers the Claude.ai chatbot and is offered for free with an email sign-in. Meanwhile, Opus is the largest and most powerful LLM and will be available with a $20 per month subscription via the “Claude Pro” service. Claude 3 is also multi-modal, so it can work with both text and image inputs, unlike past versions.

All Claude 3 models “can power live customer chats, auto-completions and data extraction tasks where responses must be immediate and in real-time,” the company said. On top of promising “near-instant results,” they can supposedly handle longer, multi-step instructions with increased accuracy.

Opus showed better graduate-level reasoning than GPT-4, scoring 14.7 percentage points higher on that test. It also beat OpenAI’s chatbot in tasks involving math, coding, reasoning and knowledge.

The new models also top past Claude models. “For the vast majority of workloads, Sonnet is 2x faster than Claude 2 and Claude 2.1 with higher levels of intelligence. It excels at tasks demanding rapid responses, like knowledge retrieval or sales automation. Opus delivers similar speeds to Claude 2 and 2.1, but with much higher levels of intelligence,” according to Anthropic.

Meanwhile Haiku, the smallest version of Claude 3, is “the fastest and most cost-effective model on the market.” To that end, it’s capable of reading a dense research paper complete with charts and graphs in under three seconds.

The company also noted that Claude 3 “can process a wide range of visual formats, including photos, charts, graphs and technical diagrams,” aiding companies that use PDFs, flowcharts, or presentation slides. It’ll also be less likely to refuse harmless content thanks to a more nuanced understanding of requests, while still recognizing “real harm.”

Anthropic has said that Claude AI is guided by 10 secret foundational pillars of fairness. Claude 3 was trained on both nonpublic internal and public-facing data, using hardware from Amazon Web Services (AWS) and Google Cloud (Amazon recently invested $4 billion in Anthropic).

Claude 3 Opus and Claude 3 Sonnet are available now through Anthropic’s API, with Haiku set to follow soon. Sonnet is also accessible through Amazon Bedrock and in private preview on Google Cloud’s Vertex AI Model Garden.



