What is AGI? Nobody agrees, and it’s tearing Microsoft and OpenAI apart.

The reported $100 billion profit threshold we mentioned earlier conflates commercial success with cognitive capability, as if a system’s ability to generate revenue said anything meaningful about whether it can “think,” “reason,” or “understand” the world like a human.

Sam Altman speaks onstage during The New York Times Dealbook Summit 2024 at Jazz at Lincoln Center on December 4, 2024, in New York City. Credit: Eugene Gologursky via Getty Images


Depending on your definition, we may already have AGI, or it may be physically impossible to achieve. If you define AGI as “AI that performs better than most humans at most tasks,” then current language models arguably meet that bar for certain types of work (which tasks? which humans? what counts as “better”?), but there is no consensus that this is actually the case. That says nothing of the even murkier concept of “superintelligence”: another nebulous term for a hypothetical, god-like intellect so far beyond human cognition that it, like AGI, defies any solid definition or benchmark.

Given this definitional chaos, researchers have tried to create objective benchmarks to measure progress toward AGI, but these attempts have revealed their own set of problems.

Why benchmarks keep failing us

The search for better AGI benchmarks has produced some interesting alternatives to the Turing Test. The Abstraction and Reasoning Corpus (ARC-AGI), introduced in 2019 by François Chollet, tests whether AI systems can solve unfamiliar visual puzzles that demand genuinely novel analytical reasoning.
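
To make this concrete, here is a minimal sketch of what an ARC-style task looks like. The train/test structure of input and output grids mirrors the publicly documented ARC format, but the grids, the “mirror each row” rule, and the `solve` function are invented toy examples, not actual benchmark items or an actual solver.

```python
# Toy illustration of the ARC-AGI task format: small grids of color indices (0-9),
# with a few demonstration pairs and a held-out test input. The structure follows
# the public ARC dataset; this particular puzzle ("mirror each row") is made up.
toy_task = {
    "train": [
        {"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]},
        {"input": [[3, 3, 0], [0, 4, 4]], "output": [[0, 3, 3], [4, 4, 0]]},
    ],
    "test": [
        {"input": [[5, 0, 0], [0, 6, 0]]},  # expected output: [[0, 0, 5], [0, 6, 0]]
    ],
}

def solve(grid):
    """Hypothetical solver for this one toy task: flip each row left-to-right."""
    return [list(reversed(row)) for row in grid]

# A candidate rule only counts if it reproduces every demonstration pair.
for pair in toy_task["train"]:
    assert solve(pair["input"]) == pair["output"]

print(solve(toy_task["test"][0]["input"]))  # [[0, 0, 5], [0, 6, 0]]
```

Each ARC task hides a different rule, so a system cannot lean on having seen the answer before; it has to infer the transformation from a handful of examples.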

“Almost all current AI benchmarks can be solved purely via memorization,” Chollet told Freethink in August 2024. A major problem with current AI benchmarks stems from data contamination: when test questions end up in the training data, models can appear to perform well without truly “understanding” the underlying concepts. Large language models are master imitators, mimicking patterns found in their training data, but they do not reliably originate novel solutions to problems.
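
To illustrate what contamination detection looks like in practice, here is a toy sketch of one common heuristic: flagging a benchmark item if a long word n-gram from it also appears verbatim in the training text. The example strings, the 8-word window, and the function names are invented for illustration and do not describe any particular lab’s pipeline.

```python
# Toy contamination check: does any 8-word span of a benchmark question
# also appear verbatim in the training text? Real pipelines operate at far
# larger scale and with fuzzier matching, but the basic idea is the same.
def ngrams(text, n=8):
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def looks_contaminated(benchmark_item, training_text, n=8):
    """Flag the item if it shares at least one n-word span with the training text."""
    return bool(ngrams(benchmark_item, n) & ngrams(training_text, n))

training_text = "the capital of France is Paris, a fact nearly every model has memorized"
benchmark_item = "Q: complete this sentence: the capital of France is Paris, a fact nearly every model has memorized"

print(looks_contaminated(benchmark_item, training_text))  # True: the item leaked
```

A model that has simply memorized the answer string will ace that item without demonstrating any reasoning, which is exactly the failure mode ARC-AGI is designed to rule out.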

But even sophisticated benchmarks like ARC-AGI face a fundamental problem: They’re still trying to reduce intelligence to a score. And while improved benchmarks are essential for measuring empirical progress in a scientific framework, intelligence isn’t a single thing you can measure like height or weight—it’s a complex constellation of abilities that manifest differently in different contexts. Indeed, we don’t even have a complete functional definition of human intelligence, so defining artificial intelligence by any single benchmark score is likely to capture only a small part of the complete picture.


