Google DeepMind Proposes AI ‘Monitors’ to Police Hyperintelligent Models

Google DeepMind has introduced a new approach to securing frontier generative AI, detailed in a paper released on April 2. The paper focuses on two of DeepMind’s four key risk areas: “misuse, misalignment, mistakes, and structural risks.”

DeepMind is looking beyond current frontier AI to artificial general intelligence (AGI), systems with human-level capabilities, which could revolutionize healthcare and other industries or trigger technological chaos. There is some skepticism over whether AGI of that magnitude will ever exist.

Asserting that human-like AGI is imminent and must be prepared for is a hype strategy as old as OpenAI, which started out with a similar mission statement in 2015. Although panic over hyperintelligent AI may not be warranted, research like DeepMind’s contributes to a broader, multipronged cybersecurity strategy for generative AI.

Preventing bad actors from misusing generative AI

Misuse and misalignment are the two risk factors that involve deliberate action: misuse involves a malicious human threat actor, while misalignment describes scenarios where the AI follows instructions in ways that make it an adversary. “Mistakes” (unintentional errors) and “structural risks” (problems that emerge, perhaps from conflicting incentives among many actors, with no single party at fault) complete the four-part framework.

To address misuse, DeepMind proposes the following strategies:

  • Locking down the model weights of advanced AI systems
  • Conducting threat modeling research to identify vulnerable areas
  • Creating a cybersecurity evaluation framework tailored to advanced AI
  • Exploring other, unspecified mitigations

DeepMind acknowledges that misuse already occurs with today’s generative AI, from deepfakes to phishing scams. It also cites the spread of misinformation, the manipulation of public perceptions, and “unintended societal consequences” as present-day concerns that could scale up significantly if AGI becomes a reality.

SEE: OpenAI raised $40 billion at a $300 billion valuation this week, but some of the money is contingent on the organization going for-profit.   

Preventing generative AI from taking unwanted actions on its own

Misalignment could occur when an AI conceals its true intent from users or bypasses security measures as part of a task. DeepMind suggests that “amplified oversight” — testing an AI’s output against its intended objective — might mitigate such risks. Still, implementing this is challenging. What types of example situations should an AI be trained on? DeepMind is still exploring that question.

One proposal involves deploying a “monitor,” another AI system trained to detect actions that don’t align with DeepMind’s goals. Given the complexity of generative AI, such a monitor would need precise training to distinguish acceptable actions and escalate questionable behavior for human review.
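To make the idea concrete, here is a minimal sketch of that monitor-and-escalate pattern. This is an illustration only, not DeepMind’s implementation: the scoring function is a hypothetical stand-in for a trained monitor model, and the threshold and names are assumptions.

```python
# Hypothetical sketch of the "monitor" pattern: a second system scores each
# proposed action from the primary model and escalates anything it cannot
# confidently approve for human review. monitor_score() is a stand-in for
# a trained classifier; the keyword check below is purely illustrative.

from dataclasses import dataclass

APPROVE_THRESHOLD = 0.9  # assumed cutoff; a real system would tune this


@dataclass
class Verdict:
    action: str
    score: float      # monitor's confidence that the action is acceptable
    escalated: bool   # True -> route to a human reviewer


def monitor_score(action: str) -> float:
    """Stand-in for a trained monitor model: assign low confidence to
    actions that mention security-sensitive operations."""
    suspicious = ("disable", "exfiltrate", "bypass")
    return 0.2 if any(word in action.lower() for word in suspicious) else 0.95


def review(action: str) -> Verdict:
    """Approve confidently-aligned actions; escalate everything else."""
    score = monitor_score(action)
    return Verdict(action, score, escalated=score < APPROVE_THRESHOLD)


if __name__ == "__main__":
    for act in ["summarize the quarterly report", "bypass the content filter"]:
        v = review(act)
        print(f"{v.action!r}: score={v.score}, escalated={v.escalated}")
```

The key design point the paper gestures at survives even in this toy version: the monitor does not need to decide every case itself, only to distinguish clearly acceptable actions from ones worth a human look.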

