Google DeepMind Proposes AI ‘Monitors’ to Police Hyperintelligent Models

Google DeepMind has introduced a new approach to securing frontier generative AI, detailed in a paper released on April 2. The paper focuses on two of DeepMind’s four key risk areas: “misuse, misalignment, mistakes, and structural risks.”

DeepMind is looking beyond current frontier AI to artificial general intelligence (AGI), systems with human-level capabilities, which could revolutionize healthcare and other industries or trigger technological chaos. There is some skepticism over whether AGI of that magnitude will ever exist.

Asserting that human-like AGI is imminent and must be prepared for is a hype strategy as old as OpenAI, which started out with a similar mission statement in 2015. Although panic over hyperintelligent AI may not be warranted, research like DeepMind’s contributes to a broader, multipronged cybersecurity strategy for generative AI.

Preventing bad actors from misusing generative AI

Misuse and misalignment are the two risk factors that involve deliberate action: misuse involves a malicious human threat actor, while misalignment describes scenarios where the AI follows instructions in ways that make it an adversary. “Mistakes” (unintentional errors) and “structural risks” (problems that emerge, perhaps from conflicting incentives among many actors, with no single party at fault) complete the four-part framework.

To address misuse, DeepMind proposes the following strategies:

  • Locking down the model weights of advanced AI systems
  • Conducting threat modeling research to identify vulnerable areas
  • Creating a cybersecurity evaluation framework tailored to advanced AI
  • Exploring other, unspecified mitigations

DeepMind acknowledges that misuse already occurs with today’s generative AI, from deepfakes to phishing scams. It also cites the spread of misinformation, the manipulation of public perceptions, and “unintended societal consequences” as present-day concerns that could scale up significantly if AGI becomes a reality.

SEE: OpenAI raised $40 billion at a $300 billion valuation this week, but some of the money is contingent on the organization going for-profit.   

Preventing generative AI from taking unwanted actions on its own

Misalignment could occur when an AI conceals its true intent from users or bypasses security measures as part of a task. DeepMind suggests that “amplified oversight” — testing an AI’s output against its intended objective — might mitigate such risks. Still, implementing this is challenging. What types of example situations should an AI be trained on? DeepMind is still exploring that question.

One proposal involves deploying a “monitor,” another AI system trained to detect actions that don’t align with DeepMind’s goals. Given the complexity of generative AI, such a monitor would need precise training to distinguish acceptable actions and escalate questionable behavior for human review.
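To make the idea concrete, here is a minimal sketch of that monitor-and-escalate pattern. This is an illustration only, not DeepMind’s implementation: the scoring function is a hypothetical stand-in for a trained monitor model, and the threshold and names are assumptions.

```python
# Hypothetical sketch of the "monitor" pattern: a second system scores each
# proposed action from the primary model and escalates anything it cannot
# confidently approve for human review. monitor_score() is a stand-in for
# a trained classifier; the keyword check below is purely illustrative.

from dataclasses import dataclass

APPROVE_THRESHOLD = 0.9  # assumed cutoff; a real system would tune this


@dataclass
class Verdict:
    action: str
    score: float      # monitor's confidence that the action is acceptable
    escalated: bool   # True -> route to a human reviewer


def monitor_score(action: str) -> float:
    """Stand-in for a trained monitor model: assign low confidence to
    actions that mention security-sensitive operations."""
    suspicious = ("disable", "exfiltrate", "bypass")
    return 0.2 if any(word in action.lower() for word in suspicious) else 0.95


def review(action: str) -> Verdict:
    """Approve confidently-aligned actions; escalate everything else."""
    score = monitor_score(action)
    return Verdict(action, score, escalated=score < APPROVE_THRESHOLD)


if __name__ == "__main__":
    for act in ["summarize the quarterly report", "bypass the content filter"]:
        v = review(act)
        print(f"{v.action!r}: score={v.score}, escalated={v.escalated}")
```

The key design point the paper gestures at survives even in this toy version: the monitor does not need to decide every case itself, only to distinguish clearly acceptable actions from ones worth a human look.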

