Red Teaming Improved GPT-4. Violet Teaming Goes Even Further

Last year, I was asked to break GPT-4—to get it to output terrible things. Other interdisciplinary researchers and I were given advance access and attempted to prompt GPT-4 to show biases, generate hateful propaganda, and even take deceptive actions, so that OpenAI could understand the risks it posed and address them before its public release. This is called AI red teaming: attempting to get an AI system to act in harmful or unintended ways.

Red teaming is a valuable step toward building AI models that won’t harm society. To make AI systems stronger, we need to know how they can fail—and ideally we do that before they create significant problems in the real world. Imagine what could have gone differently had Facebook red-teamed the impact of its major AI recommendation system changes with external experts, and fixed the issues they discovered, before those changes affected elections and conflicts around the world. Though OpenAI faces many valid criticisms, its willingness to involve external researchers and to provide a detailed public description of the potential harms of its systems sets a bar for openness that potential competitors should also be called upon to follow.

Normalizing red teaming with external experts and public reports is an important first step for the industry. But because generative AI systems will likely impact many of society’s most critical institutions and public goods, red teams need people with a deep understanding of all of these issues (and their impacts on each other) in order to understand and mitigate potential harms. For example, teachers, therapists, and civic leaders might be paired with more experienced AI red teamers in order to grapple with such systemic impacts. AI industry investment in a cross-company community of such red-teamer pairs could significantly reduce the likelihood of critical blind spots.

After a new system is released, carefully allowing people who were not part of the prerelease red team to attempt to break the system without risk of being banned could help identify new problems, as well as issues with potential fixes. Scenario exercises, which explore how different actors would respond to model releases, can also help organizations understand more systemic impacts.

But if red-teaming GPT-4 taught me anything, it is that red teaming alone is not enough. For example, I just tested Google’s Bard and OpenAI’s ChatGPT and was able to get both to create scam emails and conspiracy propaganda on the first try “for educational purposes.” Red teaming alone did not fix this. To actually overcome the harms uncovered by red teaming, companies like OpenAI can go one step further and offer early access and resources to use their models for defense and resilience, as well.

I call this violet teaming: identifying how a system (e.g., GPT-4) might harm an institution or public good, and then supporting the development of tools using that same system to defend the institution or public good. You can think of this as a sort of judo. General-purpose AI systems are a vast new form of power being unleashed on the world, and that power can harm our public goods. Just as judo redirects the power of an attacker in order to neutralize them, violet teaming aims to redirect the power unleashed by AI systems in order to defend those public goods.


