UK’s AI Safety Institute easily jailbreaks major LLMs

In a shocking turn of events, AI systems might not be as safe as their creators make them out to be. Who saw that coming, right? In a new report, the UK government’s AI Safety Institute (AISI) found that the four undisclosed LLMs it tested were “highly vulnerable to basic jailbreaks.” Some models even generated “harmful outputs” when researchers had made no attempt to jailbreak them.

Most publicly available LLMs have certain safeguards built in to prevent them from generating harmful or illegal responses; jailbreaking simply means tricking the model into ignoring those safeguards. AISI did this using prompts from a recent standardized evaluation framework as well as prompts it developed in-house. The models all responded to at least a few harmful questions even without a jailbreak attempt. Once AISI attempted “relatively simple attacks” though, all responded to between 98 and 100 percent of harmful questions.

UK Prime Minister Rishi Sunak announced plans to open the AISI at the end of October 2023, and it launched on November 2. It’s meant to “carefully test new types of frontier AI before and after they are released to address the potentially harmful capabilities of AI models, including exploring all the risks, from social harms like bias and misinformation to the most unlikely but extreme risk, such as humanity losing control of AI completely.”

The AISI’s report indicates that whatever safety measures these LLMs currently deploy are insufficient. The Institute plans to complete further testing on other AI models, and is developing more evaluations and metrics for each area of concern.


