Sophisticated AI models are more likely to lie

In other words, if a human didn’t know whether an answer was correct, they wouldn’t be able to penalize wrong but convincing-sounding answers.

Schellaert’s team looked into three major families of modern LLMs: OpenAI’s ChatGPT, the LLaMA series developed by Meta, and the BLOOM suite made by BigScience. They found what’s called ultracrepidarianism, the tendency to give opinions on matters we know nothing about. It started to appear in the AIs as a consequence of increasing scale, and it grew predictably and linearly with the amount of training data in all of them. Supervised feedback “had a worse, more extreme effect,” Schellaert says. The first model in the GPT family that almost completely stopped avoiding questions it didn’t have the answers to was text-davinci-003, which was also the first GPT model trained with reinforcement learning from human feedback.

The AIs lie because we told them that doing so was rewarding. One key question is when, and how often, we get lied to.

Making it harder

To answer this question, Schellaert and his colleagues built a set of questions in different categories like science, geography, and math. Then, they rated those questions based on how difficult they were for humans to answer, on a scale from 1 to 100. The questions were then fed into subsequent generations of LLMs, from the oldest to the newest. The AIs’ answers were classified as correct, incorrect, or avoidant, meaning the AI refused to answer.
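The classification-and-scoring step described above can be sketched in a few lines of Python. This is only an illustrative reconstruction, not the study's actual pipeline: the keyword-based avoidance check, the `Question` fields, and the difficulty buckets are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical phrases used to flag an avoidant answer; the study's
# real classification criteria are not described in this article.
AVOIDANCE_MARKERS = ("i don't know", "i cannot answer", "i'm not sure")

@dataclass
class Question:
    text: str
    difficulty: int   # human-rated difficulty, 1-100
    reference: str    # known correct answer

def classify(answer: str, question: Question) -> str:
    """Label a model's answer as 'avoidant', 'correct', or 'incorrect'."""
    lowered = answer.lower()
    if any(marker in lowered for marker in AVOIDANCE_MARKERS):
        return "avoidant"
    return "correct" if question.reference.lower() in lowered else "incorrect"

def accuracy_by_difficulty(results, bucket_size=20):
    """Group (question, label) pairs into difficulty buckets and
    report the fraction of correct answers in each bucket."""
    buckets = {}
    for question, label in results:
        key = (question.difficulty - 1) // bucket_size
        correct, total = buckets.get(key, (0, 0))
        buckets[key] = (correct + (label == "correct"), total + 1)
    return {k: c / t for k, (c, t) in sorted(buckets.items())}
```

Bucketing accuracy against human-rated difficulty is what lets the researchers see where correctness drops off, and whether avoidance rises to fill the gap.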

The first finding was that the questions that appeared more difficult to us also proved more difficult for the AIs. The latest versions of ChatGPT gave correct answers to nearly all science-related prompts and the majority of geography-oriented questions, up until they were rated roughly 70 on Schellaert’s difficulty scale. Addition was more problematic, with the frequency of correct answers falling dramatically once the difficulty rose above 40. “Even for the best models, the GPTs, the failure rate on the most difficult addition questions is over 90 percent. Ideally we would hope to see some avoidance here, right?” says Schellaert. But the models showed little avoidance.


