Sophisticated AI models are more likely to lie



In other words, if a human didn’t know whether an answer was correct, they wouldn’t be able to penalize wrong but convincing-sounding answers.

Schellaert’s team looked into three major families of modern LLMs: OpenAI’s ChatGPT, the LLaMA series developed by Meta, and the BLOOM suite made by BigScience. They found what’s called ultracrepidarianism, the tendency to give opinions on matters we know nothing about. It started to appear in the AIs as a consequence of increasing scale, but it was predictably linear, growing with the amount of training data, in all of them. Supervised feedback “had a worse, more extreme effect,” Schellaert says. The first model in the GPT family that almost completely stopped avoiding questions it didn’t have the answers to was text-davinci-003. It was also the first GPT model trained with reinforcement learning from human feedback.

The AIs lie because we told them that doing so was rewarding. One key question is when, and how often, we get lied to.

Making it harder

To answer this question, Schellaert and his colleagues built a set of questions in different categories like science, geography, and math. Then, they rated those questions based on how difficult they were for humans to answer, using a scale from 1 to 100. The questions were then fed into successive generations of LLMs, from the oldest to the newest. The AIs’ answers were classified as correct, incorrect, or evasive, meaning the AI refused to answer.
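A rough way to picture this kind of analysis is a tally of answer labels per difficulty band. The Python sketch below is not the researchers’ code. The function name, the 10-point bin width, and the sample records are all assumptions made for illustration, and the toy data does not reproduce any result from the study.

```python
from collections import Counter, defaultdict

# The three answer categories described in the article.
LABELS = ("correct", "incorrect", "evasive")


def tally_by_difficulty(records, bin_width=10):
    """Return the share of each label per difficulty bin.

    `records` is an iterable of (difficulty, label) pairs, where
    difficulty is an integer from 1 to 100 and label is one of LABELS.
    Bins are keyed by their lowest difficulty: 1, 11, 21, ...
    (the bin width is an assumption, not taken from the study).
    """
    counts = defaultdict(Counter)
    for difficulty, label in records:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        if not 1 <= difficulty <= 100:
            raise ValueError(f"difficulty out of range: {difficulty}")
        bin_start = ((difficulty - 1) // bin_width) * bin_width + 1
        counts[bin_start][label] += 1

    rates = {}
    for bin_start in sorted(counts):
        total = sum(counts[bin_start].values())
        rates[bin_start] = {
            label: counts[bin_start][label] / total for label in LABELS
        }
    return rates


# Made-up records for illustration only (not data from the study).
toy_records = [
    (5, "correct"), (8, "correct"),
    (45, "incorrect"), (47, "correct"),
    (92, "incorrect"), (95, "incorrect"), (98, "incorrect"), (99, "evasive"),
]

for bin_start, shares in tally_by_difficulty(toy_records).items():
    print(bin_start, shares)
```

In a table like this, a model that knew its limits would show the evasive share rising in the hardest bins, rather than the incorrect share.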

The first finding was that the questions that appeared more difficult to us also proved more difficult for the AIs. The latest versions of ChatGPT gave correct answers to nearly all science-related prompts and the majority of geography-oriented questions up to a rating of roughly 70 on Schellaert’s difficulty scale. Addition was more problematic, with the frequency of correct answers falling dramatically after the difficulty rose above 40. “Even for the best models, the GPTs, the failure rate on the most difficult addition questions is over 90 percent. Ideally we would hope to see some avoidance here, right?” says Schellaert. But we didn’t see much avoidance.


