Study: Meta AI model can reproduce almost half of Harry Potter book



The Google Books precedent probably can’t protect Meta against this second legal theory because Google never made its books database available for users to download—Google almost certainly would have lost the case if it had done that.

In principle, Meta could still convince a judge that copying 42 percent of Harry Potter was allowed under the flexible, judge-made doctrine of fair use. But it would be an uphill battle.

“The fair use analysis you’ve gotta do is not just ‘is the training set fair use,’ but ‘is the incorporation in the model fair use?’” Lemley said. “That complicates the defendants’ story.”

Grimmelmann also said there’s a danger that this research could put open-weight models in greater legal jeopardy than closed-weight ones. The Cornell and Stanford researchers could only do their work because they had access to the underlying model, and hence to the token probability values that allowed efficient calculation of probabilities for sequences of tokens.
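To see why per-token probabilities matter, recall that the probability of a whole sequence is the product of each token's probability given the tokens before it. The sketch below illustrates only that arithmetic. It is not the researchers' code, and the function name and the probability values are made up for the example.

```python
import math

def sequence_probability(token_probs):
    """Probability of a full token sequence under the chain rule.

    token_probs[i] is the (hypothetical) probability the model assigns
    to token i given all of the tokens that came before it.
    """
    if any(p <= 0.0 for p in token_probs):
        return 0.0  # any impossible token makes the whole sequence impossible
    # Sum log-probabilities rather than multiplying raw probabilities,
    # which avoids numerical underflow on long sequences.
    return math.exp(sum(math.log(p) for p in token_probs))

# Made-up conditional probabilities for a four-token continuation.
example_probs = [0.9, 0.8, 0.95, 0.7]
p = sequence_probability(example_probs)
print(f"{p:.4f}")
```

With token probabilities in hand, a single pass over a passage yields the chance that the model would reproduce it exactly. Without them, an outside researcher would have to estimate that chance by sampling the model's output many times.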

Most leading labs, including OpenAI, Anthropic, and Google, have increasingly restricted access to these so-called logits, making it more difficult to study these models.

Moreover, if a company keeps model weights on its own servers, it can use filters to try to prevent infringing output from reaching the outside world. So even if the underlying OpenAI, Anthropic, and Google models have memorized copyrighted works in the same way as Llama 3.1 70B, it might be difficult for anyone outside the company to prove it.

This kind of filtering also makes it easier for companies with closed-weight models to invoke the Google Books precedent. In short, copyright law might create a strong disincentive for companies to release open-weight models.

“It’s kind of perverse,” Mark Lemley told me. “I don’t like that outcome.”

On the other hand, judges might conclude that it would be bad to effectively punish companies for publishing open-weight models.

“There’s a degree to which being open and sharing weights is a kind of public service,” Grimmelmann told me. “I could honestly see judges being less skeptical of Meta and others who provide open-weight models.”

Timothy B. Lee was on staff at Ars Technica from 2017 to 2021. Today, he writes Understanding AI, a newsletter that explores how AI works and how it’s changing our world.


