Publishers are blocking the Internet Archive for fear AI scrapers can use it as a workaround

The Internet Archive has often been a valuable resource for journalists, whether for finding records of deleted tweets or for providing academic texts for background research. However, the advent of AI has created a new tension between the archive and publishers. A few major publications have begun blocking the nonprofit digital library’s access to their content over concerns that AI companies’ bots are using the Internet Archive’s collections to indirectly scrape their articles.

“A lot of these AI businesses are looking for readily available, structured databases of content,” Robert Hahn, head of business affairs and licensing for The Guardian, told Nieman Lab. “The Internet Archive’s API would have been an obvious place to plug their own machines into and suck out the IP.”

The New York Times took a similar step. “We are blocking the Internet Archive’s bot from accessing the Times because the Wayback Machine provides unfettered access to Times content — including by AI companies — without authorization,” a representative from the newspaper confirmed to Nieman Lab. Subscription-focused publication the Financial Times and social forum Reddit have also made moves to selectively block how the Internet Archive catalogs their material.

Many publishers have sued AI businesses over how they access content used to train large language models. To name a few just from the realm of journalism:

  • The New York Times sued OpenAI and Microsoft

  • The Center for Investigative Reporting sued OpenAI and Microsoft

  • The Wall Street Journal and New York Post sued Perplexity

  • A group of publishers including The Atlantic, The Guardian and Politico sued Cohere

  • The New York Times and the Chicago Tribune sued Perplexity

Other media outlets have sought financial deals before offering up their libraries as training material, although those arrangements seem to compensate the publishing companies rather than the writers. And that’s not even delving into the copyright and piracy battles that other creative fields, from fiction writers to visual artists to musicians, are waging against AI tools. The whole Nieman Lab story is well worth a read for anyone who has been following these creative industries’ responses to artificial intelligence.


