ChatGPT Stole Your Work. So What Are You Going to Do?

If you’ve ever uploaded photos or art, written a review, “liked” content, answered a question on Reddit, contributed to open source code, or done any number of other activities online, you’ve done free work for tech companies, because downloading all this content from the web is how their AI systems learn about the world.

Tech companies know this, but they mask your contributions to their products with technical terms like “training data,” “unsupervised learning,” and “data exhaust” (and, of course, impenetrable “Terms of Use” documents). In fact, much of the innovation in AI over the past few years has been in ways to use more and more of your content for free. This is true for search engines like Google, social media sites like Instagram, AI research startups like OpenAI, and many other providers of intelligent technologies. 

This exploitative dynamic is particularly damaging when it comes to the new wave of generative AI programs like Dall-E and ChatGPT. Without your content, ChatGPT and all of its ilk simply would not exist. Many AI researchers think that your content is actually more important than what computer scientists are doing. Yet these intelligent technologies that exploit your labor are the very same technologies that are threatening to put you out of a job. It’s as if the AI system were going into your factory and stealing your machine. 

But this dynamic also means that the users who generate data have a lot of power. Discussions about the use of sophisticated AI technologies often come from a place of powerlessness: the stance that AI companies will do what they want and that there’s little the public can do to shift the technology in a different direction. We are AI researchers, and our research suggests the public has a tremendous amount of “data leverage” that can be used to create an AI ecosystem that both generates amazing new technologies and shares the benefits of those technologies fairly with the people who created them.

Data leverage can be deployed through at least four avenues: direct action (for instance, individuals banding together to withhold, “poison,” or redirect data), regulatory action (for instance, pushing for data protection policy and legal recognition of “data coalitions”), legal action (for instance, communities adopting new data-licensing regimes or pursuing a lawsuit), and market action (for instance, demanding large language models be trained only with data from consenting creators). 

Let’s start with direct action, which is a particularly exciting route because it can be done immediately. Because generative AI systems rely on web scraping, website owners could significantly disrupt the training data pipeline if they disallow or limit scraping by configuring their robots.txt file (a file that tells web crawlers which pages are off limits).
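To make the robots.txt idea concrete, here is a minimal sketch in Python of what such a file might look like and how a well-behaved crawler would interpret it. The crawler name "ExampleAIBot", the example.com URLs, and the helper function is_allowed are hypothetical and chosen only for illustration; a real site would need to look up the user-agent strings that specific AI crawlers actually announce. Note also that robots.txt is a voluntary convention, so it only stops crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one (made-up) AI crawler from the whole
# site, and keep every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honors robots.txt may fetch the URL."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "SomeOtherBot"):
        for url in ("https://example.com/art/1", "https://example.com/private/x"):
            print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

A site owner could run a check like this before publishing a new robots.txt to confirm that the rules block the crawlers they intend to block and nothing else.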

Large user-generated content sites like Wikipedia, Stack Overflow, and Reddit are particularly important to generative AI systems, and they could prevent these systems from accessing their content in even stronger ways, for example by blocking IP traffic and API access. According to Elon Musk, Twitter has recently done exactly this. Content producers should also take advantage of the opt-out mechanisms that AI companies are increasingly providing. For instance, programmers on GitHub can opt out of BigCode’s training data via a simple form. More generally, simply being vocal when content has been used without your consent has been somewhat effective. For example, major generative AI player Stability AI agreed to honor opt-out requests collected via haveibeentrained.com after a social media uproar. By engaging in public forms of action, as artists have done in mass protests against AI art, it may be possible to force companies to cease business activities that most of the public perceives as theft.
