Guide

What is AI Training Data?

Beaumont & Sheridan — Information resource for individual creators

Artificial intelligence models, particularly large language models and image generators, are trained on vast amounts of data. That data includes text, images, audio, and video — much of it scraped from the public internet. And much of it was created by individual creators who never consented to its use.

How Training Data Is Collected

AI companies collect training data through automated web scraping — software that crawls the internet and downloads content. Common sources include:

The scale is enormous. LAION-5B, one widely used dataset, indexes roughly 5.85 billion image-caption pairs. The collected content is labeled and fed into machine learning models, which learn its patterns, styles, and structures and then generate new content based on what they learned.
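To make the "downloaded and labeled" step concrete, here is a simplified sketch of how a scraper might pair each image on a page with its alt text to form an image-caption record. It is an illustration only, not any particular company's pipeline. The class name, function name, and sample page are hypothetical, and the sketch parses a hard-coded HTML string rather than fetching anything from the network.

```python
from html.parser import HTMLParser


class ImageCaptionParser(HTMLParser):
    """Collects (image URL, alt text) pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        src = attr_map.get("src")
        # Attribute values can be None, so fall back to an empty string.
        alt = (attr_map.get("alt") or "").strip()
        # Keep only images that carry a usable caption.
        if src and alt:
            self.pairs.append((src, alt))


def extract_pairs(html):
    """Return every (image URL, caption) pair found in the HTML text."""
    parser = ImageCaptionParser()
    parser.feed(html)
    return parser.pairs


# Hypothetical page content, for illustration only.
SAMPLE_PAGE = """
<html><body>
  <img src="https://example.com/art/heron.jpg" alt="Watercolor of a heron at dusk">
  <img src="https://example.com/spacer.gif">
</body></html>
"""

pairs = extract_pairs(SAMPLE_PAGE)
```

In this sketch the artwork becomes a dataset entry simply because it is publicly reachable and has descriptive alt text. Nothing in the process asks the creator for permission.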

The key question

When a creator's work is included in a training dataset without permission, is that copyright infringement? The courts are currently deciding this. Several major class-action lawsuits argue that it is — that AI companies are building commercial products on the backs of creators without paying for the raw material.

Why It Matters for Creators

If your work has been included in a training dataset without your knowledge or consent, several of your rights may have been violated:

What to Do If Your Work Was Used

If you believe your work has been included in an AI training dataset without your permission:

  1. Document everything. Save screenshots, URLs, and any evidence of your work appearing in training data.
  2. Check dataset listings. Some datasets, such as LAION, are publicly documented and can be searched for your work.
  3. File opt-out requests. Some platforms allow creators to opt out of future training data collection.
  4. Consider legal action. Class actions are currently accepting creators in several categories.
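Steps 1 and 2 can be partly automated. The sketch below assumes you have obtained a dataset's metadata as a CSV file with `url` and `caption` columns. Those column names are an assumption, so check the actual dataset's documentation. The function names, the sample data, and the record fields are likewise hypothetical. The script looks for entries hosted on your domain and builds a timestamped record for each match, including a SHA-256 fingerprint of your original file.

```python
import csv
import hashlib
import io
from datetime import datetime, timezone
from urllib.parse import urlparse


def find_matches(metadata_csv, my_domain):
    """Return rows whose URL host is my_domain or one of its subdomains."""
    matches = []
    for row in csv.DictReader(metadata_csv):
        host = (urlparse(row["url"]).hostname or "").lower()
        if host == my_domain or host.endswith("." + my_domain):
            matches.append(row)
    return matches


def evidence_record(row, original_bytes):
    """Build a timestamped record tying a dataset entry to your original file."""
    return {
        "found_url": row["url"],
        "caption": row["caption"],
        "sha256_of_original": hashlib.sha256(original_bytes).hexdigest(),
        "recorded_at_utc": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical metadata, standing in for a real dataset listing.
SAMPLE_METADATA = io.StringIO(
    "url,caption\n"
    "https://example-artist.com/gallery/heron.jpg,Watercolor of a heron at dusk\n"
    "https://other-site.org/cat.png,A cat\n"
)

matches = find_matches(SAMPLE_METADATA, "example-artist.com")
# In practice, read the bytes of your own original file here.
records = [evidence_record(m, b"placeholder image bytes") for m in matches]
```

A record like this supplements screenshots and saved URLs rather than replacing them. The hash shows which exact file you hold, and the timestamp shows when you found the entry.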

See our Documenting Infringement guide for a complete checklist of what to save.

Contact us if you have questions about your specific situation.
