One of the most common questions we hear from creators is: how do I know if my work was used to train an AI model? The answer is not always straightforward, but there are steps you can take to find out.
The most well-known training datasets include:
Some datasets are publicly documented and searchable. Others are proprietary and undisclosed.
The LAION dataset is indexed and searchable through community tools such as haveibeentrained.com. Upload or paste a sample of your work, and the tool will check whether it appears in the dataset. Results are not exhaustive, because the search may miss items that are in the dataset. Treat what you find as a starting point rather than a definitive answer.
Some platforms now provide information about whether your content has been used for training. Check the terms of service and privacy policies for any platform where you've published work. Notable developments:
Another approach is to test whether an AI model can reproduce or closely mimic your work. If you prompt an image generator for something in your style and it produces a result that closely resembles your actual work, that is a strong signal that your work may have been part of its training data.
Absence of evidence is not evidence of absence. Many training datasets are proprietary and not publicly searchable. Even if you can't find your work in a specific dataset, it may still have been used. Documentation of your publication timeline — when your work was published, where it appeared, and how widely it was distributed — can help establish that it was available for scraping.
Whether you find your work or not, document everything. See our Documenting Infringement guide for a complete checklist.
Contact us if you have questions about your specific situation.