AI & Copyright
The Great Legal Divide
Section 2: The Core Conflict
With AI authorship settled, the legal battleground has shifted to the data used to train AI models. The central issue is the unauthorized scraping of billions of copyrighted works from the internet.
🎨 Artists’ Rebellion: Andersen v. Stability AI
Visual artists Sarah Andersen, Kelly McKernan, and Karla Ortiz filed a class-action lawsuit alleging that their copyrighted images were used without consent to train image generators.
Key Allegation:
The models were trained to mimic their unique artistic styles, creating a competing product that devalues their work and siphons potential commissions.
🏢 Corporate Stand: Getty Images v. Stability AI
The stock photo giant sued Stability AI, alleging that the company scraped more than 12 million Getty images to train the Stable Diffusion model.
The “Smoking Gun”:
The model's output sometimes contains a distorted but recognizable version of the Getty Images watermark, which Getty cites as powerful visual evidence of direct copying.
Section 3: The “Fair Use” Battleground
AI companies defend their data scraping under the “fair use” doctrine, a U.S. legal concept allowing limited use of copyrighted material without the owner's permission. Courts must weigh four factors case by case, which has led to conflicting rulings and profound legal uncertainty.
The Four Factors of Fair Use
1. Purpose & Character
Is the use transformative or just a substitute?
2. Nature of the Work
Was the original work creative or factual?
3. Amount Used
How much of the original work was taken, and how important was that portion?
4. Market Effect
Does the use harm the market for the original?
Conflicting Rulings: A Judicial Split
Different courts have weighed the four factors differently, creating a fractured legal landscape. This chart compares the outcomes, showing whether the ruling favored the AI company (transformative process) or the creator (market harm).
Section 4: The Path Forward
As legal battles rage, a consensus is emerging around ethical practices. The immense legal risk of using scraped data is creating a powerful business case for a more sustainable, consent-based approach.
Unethical Practices
- ❌ Scraping copyrighted data without permission, credit, or compensation.
- ❌ Mimicking living artists’ styles to create competing derivative works.
- ❌ Removing watermarks and other copyright management information.
Emerging Ethical Practices
- ✅ Licensing data from creators and rights holders directly.
- ✅ Implementing Opt-In/Opt-Out systems to respect creator choice.
- ✅ Providing data provenance and transparency about training sets.
Creator Preference: Consent Models
The creative community overwhelmingly favors “Opt-In” systems, where AI companies must get permission before using a work, shifting the burden from creators to corporations.
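The practical difference between the two consent models comes down to what happens when a creator has never answered. The Python sketch below is a minimal illustration of that default, not a description of any real company's pipeline. The `Work` record, its field names, and the `may_train_on` and `select_training_set` helpers are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ConsentModel(Enum):
    OPT_IN = "opt-in"    # excluded unless the creator has said yes
    OPT_OUT = "opt-out"  # included unless the creator has said no


@dataclass(frozen=True)
class Work:
    # Hypothetical record; the field names are illustrative assumptions.
    work_id: str
    creator: str
    # True = permission granted, False = permission refused,
    # None = the creator has never responded.
    consent: Optional[bool] = None


def may_train_on(work: Work, model: ConsentModel) -> bool:
    """Decide whether a work may enter a training set under a consent model."""
    if work.consent is not None:
        # An explicit answer is honored under either model.
        return work.consent
    # No answer on record: only the opt-out model treats silence as permission.
    return model is ConsentModel.OPT_OUT


def select_training_set(works: List[Work], model: ConsentModel) -> List[Work]:
    """Keep only the works that the chosen consent model allows."""
    return [w for w in works if may_train_on(w, model)]


if __name__ == "__main__":
    catalog = [
        Work("img-001", "artist_a", consent=True),
        Work("img-002", "artist_b", consent=False),
        Work("img-003", "artist_c"),  # never responded
    ]
    for model in ConsentModel:
        ids = [w.work_id for w in select_training_set(catalog, model)]
        print(model.value, ids)
    # prints:
    # opt-in ['img-001']
    # opt-out ['img-001', 'img-003']
```

The only work treated differently is `img-003`, whose creator never responded. Under opt-out the silent creator carries the burden of objecting, while under opt-in the company carries the burden of asking first.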