On December 24, 2025, The Decoder summarized a new U.S. lawsuit in which John Carreyrou and five other authors accuse OpenAI, Anthropic, Google, Meta, xAI and Perplexity of training AI models on pirated copies of their books. The complaint, filed December 22 in California federal court, seeks up to $150,000 per work for alleged willful infringement via “shadow libraries” like LibGen and Z-Library.
This lawsuit is another sign that copyright friction is becoming a structural constraint on frontier AI, not just background noise. By explicitly targeting training on pirated books and forgoing the class-action route, Carreyrou and his co-plaintiffs are trying to raise the effective price of high-quality text data for the named defendants. If courts treat the use of shadow libraries as separate, willful infringement, statutory damages could become material enough to alter training-data strategies. ([the-decoder.com](https://the-decoder.com/authors-sue-six-ai-giants-for-book-piracy/))
In practical terms, the suit accelerates the shift toward licensed corpora, synthetic data, and first-party content deals like Disney–OpenAI. It also raises the bar for upstart labs without deep pockets or strong licensing relationships, potentially entrenching incumbents that can afford to license data at scale. At the same time, a clear legal line against pirated sources may actually de-risk large-scale training: companies will know what is off-limits and can plan licenses or settlements accordingly.
For the race to AGI, the likely impact is mixed: higher costs and legal exposure can slow unrestrained scaling, but they may also push the biggest players toward more sustainable, rights-respecting data pipelines that regulators are more willing to tolerate as capabilities approach AGI.


