A group of prominent authors—including Daniel Okrent, Jia Tolentino, and Kai Bird—filed a lawsuit this week against Microsoft, accusing the tech giant of using pirated versions of nearly 200,000 books to train one of its AI models.
The lawsuit, filed on Tuesday the 24th in federal court in New York, claims that Microsoft used unauthorized copies of their works to help develop Megatron, a large language model created in partnership with Nvidia. The writers are seeking an injunction to stop the alleged misuse and damages of up to $150,000 per infringed work.
The complaint centers on the use of pirated material for AI training, which the authors say occurred without their consent or compensation. They argue that Microsoft gained a commercial advantage by illegally using their content to train its language model.
Megatron is designed to process and generate human-like text by analyzing vast amounts of written material. The model relies heavily on massive datasets, which the authors allege included their copyrighted works sourced from illegal copies.
This legal action comes on the heels of a significant court ruling in California, where a judge concluded that Anthropic, the company behind the Claude AI assistant, could rely on "fair use" to justify using copyrighted content for training purposes. However, that ruling doesn't give companies free rein: they can still be held accountable for using pirated material, even where fair use applies in certain contexts.
In the United States, fair use allows limited use of copyrighted material without the creator's permission for purposes such as criticism, news reporting, teaching, and research. Whether a use is commercial is only one of the factors courts weigh, and whether training AI models qualifies as fair use remains a hotly contested legal question.
Microsoft has yet to respond publicly to the lawsuit.
Meta Wins Lawsuit Over AI Training With Copyrighted Books
Meta, the company led by Mark Zuckerberg, scored a legal victory on Wednesday the 25th in a case that mirrors the legal troubles now facing Microsoft.
Thirteen authors had filed a lawsuit against Meta, accusing the company of using their books without permission to train its artificial intelligence language models. The writers argued that their copyrighted works were taken without consent and used for commercial purposes.
However, the judge presiding over the case ruled in Meta’s favor, stating that the company’s actions fell under the legal principle of fair use. According to the ruling, there was no copyright infringement, and Meta’s use of the content was permissible under U.S. law.
This decision comes at a time when the legal boundaries around using copyrighted material to train AI models are under intense scrutiny. While the ruling strengthens Meta’s position, it doesn’t necessarily set a blanket precedent for similar cases, especially when issues like piracy or direct commercial exploitation are involved.
As the debate around fair use and AI continues to evolve, this case adds another layer to how courts are interpreting the law in the age of machine learning.