
Piracy lawsuit against Meta could set precedent for torrenting copyrighted works in AI training


A hot potato: Meta is embroiled in a ground-breaking AI lawsuit that could change how courts view copyright law. The case seems open-and-shut from the plaintiffs’ view. However, if a judge sees otherwise, it could set a monumental precedent allowing corporations to pirate copyrighted material to train AI systems.

In January 2024, a group of writers filed a lawsuit in California against Meta for using their works to train various versions of the Llama large language model. Meta openly admitted to using the Books3 dataset, a well-known 37GB compilation of 195,000 copyrighted books used by developers to train LLMs since 2020. The company defends its actions, citing the Fair Use doctrine. Earlier this year, the court unsealed documents showing that Meta had used torrenting to gather its AI training data.

On Monday, the authors filed for a partial summary judgment in a California U.S. District Court, arguing that Meta’s alleged use of pirated data leaves no room for legal ambiguity. The plaintiffs claim Meta’s use of torrenting to acquire copyrighted books for artificial intelligence training amounts to clear-cut copyright infringement.

“Whatever the merits of generative artificial intelligence, or GenAI, stealing copyrighted works off the Internet for one’s own benefit has always been unlawful,” the authors stated in their filing.

According to the unsealed documents, Meta initially attempted to download pirated books individually, but this process was too slow and placed excessive strain on its networks. The company then allegedly turned to torrenting – an infamous file-sharing method long associated with copyright infringement – to acquire terabytes of copyrighted books in bulk far beyond the scope of the Books3 dataset.

The authors claim that Meta was fully aware of the legal risks involved and took deliberate action to obscure its activities. The company allegedly ran the torrent client through Amazon Web Services rather than Meta’s infrastructure – an action that is not standard practice for the social media giant.

The heavily redacted motion, obtained by Ars Technica, points out that torrent users typically download (leech) and upload (seed) chunks of a file at the same time to allow faster downloads. Leeching and seeding are widely considered illegal when the files contain copyrighted material shared without authorization. Furthermore, by seeding a torrent, Meta may have actively facilitated piracy by distributing copyrighted books.

The plaintiffs feel that a trial is no longer necessary and seek immediate judgment. The authors contend that the company’s actions clearly violate copyright law, falling far outside Meta’s fair-use defense. A decision in Meta’s favor could set a dangerous precedent going far beyond books, allowing AI developers to infringe on copyrights without compensating the IP owners.

“[The court] should nevertheless grant summary judgment under the four fair use factors regarding Meta’s decision to make available to other P2P pirates millions of copyrighted books in exchange for faster download speed,” the motion argues.

While it seems like a relatively open-and-shut case, presiding judge Vince Chhabria admitted that he was unfamiliar with torrenting and related terminology like seeding and leeching. For this reason, Judge Chhabria may deny the motion for summary judgment, choosing to hear experts testify and explain the case so that he can make a fair and honest ruling.

The final decision in the lawsuit will be ground-breaking no matter which way it goes. If Meta prevails, it opens the door for other AI developers to pirate books, images, or videos to train their models. If the authors win, it sets a precedent for similar cases, including those currently in the judicial system. It could also lead to further copyright reform akin to the Digital Millennium Copyright Act.


