
Published 17:19 IST, January 12th 2025

Did Mark Zuckerberg Approve Meta To Use Pirated Content for AI Training?

The plaintiffs allege that Meta stripped copyright information from works by publishers such as McGraw Hill in LibGen before using them to train its AI.

Reported by: Tech Desk
Mark Zuckerberg has been accused of greenlighting Meta to train its AI model on pirated content. | Image: AP

Mark Zuckerberg faces a new allegation in a copyright lawsuit: that he approved Meta’s use of pirated content and materials to train its Llama AI models. In simple terms, the plaintiffs allege that Meta knowingly used unauthorised materials to train its AI models. The pirated materials include academic and general-interest books, journals, and images, which Meta allegedly fed to Llama despite concerns raised by company executives and employees.

According to TechCrunch, documents submitted to the court in the Kadrey v. Meta case indicate that the Zuckerberg-led company used the LibGen dataset for AI training. LibGen, a so-called “shadow library,” offers file-sharing access to a wide catalogue that includes copyrighted works; it has previously been sued multiple times, fined millions of dollars, and even ordered to shut down. The plaintiffs accuse Meta of using those copyrighted works to train Llama with Zuckerberg’s approval.

The plaintiffs further allege that Meta stripped copyright information from works by publishers such as Cengage Learning, Macmillan Learning, and McGraw Hill in LibGen before using them to train its AI. According to the submitted document, the company “remov[ed] all the copyright paragraphs from [the] beginning and the end” of scientific journal articles, and one of Meta’s engineers wrote a script to automatically remove copyright information from LibGen files.

The plaintiffs’ counsel argues that Meta did this to hide its copyright infringement from the public. “This discovery suggests that Meta strips [copyright information] not just for training purposes,” reads the submitted document, “but also to conceal its copyright infringement, because stripping copyrighted works … prevents Llama from outputting copyright information that might alert Llama users and the public to Meta’s infringement.” The filing also cites Meta’s admission that it torrented LibGen materials, even though some engineers were reluctant to do so from their corporate laptops.

The case is still underway, with the judge hearing both sides.
 

Updated 17:19 IST, January 12th 2025