Nielsen-owned Gracenote sues OpenAI, alleging infringement in the scraping of its media metadata

Gracenote, a metadata and content identification company owned by Nielsen, has filed a lawsuit against OpenAI in the U.S. District Court for the Southern District of New York. The complaint accuses the artificial intelligence company of crawling and using Gracenote's media metadata database and its unique data association framework on a large scale, without authorization and without paying any fees, to train the large language models behind commercial products such as ChatGPT. Gracenote says this constitutes serious copyright infringement and endangers its core business.

Gracenote states in the complaint that it has relied on hundreds of editors over the years to manually write and annotate information about film, television, music, and sports content from around the world. The result is a "program database" that includes program descriptions, video feature descriptions, unique content identifiers, and complex relationship graphs, which the company has registered with the U.S. Copyright Office. Gracenote argues that the database contains not only specific text but also a proprietary structural design for classifying, associating, and organizing different works. This "relationship framework" is an important source of the value of its services to enterprise customers such as streaming platforms and smart TV manufacturers.

According to the complaint, OpenAI crawled and ingested this data without permission, and when users put questions to ChatGPT, it produced descriptions that were highly similar or even identical to Gracenote's program descriptions, nearly verbatim. In one example provided by Gracenote, a user asked ChatGPT to describe the popular TV series Game of Thrones, and the model returned text almost identical to the version written by Gracenote editors. The company also says that multiple versions of ChatGPT were able to recite large chunks of the program descriptions in its database with very little prompting, which it says indicates that the text and its underlying organizational structure were directly copied and embedded in the model.

Gracenote argues that OpenAI's unauthorized use of its metadata and relationship framework not only infringes its copyrighted text and database structure but also makes it possible for media distributors and device manufacturers to build alternative metadata services on "free crawled data", directly weakening the competitiveness of Gracenote's own products. The complaint warns that if this conduct is not stopped and remedied, device makers such as smart TV manufacturers could rely on data "reversely derived" from AI models to build their own metadata platforms that compete with Gracenote, without paying any licensing fees.

As for remedies, Gracenote relies on the registration of its database with the U.S. Copyright Office and seeks, in addition to compensation for actual losses, statutory damages for what it claims is ongoing, large-scale infringement. Statutory damages are amounts, either fixed or within a range, that the law sets in advance for specific types of copyright infringement. Actual damages compensate the rights holder for the economic losses actually suffered because of the infringement.

Responding to Axios, an OpenAI spokesperson said its models "enable innovation" and are trained on "publicly available data" and backed by "fair use." Many AI companies, including OpenAI, have consistently argued that training models on crawled public internet content qualifies as fair use under current U.S. copyright law, on the grounds that the model transforms the data to provide users with new and useful services and information.

The lawsuit is also attracting attention because Gracenote has long been open to cooperating with AI companies and has reached multiple AI-related data licensing agreements with Samsung, Google, and other companies. Gracenote states in the complaint that it contacted OpenAI many times to discuss licensing but was "repeatedly rejected or ignored for a long period of time" and therefore had to resort to litigation to protect its rights and interests. The company's CEO, Jared Grusd, emphasized in a statement that "Supporting the development of AI and opposing theft are not inconsistent. They are the only path to sustainable development of the industry," and said the lawsuit aims to protect that future.

With multiple copyright disputes between media and information companies and AI companies still awaiting court rulings, legal professionals believe this case is likely to become an important reference point for judges on two questions: whether "non-traditional works" such as database structures and metadata association maps can obtain copyright protection, and how to determine the "boundary of fair use of large models." Gracenote emphasizes in its complaint that much of the content output by OpenAI is "nearly identical" to the metadata it licenses to its customers and therefore, in its view, does not derive new information but substantially copies existing content. This is expected to be one of the key points of dispute that distinguishes this case from other AI copyright cases.