OpenAI accidentally deleted potential evidence in NY Times copyright lawsuit (update)
Lawyers for The New York Times and Daily News, which are suing OpenAI for allegedly copying their work to train artificial intelligence models without permission, say OpenAI engineers accidentally deleted potentially relevant data.
Earlier this fall, OpenAI agreed to provide two virtual machines so that experts for The Times and Daily News could search its AI training data for copyrighted content. (Virtual machines are software-based computers that run inside another computer’s operating system and are often used for testing, backing up data, and running applications.) In a letter, lawyers for the publishers say they and the experts they hired have spent more than 150 hours since November 1 combing through OpenAI’s training data.
However, on November 14, OpenAI engineers deleted all of the publishers’ search data stored on one of the virtual machines, according to the letter, which was filed late Wednesday in the U.S. District Court for the Southern District of New York.
OpenAI tried to recover the data and was mostly successful. However, because the folder structure and file names were “irretrievably” lost, the recovered data “cannot be used to determine where the news plaintiffs’ copied articles were used to build (OpenAI’s) models,” the letter says.
“The news plaintiffs were forced to recreate their work from scratch, using significant person-hours and computer processing time,” lawyers for The Times and the Daily News wrote. “The news plaintiffs learned only yesterday that the recovered data is unusable and that an entire week’s worth of its experts’ and lawyers’ work must be re-done, which is why this supplemental letter is being filed today.”
The plaintiffs’ attorneys say they have no reason to believe the deletion was intentional. However, they say the incident underscores that OpenAI “is in the best position to search its own datasets” for potentially infringing content using its own tools.
An OpenAI spokesperson declined to comment.
However, late Friday, November 22, OpenAI’s lawyers filed a response to the letter sent Wednesday by attorneys for The Times and Daily News. In their response, OpenAI’s lawyers unequivocally denied that OpenAI had deleted any evidence and instead suggested that the plaintiffs were to blame for a system misconfiguration that led to the technical problem.
“Plaintiffs requested that one of several machines provided by OpenAI be reconfigured to search training datasets,” OpenAI’s attorney wrote. “Implementation of plaintiffs’ requested change, however, resulted in the deletion of the folder structure and certain file names from one hard drive – a drive that was intended to serve as a temporary cache… In any event, there is no reason to believe that any files were actually lost.”
In this and other cases, OpenAI maintains that training models on publicly available data, including articles from The Times and Daily News, is permissible. In other words, by creating models like GPT-4o that “learn” from billions of examples of e-books, essays, and other materials to generate human-sounding text, OpenAI believes it owes no license fees or other payments for those examples, even though it makes money from these models.
Even so, OpenAI has signed licensing agreements with a growing number of news publishers, including the Associated Press, Business Insider owner Axel Springer, the Financial Times, People’s parent company Dotdash Meredith, and News Corp. OpenAI has declined to make the terms of these deals public, but one of its content partners, Dotdash, reportedly earns at least $16 million a year.
OpenAI has neither confirmed nor denied that it trained its AI systems on any specific copyrighted works without permission.