The New York Times Sues OpenAI and Microsoft, Alleging “Millions” of Its Articles Were Used for Training Chatbots: “Defendants Seek to Free-R...

Tsing

The FPS Review
Staff member
The New York Times has revealed that it is suing OpenAI and Microsoft, alleging in a new lawsuit that the companies have used "millions" of its articles to train their chatbots. The publication isn't happy about this for reasons that include the technology's potential to harm the journalism business.

See full article...
 
Ah, too bad they used the NY Times - you know, garbage in, garbage out and all that jazz. OpenAI should sue the NY Times for harming its AI algorithm with their garbage and hubris.
 
By the same logic you could sue anyone referencing your articles in a research paper. Or even just for reading them and later relying on information learned from them.
 
Headline from the future: NY Times lays off 80% of "journalists" after deploying newsroom AI, a new news-writing technology developed by OpenAI
 
By the same logic you could sue anyone referencing your articles in a research paper.
Technically, you are supposed to get permission before citing a copyrighted source, even in a non-profit or academic setting. An author or copyright holder can sue a researcher for copyright violations if they find fault with the use.

There is a "Fair Use" clause in US copyright law, and that allows "limited" use of copyrighted source, but there's no strict definition on what "Limited" means - just whatever you can justify in front of a judge and convince them of, if you get called to task.

Generally, unless someone is making money off someone else's copyright (which is the case here), or someone is using Fair Use to tarnish a brand or reputation, there's no real money in going after infringement, so you don't see a lot of enforcement. Unless you are Nintendo - then you just go after everyone.

If the NYT can prove that ChatGPT was intentionally and specifically trained using their material, or if they can catch ChatGPT using exact quotes or citations from NYT material without permission, they have a strong case and would probably at least get a settlement, if not an outright win.

If it's just a case of ChatGPT scraping up everything on the web, and the NYT had a lot of material sitting out there in public, that's a weaker case, but one with much wider implications. There are a lot of copyright holders with material out there, and it would settle whether scraping for commercial use is covered under Fair Use. It could impact non-AI things like search engines, which rely (in part) on crawling and scraping to generate metadata and indexes.
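To make that last point concrete, here's a rough sketch of the crawl-scrape-index loop search engines depend on (and that dataset builders run at far larger scale). This is purely illustrative, not anyone's actual code; the seed URL and the crude tokenizer are placeholders.

# Minimal illustration of crawl -> scrape -> index.
# Hypothetical seed URL and toy tokenizer; not any real system's code.
import re
from collections import defaultdict
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def crawl_and_index(seed_url, max_pages=5):
    inverted_index = defaultdict(set)   # word -> set of URLs containing it
    to_visit, seen = [seed_url], set()

    while to_visit and len(seen) < max_pages:
        url = to_visit.pop(0)
        if url in seen:
            continue
        seen.add(url)

        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue  # skip pages that fail to load

        soup = BeautifulSoup(html, "html.parser")

        # "Scraping": pull the visible text and tokenize it crudely.
        for word in re.findall(r"[a-z]+", soup.get_text().lower()):
            inverted_index[word].add(url)

        # "Crawling": queue the links found on this page.
        for link in soup.find_all("a", href=True):
            to_visit.append(urljoin(url, link["href"]))

    return inverted_index

if __name__ == "__main__":
    index = crawl_and_index("https://example.com")  # placeholder seed
    print(sorted(index.get("example", set())))

Whether the downstream use is a search index or a training corpus, the fetching step looks the same - which is why a ruling on commercial scraping could reach well beyond AI.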
 
Technically, you are supposed to get permission before citing a copyrighted source, even in a non-profit or academic setting. An author or copyright holder can sue a researcher for copyright violations if they find fault with the use.
Imagine that you cite a famous author in your school paper: how on God's green earth would you get in contact with a world-famous author to get their permission? And what do you mean by "fault with the use"? I doubt copyright law uses such vague terminology.
There is a "Fair Use" clause in US copyright law, and that allows "limited" use of copyrighted source, but there's no strict definition on what "Limited" means - just whatever you can justify in front of a judge and convince them of, if you get called to task.
It is actually not that hard to prove fair use if your work is transformative, and I doubt there is a more transformative use than generative AI.
Generally, unless someone is making money off someone else's copyright (which is the case here), or someone is using Fair Use to tarnish a brand or reputation, there's no real money in going after infringement, so you don't see a lot of enforcement. Unless you are Nintendo - then you just go after everyone.
For Nintendo it is not about fair use; they go after anyone using their IP. Using an article in research or training is very different from releasing a Super Mario product.
If the NYT can prove that ChatGPT was intentionally and specifically trained using their material, or if they can catch ChatGPT using exact quotes or citations from NYT material without permission, they have a strong case and would probably at least get a settlement, if not an outright win.
Of course it was intentionally trained on it - how else, by accident? But not specifically only on that. They don't have a case even if someone trains a generative AI exclusively on NYT articles, because the end product does not contain any of them. It's as if a teacher showed NYT articles to budding journalists in class and then the NYT sued the teacher for training them with its work.
If it's just a case of ChatGPT scraping up everything on the web, and the NYT had a lot of material sitting out there in public, that's a weaker case, but one with much wider implications. There are a lot of copyright holders with material out there, and it would settle whether scraping for commercial use is covered under Fair Use. It could impact non-AI things like search engines, which rely (in part) on crawling and scraping to generate metadata and indexes.
They'd have a case if the generative model itself contained their copyrighted work, but it doesn't. If it were enough to sue someone for merely using copyrighted work as a reference, then every journalist could be sued for reading other articles while researching their own.
 
Imagine that you cite a famous author in your school paper: how on God's green earth would you get in contact with a world-famous author to get their permission?
Contact their management or publisher, I would assume.
 