The NY Times Lawsuit Against OpenAI Would Open Up The NY Times To All Sorts Of Lawsuits Should It Win

In the end, though, the crux of this lawsuit is the same as all the others. It’s a false belief that reading something (whether by human or machine) somehow implicates copyright. This is false. If the courts (or the legislature) decide otherwise, it would upset pretty much all of the history of copyright and create some significant real world problems.

Part of the Times complaint is that OpenAI’s GPT LLM was trained in part with Common Crawl data. Common Crawl is an incredibly useful and important resource that apparently is now coming under attack. It has been building an open repository of the web for people to use, not unlike the Internet Archive, but with a focus on making it accessible to researchers and innovators. Common Crawl is a fantastic resource run by some great people (though the lawsuit here attacks them).

But, again, this is the nature of the internet. It’s why things like Google’s cache and the Internet Archive’s Wayback Machine are so important. These are archives of history that are incredibly important, and have historically been protected by fair use, which the Times is now threatening.

(Notably, just recently, the NY Times was able to get all of its articles excluded from Common Crawl. Otherwise I imagine that they would be a defendant in this case as well).

Either way, so much of the lawsuit is claiming that GPT learning from this data is infringement. And, as we’ve noted repeatedly, reading/processing data is not a right limited by copyright. We’ve already seen this in multiple lawsuits, but this rush of plaintiffs is hoping that maybe judges will be wowed by this newfangled “generative AI” technology into ignoring the basics of copyright law and pretending that there are now rights that simply do not exist.

LemmyIsFantastic, 4 months ago

Holy Christ, this. This is what people are missing. All of these suits being brought up against AI boil down to this, and unless the law changes (not implying either way), these suits are dumb.

Me using your public works and deriving my own, machine helped or not, has never been protected.

FireTower, 4 months ago

The author in my opinion misrepresents the stance of the NY Times here.

“It’s a false belief that reading something (whether by human or machine) somehow implicates copyright.”

The Times’s issue isn’t just that someone or something is reading materials. The Times takes issue with a group intentionally collecting large amounts of its data (in this case, articles) en masse, with the intention of distributing them, packed into a product, to third parties engaging in commercial activities without paying a licensing fee. The Times fears that this damages the potential market for its future and past articles.

Essentially, the Times fears that Common Crawl is acting as a fence for other groups to infringe on its copyrighted works.

Factors of Fair Use:

The purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes.

The nature of the copyrighted work.

The amount and substantiality of the portion used in relation to the copyrighted work as a whole.

The effect of the use upon the potential market for or value of the copyrighted work.

PastaGorgonzola, 4 months ago

I see what you mean, but I thought copyright is a protection against copying something (even with some modifications).

Techdirt traditionally has a very clear view on copyright and its restrictions, so I am familiar with their bias. Their argument here boils down to the difference between copying something and learning from something. If reading something and learning from it is copyright infringement, any educational institute should be very worried. Because that’s exactly what’s going on in there.

I do understand the difference between a student reading dozens or hundreds of NYT articles (for free in the library) and a computer program doing the same, but for orders of magnitude more articles. So I’m curious to see how this is going to turn out.

littlebluespark, 4 months ago

The Times articles are not “packed into a product”, FFS. How is this so hard for people to grasp? The simple act of parsing data changes it. If digesting media is theft, then every single meme is piracy, and every person who’s ever been to a museum, watched a play, or a movie, or read a book, is guilty of “stealing” copyrighted material every single time they’ve done so.

It is genuinely mind-boggling how so many find this basic, crystal clear concept so fucking challenging to grasp.
