simon,
@simon@simonwillison.net

Anyone know if any of these T&Cs about not training new models on the output have been explained publicly by expert lawyers, or tested in court yet?

Feels like a huge elephant in the room, especially given the vast number of models fine-tuned on generated data

Yet another part of the AI space that seems to be running on "vibes" - I'm not sure "OpenAI train on crawled data so they can't complain if we train on their stuff" is a robust legal argument, but maybe it is?

alexhudson,

@simon I feel like those clauses directly cut across the "you can use our output for your own IP" scenarios, e.g. Copilot, DALL·E, etc. I don't know how you can claim there's nothing subsisting from training in the output, and no IP, and yet still claim to control the output in that way?

I also feel like current law is just completely maladjusted to this. I think there's going to be some Berne-like agreement to cover AI input and output, and it may not look/work like current law at all. Going back to the social compact of IP, I suspect lobbying about supposed benefits will be intense.

tommorris,
@tommorris@mastodon.social

@simon as @luis_in_brief said: probably enforceable, but only against the OpenAI user who has agreed to them, not against the world. Given the arbitration clause, and the fact that all that is in dispute is access to the API, it would be rather unlikely to ever be litigated.

It’s just a variation of a pretty standard “you can’t use access to our product to build a competitor” clause and the fact that the class of product is ML models doesn’t really change much.

mesirii,
@mesirii@chaos.social

@simon Hmm your question actually sparked another one for me.

Should we get LLMs to summarize all those ultra-long T&Cs and EULAs for us and highlight critical issues?
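
Purely as an illustration of that idea, here's a minimal sketch, assuming the official openai Python client; the model name, prompt wording, and local terms.txt file are hypothetical choices, not a tested tool:

```python
# Minimal sketch: ask an LLM to summarize a terms-of-service document and
# flag restrictive clauses. Model name, prompt, and "terms.txt" are
# illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("terms.txt") as f:  # hypothetical file holding the T&Cs to review
    terms = f.read()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any capable chat model would do
    messages=[
        {
            "role": "system",
            "content": (
                "Summarize the following terms and conditions in plain "
                "language, and highlight any clauses restricting how the "
                "service's output may be used, e.g. for model training."
            ),
        },
        {"role": "user", "content": terms},
    ],
)
print(response.choices[0].message.content)
```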

luis_in_brief,
@luis_in_brief@social.coop

@simon (1) AFAIK no one has tested any of these specific clauses in court. Terms and conditions, more generally, are reasonably well-tested in court, though their reach is fairly limited.

(2) The remedies (i.e., court-ordered penalties/fixes) you can get from violations of contract law (which governs terms like these) are somewhat limited relative to what you can get from a copyright violation. Not surprisingly, this is why people prefer copyright law, if they can get it.

luis_in_brief,
@luis_in_brief@social.coop

@simon (3) “these guys are hypocrites” is true here, but (as you surmise) not a defense—at the end of the day you entered into a contract with them knowing they were hypocrites! (That’s not a formal legal doctrine but judges are realists.)

(4) I suspect that this is in there mostly so they can stop large-scale commercial use, and that they'd be very, very unlikely to ever bother with the GIGO scenario you're worried about.

luis_in_brief,
@luis_in_brief@social.coop

@simon (5) (depending on why you were concerned, this might be the most important point) contract enforcement generally requires “privity”; i.e., that the parties involved have made an agreement with each other.

If Joe Rando agrees to these terms, spams the internet with OpenAI-generated text, and you unknowingly (or likely even knowingly) consume that OpenAI-generated spam to train your own model, OpenAI may have some claim against Joe Rando—but not against you.

luis_in_brief,
@luis_in_brief@social.coop

@simon (5ish) This is different from copyright—if OpenAI had a copyright in their text, Joe Rando posts it on the internet, and you copied/ingested/etc. it, Joe Rando and you would both potentially be liable; copyright is a right against the whole world. But contract is, by design, really about what two parties agree to—it is very hard for them to bind a third party.

luis_in_brief,
@luis_in_brief@social.coop

@simon (5evenmoreish) This is, tangentially, part of why all the ethical/responsible licenses are problematic from an enforcement perspective: if there is no copyright in the output (and it seems very unlikely that the model creator has any copyright in the output; model user is more complex) then it’s logistically complex to make responsible terms stick across multiple tiers of output users.

luis_in_brief,
@luis_in_brief@social.coop

@simon (6) All of the above are very general statements; law in both specific jurisdictions and in specific risk analysis situations may be very different. (If you’re a Google or Microsoft employee, for example, forget you just read any of that, because your relationship to OpenAI is very different from Simon’s, on a number of levels.) But hopefully helpful to give non-experts a general lay of the land.

therealadam,
@therealadam@ruby.social

@simon trying not to get dismal here, but “vibes-only” runs rampant through quasi-legal thought, especially in state legislatures, EU tech regulation, and checks notes SCOTUS 🤦🏻‍♂️

MudMan,

@simon I don't know how any of it fits into current regulatory models at all. Common law systems may be able to stretch into best guesses from judges, but regulation is needed.

Unfortunately the level of public debate is atrocious at the moment, so my trust in solid regulation is low.

simon,
@simon@simonwillison.net

IANAL but there must be lawyers who've thought this through by now

My "I don't know what I'm talking about" legal hunch is that violating the T&Cs won't result in anything worse than having your OpenAI key revoked

Presumably model vendors who add these clauses have solid legal advice about their impact... I wonder if this is a "we think this is likely not enforceable, but there's a small chance it is so let's stick it in there in case we need to try our luck with it in court later" thing

simon,
@simon@simonwillison.net

In my past (limited) experience working with commercial lawyers it's incredibly difficult to get a definitive yes or no answer out of them about anything!

Tagging @luis_in_brief as someone who may have much more useful things to say about this!

simon,
@simon@simonwillison.net

I posed this question over here and got a few interesting responses from people who've spent some more time with this issue https://twitter.com/simonw/status/1717896619000201427

virtuous_sloth,
@virtuous_sloth@cosocial.ca

@simon
Does Xitter now not show replies to people who are not logged in?

I only see your query post, no replies.

simon,
@simon@simonwillison.net

... except apparently if you're not logged in to Twitter you can't see any of the replies any more, so linking to conversations on Twitter is effectively useless now

philgyford,
@philgyford@mastodon.social

@simon My descent from using Twitter, to mostly avoiding it, to actively removing links to my account, to never bothering to even click a link to a tweet, completed a few weeks back.

eliocamp,
@eliocamp@mastodon.social

@simon Does the nitter link show all the replies? (I can't tell without a twitter account)
https://nitter.net/simonw/status/1717896619000201427

bornach,
@bornach@fosstodon.org

@eliocamp @simon
Only 2 replies show up for me via the nitter.net link.

jannem,
@jannem@fosstodon.org

@simon
Yep. I never bother to follow a Twitter link any longer.

deadwisdom,

@simon ... You have to be logged in because their T&Cs have now been changed to protect the site's written content, so they can profit from models that want to train on the data.

jameswalters,
@jameswalters@fosstodon.org

@simon Yes, this has been a depressing consequence of X. 😩

archliberal,
@archliberal@mastodo.neoliber.al

@simon yeah, the site is actually completely useless without an account now. If someone links a thread, a multi-tweet chain, or a reply, you see nothing but the individual tweet you were linked to, as if it were just a screenshot. And profile timelines, when you're not signed in, are completely scrambled and not chronological, so you'll open a Twitter profile from a Google search and see a tweet from 5 years ago, last month, 2 years ago, etc., and be unable to sort.

not2b,
@not2b@sfba.social

@simon @luis_in_brief IANAL, but the net is filling up with generated images. It seems that if someone trains a new model based on an image set from Common Crawl or their own bot, it will sweep up a lot of those images and their descriptions (people often post prompts alongside the images). Terms and conditions, as I understand them, are only binding on those who agreed to them. But who knows how this will shake out? This is new territory, so even expert opinion might have limited value.

luis_in_brief,
@luis_in_brief@social.coop

@not2b @simon The base rules (privity, contract formation, type of remedies) are pretty well understood in this particular corner of the law.

There are many genuinely unknown (even unknowable) problems in AI law, and there might be many non-legal reasons to worry about the problem of training on generated images, but these terms are fairly straightforward.

isaac2,

@simon My similar hunch is that they'd have no way of knowing
