simon,
@simon@simonwillison.net

Anyone know if any of these T&Cs about not training new models on the output have been explained publicly by expert lawyers, or tested in court yet?

Feels like a huge elephant in the room, especially given the vast number of models fine-tuned on generated data

Yet another part of the AI space that seems to be running on "vibes" - I'm not sure "OpenAI train on crawled data so they can't complain if we train on their stuff" is a robust legal argument, but maybe it is?

alexhudson,

@simon I feel like those clauses directly cut across the "you can use our output for your own IP" scenarios, e.g. Copilot, DALL·E, etc. I don't know how you can claim there's nothing subsisting from training in the output, and no IP, and yet still claim to control the output in that way?

I also feel like current law is just completely maladjusted to this. I think there's going to be some Berne-like agreement to cover AI input and output, and it may not look/work like current law at all. Going back to the social compact of IP, I suspect lobbying about supposed benefits will be intense.

tommorris,
@tommorris@mastodon.social

@simon as @luis_in_brief said: probably enforceable, but only against the OpenAI user who has agreed to them, not against the world. Given the arbitration clause, and the fact that all that is in dispute is access to the API, it would be rather unlikely to ever be litigated.

It’s just a variation of a pretty standard “you can’t use access to our product to build a competitor” clause and the fact that the class of product is ML models doesn’t really change much.

mesirii,
@mesirii@chaos.social

@simon Hmm your question actually sparked another one for me.

Should we get LLMs to summarize all those ultra-long T&Cs and EULAs for us and highlight critical issues?
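
Purely as an illustration of that idea, here's a minimal sketch, assuming the official openai Python client; the model name, prompt wording, and local terms.txt file are hypothetical choices, not a tested tool:

```python
# Minimal sketch: ask an LLM to summarize a terms-of-service document and
# flag restrictive clauses. Model name, prompt, and "terms.txt" are
# illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("terms.txt") as f:  # hypothetical file holding the T&Cs to review
    terms = f.read()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any capable chat model would do
    messages=[
        {
            "role": "system",
            "content": (
                "Summarize the following terms and conditions in plain "
                "language, and highlight any clauses restricting how the "
                "service's output may be used, e.g. for model training."
            ),
        },
        {"role": "user", "content": terms},
    ],
)
print(response.choices[0].message.content)
```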

luis_in_brief,
@luis_in_brief@social.coop

@simon (1) AFAIK no one has tested any of these specific clauses in court. Terms and conditions, more generally, are reasonably well-tested in court, though their reach is fairly limited.

(2) The remedies (i.e., court-ordered penalties/fixes) you can get from violations of contract law (which governs terms like these) are somewhat limited relative to what you can get from a copyright violation. Not surprisingly, this is why people prefer copyright law, if they can get it.

luis_in_brief,
@luis_in_brief@social.coop

@simon (3) “these guys are hypocrites” is true here, but (as you surmise) not a defense—at the end of the day you entered into a contract with them knowing they were hypocrites! (That’s not a formal legal doctrine but judges are realists.)

(4) I suspect that this is in there mostly so they can stop large-scale commercial use, and that they'd be very, very unlikely to ever bother with the GIGO scenario you're worried about.

luis_in_brief,
@luis_in_brief@social.coop

@simon (5) (depending on why you were concerned, this might be the most important point) contract enforcement generally requires “privity”; i.e., that the parties involved have made an agreement with each other.

If Joe Rando agrees to these terms, spams the internet with OpenAI-generated text, and you unknowingly (or likely even knowingly) consume that OpenAI-generated spam to train your own model, OpenAI may have some claim against Joe Rando—but not against you.

luis_in_brief,
@luis_in_brief@social.coop

@simon (5ish) This is different from copyright—if OpenAI had a copyright in their text, Joe Rando posts it on the internet, and you copied/ingested/etc. it, Joe Rando and you would both potentially be liable; copyright is a right against the whole world. But contract is, by design, really about what two parties agree to—it is very hard for them to bind a third party.

luis_in_brief,
@luis_in_brief@social.coop

@simon (5evenmoreish) This is, tangentially, part of why all the ethical/responsible licenses are problematic from an enforcement perspective: if there is no copyright in the output (and it seems very unlikely that the model creator has any copyright in the output; model user is more complex) then it’s logistically complex to make responsible terms stick across multiple tiers of output users.

luis_in_brief,
@luis_in_brief@social.coop

@simon (6) All of the above are very general statements; law in both specific jurisdictions and in specific risk analysis situations may be very different. (If you’re a Google or Microsoft employee, for example, forget you just read any of that, because your relationship to OpenAI is very different from Simon’s, on a number of levels.) But hopefully helpful to give non-experts a general lay of the land.

therealadam,
@therealadam@ruby.social

@simon trying not to get dismal here, but “vibes-only” runs rampant through quasi-legal thought, especially in state legislatures, EU tech regulation, and checks notes SCOTUS 🤦🏻‍♂️

MudMan,

@simon I don't know how any of it fits into current regulatory models at all. Common law systems may be able to stretch into best guesses from judges, but regulation is needed.

Unfortunately the level of public debate is atrocious at the moment, so my trust in solid regulation is low.

simon,
@simon@simonwillison.net

IANAL but there must be lawyers who've thought this through by now

My "I don't know what I'm talking about" legal hunch is that violating the T&Cs won't result in anything worse than having your OpenAI key revoked

Presumably model vendors who add these clauses have solid legal advice about their impact... I wonder if this is a "we think this is likely not enforceable, but there's a small chance it is so let's stick it in there in case we need to try our luck with it in court later" thing

simon,
@simon@simonwillison.net

In my past (limited) experience working with commercial lawyers it's incredibly difficult to get a definitive yes or no answer out of them about anything!

Tagging @luis_in_brief as someone who may have much more useful things to say about this!

simon,
@simon@simonwillison.net

I posed this question over here and got a few interesting responses from people who've spent some more time with this issue https://twitter.com/simonw/status/1717896619000201427

virtuous_sloth,
@virtuous_sloth@cosocial.ca

@simon
Does Xitter now not show replies to people who are not logged in?

I only see your query post, no replies.

simon,
@simon@simonwillison.net

... except apparently if you're not logged in to Twitter you can't see any of the replies any more, so linking to conversations on Twitter is effectively useless now

philgyford,
@philgyford@mastodon.social

@simon My descent from using Twitter, to mostly avoiding it, to actively removing links to my account, to never bothering to even click a link to a tweet, completed a few weeks back.

eliocamp,
@eliocamp@mastodon.social

@simon Does the nitter link show all the replies? (I can't tell without a twitter account)
https://nitter.net/simonw/status/1717896619000201427

bornach,
@bornach@fosstodon.org

@eliocamp @simon
Only 2 replies show up for me via the nitter.net link.

jannem,
@jannem@fosstodon.org

@simon
Yep. I never bother to follow a Twitter link any longer.

deadwisdom,

@simon ... You have to be logged in because their T&Cs have now been changed to protect the site's written content, so they can profit from models that want to train on the data.

jameswalters,
@jameswalters@fosstodon.org

@simon Yes, this has been a depressing consequence of X. 😩

archliberal,
@archliberal@mastodo.neoliber.al

@simon yeah, the site is actually completely useless without an account now. If someone links a thread, a multi-tweet chain, or a reply, you see nothing but the individual tweet you were linked to, as if it were just a screenshot. And profile timelines, when you're not signed in, are completely scrambled and not chronological, so you'll open a Twitter profile from a Google search and see a tweet from 5 years ago, last month, 2 years ago, etc., and be unable to sort.

not2b,
@not2b@sfba.social

@simon @luis_in_brief IANAL, but the net is filling up with generated images. It seems that if someone trains a new model based on an image set from Common Crawl or their own bot, it will sweep up a lot of those images and their descriptions (people often post prompts alongside the images). Terms and conditions, as I understand them, are only binding on those who agreed to them. But who knows how this will shake out? This is new territory, so even expert opinion might have limited value.

luis_in_brief,
@luis_in_brief@social.coop

@not2b @simon The base rules (privity, contract formation, type of remedies) are pretty well understood in this particular corner of the law.

There are many genuinely unknown (even unknowable) problems in AI law, and there might be many non-legal reasons to worry about the problem of training on generated images, but these terms are fairly straightforward.

isaac2,

@simon My similar hunch is that they'd have no way of knowing
