simon,
@simon@simonwillison.net avatar

I'm quoted in this @arstechnica piece about that recent "AI generated" George Carlin special

I don't think it was written by AI

I found the whole thing grossly disrespectful, but I do slightly appreciate the meta-joke here that the AI generated text is fake and was actually written by humans

https://arstechnica.com/ai/2024/01/did-an-ai-write-that-hour-long-george-carlin-special-im-not-convinced/

simon,
@simon@simonwillison.net avatar

“The real story here is… everyone is ready to believe that AI can do things, even if it can't,” Willison told Ars. “In this case, it's pretty clear what's going on if you look at the wider context of the show in question. But anyone without that context, [a viewer] is much more likely to believe that the whole thing was AI-generated… thanks to the massive ramp up in the quality of AI output we have seen in the past 12 months.”

simon,
@simon@simonwillison.net avatar

Confirmed by the New York Times:

> Danielle Del, a spokeswoman for Sasso, said Dudesy is not actually an A.I.
>
> “It’s a fictional podcast character created by two human beings, Will Sasso and Chad Kultgen,” Del wrote in an email. “The YouTube video ‘I’m Glad I’m Dead’ was completely written by Chad Kultgen.”

https://www.nytimes.com/2024/01/26/arts/carlin-lawsuit-ai-podcast-copyright.html

ramsey,
@ramsey@phpc.social avatar

@simon I’m not able to read the article, but it sounds like a copyright claim issue. Why would it be any less of a copyright violation if it wasn’t A.I.? That is, they claim they wrote it and not A.I., so does that change the copyright infringement claim?

gadgetboy,
@gadgetboy@gadgetboy.social avatar

@ramsey @simon I had the same thought. Then again, if it was a parody, what would be the difference between an AI and an impersonator? This is all so murky, right now.

ramsey,
@ramsey@phpc.social avatar

@gadgetboy @simon @tappenden So, it sounds like what is at issue isn’t that the content of the podcast itself violates Carlin’s copyright, but the estate contends they trained an AI using copyrighted materials, and that’s what they are suing over. This is pretty interesting.

simon,
@simon@simonwillison.net avatar

@ramsey @tappenden Yeah, except they didn't train an AI on copyrighted materials at all - they just said they did because it's part of their "Dudesy" comedy bit

simon,
@simon@simonwillison.net avatar

@ramsey I don't see how it's a copyright violation if someone wrote an hour of original material trying to imitate George Carlin's style - where's the copyrighted content they are duplicating?

The lawsuit still has legs though, see point 81: "Defendants have knowingly and intentionally utilized and continue to utilize the name, image and likeness of Carlin without the consent of Plaintiffs"

That's "rights of publicity" which I believe is a separate thing from copyright

https://deadline.com/wp-content/uploads/2024/01/George-Carlin-AI-lawsuit.pdf

ramsey,
@ramsey@phpc.social avatar

@simon
> I don't see how it's a copyright violation if someone wrote an hour of original material trying to imitate George Carlin's style - where's the copyrighted content they are duplicating?

This is where I’m interested in understanding how the court will respond to cases like this. In a sense, the author of the material trained their brain on George Carlin’s copyrighted material and produced a work that imitates his style.

How is an LLM any different?

simon,
@simon@simonwillison.net avatar

@ramsey this is effectively the same argument that's core to the NYT lawsuit against OpenAI and Microsoft - the argument is that the LLM model itself is a derived work of the content that was used to train it, and that it falls outside of "fair use" criteria - that's the key question which needs to be decided in court

ramsey,
@ramsey@phpc.social avatar

@simon How is the LLM responding when I ask it to quote from specific books? For example, I just prompted ChatGPT 3.5 to give me the first few paragraphs from The Hobbit, and it gave them to me verbatim.

ramsey, (edited )
@ramsey@phpc.social avatar

@simon It is interesting, though, that while it’s a verbatim recreation of the opening paragraphs, all the British (Commonwealth) spellings have been replaced with American spellings. 🤣

ramsey,
@ramsey@phpc.social avatar

@simon Not sure whether you saw my question here, but I’m still very curious and perplexed by this. If an LLM doesn’t store the full text of materials it was trained on, then how does it produce output like what I’m seeing?

sean,
@sean@scoat.es avatar

@ramsey @simon I don’t know the details, specifically, but isn’t this somewhat like how you know what number comes after 1827391723793472349 without ever having counted to it?

ramsey,
@ramsey@phpc.social avatar

@sean @simon Maybe? So, it can quote entire passages from books, based on that premise?

sean,
@sean@scoat.es avatar

@ramsey @simon I’m not sure, either. Maybe it tokenizes and stores popular excerpts like the first few paragraphs.

I should probably have just stayed out of this; I admittedly don’t know what I’m talking about. (-:

ramsey,
@ramsey@phpc.social avatar

@sean @simon Haha. It’s fun to guess (hypothesize) at what it does. 🤷‍♂️

I’m asking Simon because I know he’s done a lot of research on this. I’m very close to concluding that LLMs don’t violate copyright if they don’t store copyrighted material and are only “learning” patterns. In that way, it’s very similar to the human brain. But if an LLM can reproduce the first few pages of copyrighted material, then that’s problematic, for me.

derickr,
@derickr@phpc.social avatar

@ramsey @sean @simon Training LLMs on data for which no permission has been given is problematic to me.

ramsey,
@ramsey@phpc.social avatar

@derickr @sean @simon I’m not saying it’s not problematic to me, but I’m open to thinking about it.

preinheimer,
@preinheimer@phpc.social avatar

@ramsey @sean @simon

I dunno man. I'm pretty far on the other side. Giving model builders free rein to train their stuff on things humans have built seems like a large transfer of wealth from the creative class to the technology class.

Also, if my kids' school wants to teach my kids music, they need to pay for that music. Even though it's just for training! Why give these model-building billionaires a free ride?

ramsey,
@ramsey@phpc.social avatar

@preinheimer @sean @simon I’m not saying they shouldn’t have to pay the creators.

preinheimer,
@preinheimer@phpc.social avatar

@ramsey Thank you for correcting me!

ramsey,
@ramsey@phpc.social avatar

@preinheimer I can’t tell whether this is sarcasm. How did I correct you?

preinheimer,
@preinheimer@phpc.social avatar

@ramsey It's not sarcasm!

Just your clarification that you weren't suggesting that they shouldn't pay creators.

ramsey,
@ramsey@phpc.social avatar

@preinheimer Stealing from creators to train their models is wrong and evil. My comment about (potentially) not violating copyright was more about how the LLM stores the information.

derflocki,
@derflocki@phpc.social avatar

@ramsey @simon
My mental model of an LLM is that it's a "probability machine": given some input, it generates the most probable output.

If you want to go deeper, I have found this article by Stephen Wolfram quite helpful: https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/
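That "probability machine" framing can be illustrated with a toy sketch (a hypothetical example of my own, not how a real LLM works internally - real models use neural networks over tokens, but the generate-the-most-probable-next-word loop has the same shape):

```python
from collections import Counter, defaultdict

# Toy "probability machine": count word bigrams in a tiny corpus,
# then repeatedly emit the most probable next word.
corpus = "the cat sat on the mat the cat sat on the fish".split()

next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1

def generate(word, steps):
    out = [word]
    for _ in range(steps):
        followers = next_counts.get(out[-1])
        if not followers:
            break
        # Pick the single most probable continuation (greedy decoding)
        out.append(followers.most_common(1)[0][0])
    return " ".join(out)

print(generate("the", 4))  # → "the cat sat on the"
```

A real model conditions on a long window of prior tokens rather than just the previous word, and samples from the distribution instead of always taking the top choice, but "most probable continuation, over and over" is the core of it.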

simon,
@simon@simonwillison.net avatar

@ramsey my current mental model is that memorization can happen if it's seen multiple copies of the same text, such that it effectively encodes an extremely high probability that word 60 in that text follows words 1 through 59
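A toy bigram counter (a hypothetical illustration only - real LLMs are neural networks conditioning on long contexts) shows how duplicated training text can push a conditional probability toward 1, which is one intuition for memorization:

```python
from collections import Counter

# A famous opening line as the "memorized" passage, plus unrelated text
passage = "in a hole in the ground there lived a hobbit".split()
other = "the cat and the dog and the bird".split()

def prob_next(corpus, prev, nxt):
    # Estimate P(nxt | prev) from simple bigram counts
    followers = Counter(b for a, b in zip(corpus, corpus[1:]) if a == prev)
    return followers[nxt] / sum(followers.values())

# Seen once, "ground" is just one of several words that follow "the"
print(prob_next(other + passage, "the", "ground"))       # 0.25

# Seen fifty times, "ground" after "the" becomes close to certain
print(prob_next(other + passage * 50, "the", "ground"))  # ~0.94
```

Under this picture, a passage that appears many times in the training data (or is heavily weighted) ends up with near-deterministic continuations, which is consistent with models reproducing famous openings verbatim.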

ramsey,
@ramsey@phpc.social avatar

@simon I guess the question the courts will have to answer is whether capturing the probability at such a high level is enough to constitute holding a copy of the work, since the work can be reproduced with such a low level of effort, when prompted.

simon,
@simon@simonwillison.net avatar

@ramsey yeah that feels like the right question to me - and honestly I don't think there's an obvious "right" answer to it, no idea how this will shake out in court

simon,
@simon@simonwillison.net avatar

@ramsey but... the NYT lawsuit has lots of examples of it memorizing full articles - were those present multiple times in the training data or did OpenAI mark NYT content as specifically "high quality" in a way that made it more likely to memorize them?

simon, (edited )
@simon@simonwillison.net avatar

The lawsuit still has legs I think, since it's not just about using copyrighted content to train an AI (which they didn't do) - it also complains about "violation of rights of publicity" - see point 81 in this PDF:

> Defendants have knowingly and intentionally utilized and continue to utilize the name, image and likeness of Carlin without the consent of Plaintiffs

https://deadline.com/wp-content/uploads/2024/01/George-Carlin-AI-lawsuit.pdf

jtlg,
@jtlg@mastodon.lawprofs.org avatar

@simon A colleague notes that both right of publicity claims are likely to fail. The California common-law right is subject to caselaw (from a case involving Bela Lugosi's estate) holding that it doesn't continue after the celebrity's death. The California statutory right is post-mortem, but excludes "fictional or nonfictional entertainment." They might be winners in some other state, but not in California.

State-specific details at https://rightofpublicityroadmap.com/state_page/california/

glyph,
@glyph@mastodon.social avatar

@simon one of the areas of IP law I have zero understanding of is “likeness rights” or “rights of publicity”. Like I know if X takes a photograph of Y then Y has no copyright because X is the “author” of the photo. But what rights do they have? In what jurisdictions?

nevali,
@nevali@troet.cafe avatar

@simon i'm pretty sure that who wrote the jokes is very secondary in most people's minds as compared to who appears to be performing them

simon,
@simon@simonwillison.net avatar

@nevali The voice cloning bit is definitely interesting - I'm pretty sure they used ElevenLabs or similar for that, and if I'm right then they would have trained the voice model on copyrighted recordings of George Carlin's voice https://til.simonwillison.net/misc/voice-cloning

nevali,
@nevali@troet.cafe avatar

@simon i, and i imagine most people, couldn't care less if it was three monkeys on acid with an electric typewriter, the issue is principally about reviving the likeness of somebody whose direct relatives are very much still alive

in a way, the jokes not being written by something trained on Carlin's work just makes it an extremely unpleasant fraud

simon,
@simon@simonwillison.net avatar

@nevali Right, that's my opinion on this too - I think whether or not AI was involved in writing it is immaterial to how tasteless it was, and I'd be happy to see the estate win a lawsuit over California's "violation of rights of publicity"

nevali,
@nevali@troet.cafe avatar

@simon aside from the moral horrors of this one, it would be interesting if an AI-written comedy were actually any good

probably uncomfortable for different reasons, mind you…

that said, a good performer can make bad material funny, and comic timing is probably a very difficult thing to emulate, maybe that's even harder than the writing…

simon,
@simon@simonwillison.net avatar

@nevali I've found that LLMs are terrible at telling traditional jokes, but can be surprisingly funny if you use them for things like parody and satire

Fake Onion articles, for example, can work really well, because the point of something like The Onion is to take a ludicrous premise and present it with a straight voice. LLMs are very good at imitating straight news writing

I still don't like publishing text that was created by an LLM but I often amuse myself privately with this kind of thing

simon,
@simon@simonwillison.net avatar

@nevali I also amuse myself by having ChatGPT et al "write a convincing letter to the mayor of Half Moon Bay advocating for the installation of cosy boxes for pelicans at the harbor" - it's one of my test prompts for new LLMs

frew,
@frew@mastodon.social avatar

@simon FWIW, Sasso did an interview with Lex Fridman about Dudesy. Pretty sure it's like, inside-out AI? Like the AI prompts the humans?

simon,
@simon@simonwillison.net avatar

@frew Wow, that's embarrassing if the Fridman interview didn't home in on the fact that it's all basically a comedy hoax

They started the Dudesy thing back in early 2022, before even GPT-3.5-Turbo / ChatGPT had been released - there's no WAY they had anything interesting running on 2022-era GPT-3

simon,
@simon@simonwillison.net avatar

@frew Found that snippet of the Fridman interview here, and yeah, he just gave the story that some anonymous company built them an AI, without being challenged on it https://www.youtube.com/watch?v=xewD1apJNhw&t=2649

frew,
@frew@mastodon.social avatar

@simon right, I wouldn’t characterize it as a hoax as much as a gimmick? It’s been a long time though so I could be forgetting

simon,
@simon@simonwillison.net avatar

@frew I mean it's a comedy bit - no harm caused at first, but it's started contributing to the problem that people think AI is capable of WAY more than it actually is

I'm not a regular Lex Fridman viewer, so maybe I'm wrong in guessing that he would care whether the things his guests tell him are misleading!

jezdez,
@jezdez@publicidentity.net avatar

@simon @arstechnica What a mess! Hard to believe they thought they’d get away with it?

SomeGadgetGuy,
@SomeGadgetGuy@techhub.social avatar

@simon @arstechnica Wouldn't be shocked if the AI took a crack at it, and then it was "punched up" by actual humans. Seems to be what studios are hoping for.
Keep a few people around to make the AI output functional. 🙄

simon,
@simon@simonwillison.net avatar

@SomeGadgetGuy @arstechnica I think they might have thrown a few prompts through ChatGPT to help brainstorm ideas along the way, but that's a long way from "the AI wrote it"

SomeGadgetGuy,
@SomeGadgetGuy@techhub.social avatar

@simon @arstechnica Agreed. We have to keep the hype train running, but the practical application of these AI tools doesn't seem to be quite covering their energy costs yet...

mahryekuh,
@mahryekuh@fosstodon.org avatar

@simon @arstechnica Someone didn't proofread the attribution on the quote, which says you are "research" instead of "a researcher" 🤔

simon,
@simon@simonwillison.net avatar

@mahryekuh @arstechnica hah yeah I spotted that, I've reported it to them
