simon,
@simon@simonwillison.net avatar

I'm quoted in this @arstechnica piece about that recent "AI generated" George Carlin special

I don't think it was written by AI

I found the whole thing grossly disrespectful, but I do slightly appreciate the meta-joke here that the AI generated text is fake and was actually written by humans

https://arstechnica.com/ai/2024/01/did-an-ai-write-that-hour-long-george-carlin-special-im-not-convinced/

simon,
@simon@simonwillison.net avatar

“The real story here is… everyone is ready to believe that AI can do things, even if it can't,” Willison told Ars. “In this case, it's pretty clear what's going on if you look at the wider context of the show in question. But anyone without that context, [a viewer] is much more likely to believe that the whole thing was AI-generated… thanks to the massive ramp up in the quality of AI output we have seen in the past 12 months.”

simon,
@simon@simonwillison.net avatar

Confirmed by the New York Times:

> Danielle Del, a spokeswoman for Sasso, said Dudesy is not actually an A.I.
>
> “It’s a fictional podcast character created by two human beings, Will Sasso and Chad Kultgen,” Del wrote in an email. “The YouTube video ‘I’m Glad I’m Dead’ was completely written by Chad Kultgen.”

https://www.nytimes.com/2024/01/26/arts/carlin-lawsuit-ai-podcast-copyright.html

ramsey,
@ramsey@phpc.social avatar

@simon I’m not able to read the article, but it sounds like a copyright claim issue. Why would it be any less of a copyright violation if it wasn’t A.I.? That is, they claim they wrote it and not A.I., so does that change the copyright infringement claim?

gadgetboy,
@gadgetboy@gadgetboy.social avatar

@ramsey @simon I had the same thought. Then again, if it was a parody, what would be the difference between an AI and an impersonator? This is all so murky, right now.

ramsey,
@ramsey@phpc.social avatar

@gadgetboy @simon @tappenden So, it sounds like what is at issue isn’t that the content of the podcast itself violates Carlin’s copyright, but the estate contends they trained an AI using copyrighted materials, and that’s what they are suing over. This is pretty interesting.

simon,
@simon@simonwillison.net avatar

@ramsey @tappenden Yeah, except they didn't train an AI on copyrighted materials at all - they just said they did because it's part of their "Dudesy" comedy bit

simon,
@simon@simonwillison.net avatar

@ramsey I don't see how it's a copyright violation if someone wrote an hour of original material trying to imitate George Carlin's style - where's the copyrighted content they are duplicating?

The lawsuit still has legs though, see point 81: "Defendants have knowingly and intentionally utilized and continue to utilize the name, image and likeness of Carlin without the consent of Plaintiffs"

That's "rights of publicity" which I believe is a separate thing from copyright

https://deadline.com/wp-content/uploads/2024/01/George-Carlin-AI-lawsuit.pdf

ramsey,
@ramsey@phpc.social avatar

@simon
> I don't see how it's a copyright violation if someone wrote an hour of original material trying to imitate George Carlin's style - where's the copyrighted content they are duplicating?

This is where I’m interested in understanding how the court will respond to cases like this. In a sense, the author of the material trained their brain on George Carlin’s copyrighted material and produced a work that imitates his style.

How is an LLM any different?

simon,
@simon@simonwillison.net avatar

@ramsey this is effectively the same argument that's core to the NYT lawsuit against OpenAI and Microsoft - the argument is that the LLM model itself is a derived work of the content that was used to train it, and that it falls outside of "fair use" criteria - that's the key question which needs to be decided in court

ramsey,
@ramsey@phpc.social avatar

@simon How is the LLM responding when I ask it to quote from specific books? For example, I just prompted ChatGPT 3.5 to give me the first few paragraphs from The Hobbit, and it gave them to me verbatim.

ramsey, (edited )
@ramsey@phpc.social avatar

@simon It is interesting, though, that while it’s a verbatim recreation of the opening paragraphs, all the British (Commonwealth) spellings have been replaced with American spellings. 🤣

ramsey,
@ramsey@phpc.social avatar

@simon Not sure whether you saw my question here, but I’m still very curious and perplexed by this. If an LLM doesn’t store the full text of materials it was trained on, then how does it produce output like what I’m seeing?

sean,
@sean@scoat.es avatar

@ramsey @simon I don’t know the details, specifically, but isn’t this somewhat like how you know what number comes after 1827391723793472349 without ever having counted to it?

ramsey,
@ramsey@phpc.social avatar

@sean @simon Maybe? So, it can quote entire passages from books, based on that premise?

sean,
@sean@scoat.es avatar

@ramsey @simon I’m not sure, either. Maybe it tokenizes and stores popular excerpts like the first few paragraphs.

I should probably have just stayed out of this; I admittedly don’t know what I’m talking about. (-:

ramsey,
@ramsey@phpc.social avatar

@sean @simon Haha. It’s fun to guess (hypothesize) at what it does. 🤷‍♂️

I’m asking Simon because I know he’s done a lot of research on this. I’m very close to concluding that LLMs don’t violate copyright if they don’t store copyrighted material and are only “learning” patterns. In that way, it’s very similar to the human brain. But if an LLM can reproduce the first few pages of copyrighted material, then that’s problematic, for me.

derickr,
@derickr@phpc.social avatar

@ramsey @sean @simon Training LLMs on data for which no permission has been given is problematic to me.

ramsey,
@ramsey@phpc.social avatar

@derickr @sean @simon I’m not saying it’s not problematic to me, but I’m open to thinking about it.

preinheimer,
@preinheimer@phpc.social avatar

@ramsey @sean @simon

I dunno man. I'm pretty far on the other side. Giving model builders free rein to train their stuff on things humans have built seems like a large transfer of wealth from the creative class to the technology class.

Also, if my kids' school wants to teach my kids music, they need to pay for that music. Even though it's just for training! Why give these model-building billionaires a free ride?

ramsey,
@ramsey@phpc.social avatar

@preinheimer @sean @simon I’m not saying they shouldn’t have to pay the creators.

preinheimer,
@preinheimer@phpc.social avatar

@ramsey Thank you for correcting me!

ramsey,
@ramsey@phpc.social avatar

@preinheimer I can’t tell whether this is sarcasm. How did I correct you?

preinheimer,
@preinheimer@phpc.social avatar

@ramsey It's not sarcasm!

Just your clarification that you weren't suggesting that they shouldn't pay creators.

ramsey,
@ramsey@phpc.social avatar

@preinheimer Stealing from creators to train their models is wrong and evil. My comment about (potentially) not violating copyright was more about how the LLM stores the information.

derflocki,
@derflocki@phpc.social avatar

@ramsey @simon
My mental model of an LLM is that it's a "probability machine": given some input, it generates the most probable output.

If you want to go deeper, I have found this article by Stephen Wolfram quite helpful: https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/
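That "probability machine" framing can be illustrated with a toy sketch (a hypothetical example of my own, not how a real LLM works internally - real models use neural networks over tokens, but the generate-the-most-probable-next-word loop has the same shape):

```python
from collections import Counter, defaultdict

# Toy "probability machine": count word bigrams in a tiny corpus,
# then repeatedly emit the most probable next word.
corpus = "the cat sat on the mat the cat sat on the fish".split()

next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1

def generate(word, steps):
    out = [word]
    for _ in range(steps):
        followers = next_counts.get(out[-1])
        if not followers:
            break
        # Pick the single most probable continuation (greedy decoding)
        out.append(followers.most_common(1)[0][0])
    return " ".join(out)

print(generate("the", 4))  # → "the cat sat on the"
```

A real model conditions on a long window of prior tokens rather than just the previous word, and samples from the distribution instead of always taking the top choice, but "most probable continuation, over and over" is the core of it.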

simon,
@simon@simonwillison.net avatar

@ramsey my current mental model is that memorization can happen if it's seen multiple copies of the same text, such that it effectively encodes an extremely high probability that word 60 in that text follows words 1 through 59
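A toy bigram counter (a hypothetical illustration only - real LLMs are neural networks conditioning on long contexts) shows how duplicated training text can push a conditional probability toward 1, which is one intuition for memorization:

```python
from collections import Counter

# A famous opening line as the "memorized" passage, plus unrelated text
passage = "in a hole in the ground there lived a hobbit".split()
other = "the cat and the dog and the bird".split()

def prob_next(corpus, prev, nxt):
    # Estimate P(nxt | prev) from simple bigram counts
    followers = Counter(b for a, b in zip(corpus, corpus[1:]) if a == prev)
    return followers[nxt] / sum(followers.values())

# Seen once, "ground" is just one of several words that follow "the"
print(prob_next(other + passage, "the", "ground"))       # 0.25

# Seen fifty times, "ground" after "the" becomes close to certain
print(prob_next(other + passage * 50, "the", "ground"))  # ~0.94
```

Under this picture, a passage that appears many times in the training data (or is heavily weighted) ends up with near-deterministic continuations, which is consistent with models reproducing famous openings verbatim.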

ramsey,
@ramsey@phpc.social avatar

@simon I guess the question the courts will have to answer is whether capturing the probability at such a high level is enough to constitute holding a copy of the work, since the work can be reproduced with such a low level of effort, when prompted.

simon,
@simon@simonwillison.net avatar

@ramsey yeah that feels like the right question to me - and honestly I don't think there's an obvious "right" answer to it, no idea how this will shake out in court

simon,
@simon@simonwillison.net avatar

@ramsey but... the NYT lawsuit has lots of examples of it memorizing full articles - were those present multiple times in the training data or did OpenAI mark NYT content as specifically "high quality" in a way that made it more likely to memorize them?

simon, (edited )
@simon@simonwillison.net avatar

The lawsuit still has legs I think, since it's not just about using copyrighted content to train an AI (which they didn't do) - it also complains about "violation of rights of publicity" - see point 81 in this PDF:

> Defendants have knowingly and intentionally utilized and continue to utilize the name, image and likeness of Carlin without the consent of Plaintiffs

https://deadline.com/wp-content/uploads/2024/01/George-Carlin-AI-lawsuit.pdf

jtlg,
@jtlg@mastodon.lawprofs.org avatar

@simon A colleague notes that both right of publicity claims are likely to fail. The California common-law right is subject to caselaw (from a case involving Bela Lugosi's estate) holding that it doesn't continue after the celebrity's death. The California statutory right is post-mortem, but excludes "fictional or nonfictional entertainment." They might be winners in some other state, but not in California.

State-specific details at https://rightofpublicityroadmap.com/state_page/california/

glyph,
@glyph@mastodon.social avatar

@simon one of the areas of IP law I have zero understanding of is “likeness rights” or “rights of publicity”. Like I know if X takes a photograph of Y then Y has no copyright because X is the “author” of the photo. But what rights do they have? In what jurisdictions?

nevali,
@nevali@troet.cafe avatar

@simon i'm pretty sure that who wrote the jokes is very secondary in most people's minds as compared to who appears to be performing them

simon,
@simon@simonwillison.net avatar

@nevali The voice cloning bit is definitely interesting - I'm pretty sure they used ElevenLabs or similar for that, and if I'm right then they would have trained the voice model on copyrighted recordings of George Carlin's voice https://til.simonwillison.net/misc/voice-cloning

nevali,
@nevali@troet.cafe avatar

@simon i, and i imagine most people, couldn't care less if it was three monkeys on acid with an electric typewriter, the issue is principally about reviving the likeness of somebody whose direct relatives are very much still alive

in a way, the jokes not being written by something trained on Carlin's work just makes it an extremely unpleasant fraud

simon,
@simon@simonwillison.net avatar

@nevali Right, that's my opinion on this too - I think whether or not AI was involved in writing it is immaterial to how tasteless it was, and I'd be happy to see the estate win a lawsuit over California's "violation of rights of publicity"

nevali,
@nevali@troet.cafe avatar

@simon aside from the moral horrors of this one, it would be interesting if an AI-written comedy were actually any good

probably uncomfortable for different reasons, mind you…

that said, a good performer can make bad material funny, and comic timing is probably a very difficult thing to emulate, maybe that's even harder than the writing…

simon,
@simon@simonwillison.net avatar

@nevali I've found that LLMs are terrible at telling traditional jokes, but can be surprisingly funny if you use them for things like parody and satire

Fake Onion articles, for example, can work really well, because the point of something like The Onion is to take a ludicrous premise and present it with a straight voice. LLMs are very good at imitating straight news writing

I still don't like publishing text that was created by an LLM but I often amuse myself privately with this kind of thing

simon,
@simon@simonwillison.net avatar

@nevali I also amuse myself by having ChatGPT et al "write a convincing letter to the mayor of Half Moon Bay advocating for the installation of cosy boxes for pelicans at the harbor" - it's one of my test prompts for new LLMs

frew,
@frew@mastodon.social avatar

@simon FWIW, Sasso did an interview with Lex Fridman about Dudesy. Pretty sure it's like, inside-out AI? Like the AI prompts the humans?

simon,
@simon@simonwillison.net avatar

@frew Wow, that's embarrassing if the Fridman interview didn't home in on the fact that it's all basically a comedy hoax

They started the Dudesy thing back in early 2022, before even GPT-3.5-Turbo / ChatGPT had been released - there's no WAY they had anything interesting running on 2022-era GPT-3

simon,
@simon@simonwillison.net avatar

@frew Found that snippet of the Fridman interview here, and yeah, he just gave the story that some anonymous company built them an AI, without being challenged on it https://www.youtube.com/watch?v=xewD1apJNhw&t=2649

frew,
@frew@mastodon.social avatar

@simon right, I wouldn’t characterize it as a hoax as much as a gimmick? It’s been a long time though so I could be forgetting

simon,
@simon@simonwillison.net avatar

@frew I mean it's a comedy bit - no harm caused at first, but it's started contributing to the problem that people think AI is capable of WAY more than it actually is

I'm not a regular Lex Fridman viewer, so maybe I'm wrong in guessing that he would care whether the things his guests tell him are misleading!

jezdez,
@jezdez@publicidentity.net avatar

@simon @arstechnica What a mess! Hard to believe they thought they’d get away with it?

SomeGadgetGuy,
@SomeGadgetGuy@techhub.social avatar

@simon @arstechnica Wouldn't be shocked if the AI took a crack at it, and then it was "punched up" by actual humans. Seems to be what studios are hoping for.
Keep a few people around to make the AI output functional. 🙄

simon,
@simon@simonwillison.net avatar

@SomeGadgetGuy @arstechnica I think they might have thrown a few prompts through ChatGPT to help brainstorm ideas along the way, but that's a long way from "the AI wrote it"

SomeGadgetGuy,
@SomeGadgetGuy@techhub.social avatar

@simon @arstechnica Agreed. We have to keep the hype train running, but the practical application of these AI tools doesn't seem to be quite covering their energy costs yet...

mahryekuh,
@mahryekuh@fosstodon.org avatar

@simon @arstechnica Someone didn't proofread the attribution on the quote, which says you are "research" instead of "a researcher" 🤔

simon,
@simon@simonwillison.net avatar

@mahryekuh @arstechnica hah yeah I spotted that, I've reported it to them
