emilymbender,
@emilymbender@dair-community.social avatar

At yesterday's US House subcommittee hearing on risks of AI, there was a lot of enthusiasm for privacy enhancing ML technologies. That seems like a valuable direction, but it alone isn't enough. A company training privacy enhanced ML systems over your data ... still has your data, and we don't have to accept that.

Apropos:

https://www.scientificamerican.com/article/your-personal-information-is-probably-being-used-to-train-generative-ai-models/

medievalist,
@medievalist@writing.exchange avatar

@emilymbender I wonder about how copyright law affects personal data

Tatty_Corum,

@emilymbender Erik, she's right.
AI, not making the world a better place!

pyperkub,
@pyperkub@mastodon.social avatar

@emilymbender On the good news side, California is starting to lead the charge against these companies... SB 362 is now law. https://www.eff.org/deeplinks/2023/08/californias-delete-act-protects-us-data-brokers

ErikJonker,
@ErikJonker@mastodon.social avatar

@emilymbender What I don't understand: if I post something explicitly public (like this post on Mastodon), why should we not accept that companies use that data to train AI? And if we don't allow that, what does "public posting" on the internet even mean? 🤔

gpollara,
@gpollara@med-mastodon.com avatar

@ErikJonker @emilymbender Agree - public posts are public. If you don't want your info / knowledge scraped, then we need to restrict ourselves to private chat rooms or nothing at all.

emilymbender,
@emilymbender@dair-community.social avatar

@ErikJonker Hard disagree. There's a world of difference between my using language to communicate with the broad public and my being willing to have that same language scooped up for someone else to profit off of.

ErikJonker,
@ErikJonker@mastodon.social avatar

@emilymbender
...if you communicate broadly and publicly, how can you expect your data not to be re-used? The logic escapes me, especially if we talk about public blogs, social media posts, etc. For books and movies it may be different (although also complicated).

eagerpebble,
@eagerpebble@vivaldi.net avatar

@ErikJonker @emilymbender Social media is not public. Usage of the service and the content is governed by terms of service that you are forced to agree to in order to use it. Blogs are governed by copyright, just as books and movies are. No one has a complete right to use the works as they please. The big issue with the latter is that copyright law is pretty gray on whether using protected material to train AI is allowed.

ErikJonker,
@ErikJonker@mastodon.social avatar

@eagerpebble @emilymbender ...within its terms of service, my Mastodon posts are public, are they not? I think we need additional laws and regulations with regard to AI.

eagerpebble,
@eagerpebble@vivaldi.net avatar

@ErikJonker Ok, we need new laws. That does not prevent us from using existing copyright and contractual law to protect content. The "we need new laws" groups often imply the rest: "so we don't need to do anything right now."

I want to see the FTC and individuals succeed in protecting IP in court to force these AI companies to scale way back.

ErikJonker,
@ErikJonker@mastodon.social avatar

@eagerpebble I agree! But I'm afraid an awful lot of data/content can't be protected.

eagerpebble,
@eagerpebble@vivaldi.net avatar

@ErikJonker Possibly. I think we need a whole new slate of privacy laws focused not on AI, but on the massive surveillance systems we have in place now. Providing sensitive information to someone has changed drastically in how many other actors are involved:

olden times: 1 person
mail: dozens of people with laws preventing any intermediaries from reading it
early email: anywhere from 1 to 20 large corporations that can read whatever they want going through their systems
online services: a patchwork of laws regulating credit card information, health information, etc., with the rest managed through coerced contractual agreement (i.e. agree or don't use the service) with tens to hundreds of corporations with varying privacy and security policies handling your sensitive information

We clearly need a better framework.

abucci,
@abucci@buc.ci avatar

@ErikJonker @emilymbender When you walk into a public place, are you giving anyone else present in that public place permission to do a 3-d scan of your face and body and then use that scan to make a puppet that looks almost exactly like you? Are you, further, giving them permission to use that puppet for whatever purpose they choose, without your consent? What if they use it to make bigoted comments, for instance, or to tell lies? What does "going out in public" even mean, if not this broad permission to use your 3-d likeness however anyone out there sees fit?

The fact that the technology exists to perform some task does not immediately confer permission to use the technology to perform that task. The existence of a capability and the permission to use it are separate issues and we keep them separate in virtually all areas where technology and human life intersect, for very good reason. Why on Earth would it be any different when it comes to posting online?

ErikJonker,
@ErikJonker@mastodon.social avatar

@abucci @emilymbender Those are interesting arguments 🤔, but the intent of making public statements or putting text publicly online is that people read and learn from it. Of course there can be some limits on re-use, but re-using texts in other products (paid or unpaid) is not part of that, at least in my opinion. I see the risks too, but simply excluding new technologies like generative AI from re-using public content is not going to fly, in my opinion. But we will see.

abucci,
@abucci@buc.ci avatar

@ErikJonker That might be your purpose for making public statements, but it is not everyone's purpose. Surely you see there is room for disagreement at least?

I find it bizarre that after all that you still don't think that excluding generative AI is going to fly. Are you unfamiliar with robots.txt? Anyone who runs a web site can place a file named "robots.txt" at the root, and anything that crawls the web, whether it be a search engine or anything else, is supposed to respect that file. One thing you can express in the file is "don't index my web site". It is straightforward to place rules in robots.txt that would exclude OpenAI (makers of ChatGPT) from using content from your web site, provided they respect the standard.
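For instance, a minimal robots.txt along these lines could look like the sketch below (GPTBot is the user-agent string OpenAI documents for its crawler; this is an illustration, not a complete crawl policy):

```text
# Tell OpenAI's documented crawler to stay out of the whole site
User-agent: GPTBot
Disallow: /

# Leave all other crawlers (e.g. search engines) unrestricted
User-agent: *
Disallow:
```

An empty Disallow in the second record means nothing is off limits for everyone else. Compliance is voluntary, but well-behaved crawlers honor the file.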

In other words, not only is excluding generative AI going to fly, it's already flying.

Generative AI is not just "a new technology". It's a dangerous technology, and I think it's fully rational for people to respond to it with skepticism and restraint.

ErikJonker,
@ErikJonker@mastodon.social avatar

@abucci ...of course things like robots.txt will work, but as with search, the large majority wants to be indexed and trained on, I think. I am fully aware of the risks of this technology, but also of the large potential. Also, a lot of content can't be excluded with simple solutions like robots.txt. Take YouTube, for example: all (!) of it will be used for training the next Google models. Quite scary...

rowat_c,
@rowat_c@mastodon.social avatar

@ErikJonker @abucci @emilymbender a lens I try when starting to think about AI regulation issues is, "do we need separate standards for silicon and carbon?"

Here: copyright law protects IP, and fair use recognises exceptions, especially transformative ones, e.g. Warhol's Campbell's Soup paintings, raised in the Getty Images v. Stability AI case.

A difference that seems relevant to me is scale. If that's our concern, though, maybe regulation should focus on it (e.g. "personal use") rather than substrate?

benjacobsen,

@ErikJonker @emilymbender The idea of a clean public/private dichotomy is sort of intuitive but falls apart in practice. It's inappropriate to eavesdrop on people even if they're talking in a public space, for instance. There's no domain where truly 'anything goes' - there are always contextual norms governing how information can be transmitted and used.

I'd recommend checking out Helen Nissenbaum, she's a great privacy scholar who's written a lot about this. Keyword: contextual integrity
