I am once again encouraging #Fediverse #admins to add a clause to their Terms of Use prohibiting the use of their server’s data for machine training, because at some point in the near future there is likely going to be a lawsuit against some major company for scraping and exploiting users’ data, and we should make sure we have a legal leg to stand on.
That's almost certainly going to require a lawyer to write it, if you want to word it in such a way that it has any hope of being enforceable in a suit at some future time.
@drahardja @seb Meant to respond to your earlier post (a couple weeks ago IIRC) to strongly endorse this suggestion and to encourage developers, especially those of us who offer open APIs, to do the same. (We already have, thanks to your suggestion.)
@drahardja @seb Feels like this might need a collective effort, perhaps per jurisdiction. Something an admin can just roll out, like a code license, without it becoming an exercise in legalese. Great suggestion to have everyone start thinking about this.
@feld Are you suggesting that consuming data served by a privately-owned server is the same thing as taking a picture from a public space? Because it’s not.
@victor I'd be glad to. I spent four miserable days writing my own TOS, and then I made them Creative Commons, or whatever that one is that says, "Take what you need, Fam." 🥰🥰🥰
@victor *but not til July 6th for me. Gonna attempt this "relax" thing ppl without ADHD can do tomorrow, and then the 5th is the mister's birthday and we are gonna watch a whole movie. 😂😂😂🥂
@interstellarenigma By the time a lawsuit emerges, I hope it would be clear how we can prove it. My guess is there will likely be some range of IP addresses that have been used exclusively for grabbing data for training, and we can show a pattern of access from that address.
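To make the point above concrete, here is a minimal sketch of the kind of access-log analysis that could surface such a pattern. The log records, field layout, endpoint path, and threshold are all hypothetical illustrations, not taken from any real server's logs:

```python
from collections import Counter

# Hypothetical pre-parsed access-log records as (ip, path) tuples.
records = [
    ("203.0.113.7", "/api/v1/statuses/101"),
    ("203.0.113.7", "/api/v1/statuses/102"),
    ("203.0.113.7", "/api/v1/statuses/103"),
    ("198.51.100.2", "/api/v1/timelines/home"),
]

def flag_bulk_readers(records, threshold=3):
    """Return IPs whose volume of status fetches meets the threshold.

    A real analysis would also consider time windows, user agents, and
    whether the IP ever performs any interactive (write) action at all --
    an address that only ever reads posts in bulk is the pattern described.
    """
    counts = Counter(ip for ip, path in records if "/statuses/" in path)
    return sorted(ip for ip, n in counts.items() if n >= threshold)

print(flag_bulk_readers(records))  # → ['203.0.113.7']
```

Evidence like this would of course only show *a* scraping pattern; tying a given IP range to a specific training pipeline is the part that would likely come out in discovery.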
@drahardja @seb I really would like to do this but I honestly am not sure how to add that in a way that would hold up in a court. I'm not a lawyer :blobNervous:
Is there an example, or better yet, a pre-written clause that admins can use that has been vetted?
@seb I’m willing to chip in to pay lawyers to write something up, or at least consult as to the viability of such a clause. If you know of a lawyer who’s familiar with this space of IP law (especially in your part of the world) please recommend them here.
@drahardja @seb I agree, and yet at the same time, unlike FB federation (which is exclusively harmful and should never be allowed) - I think that we need to address machine-learning scraping by creating marked, flagged, opt-in consent areas (which no one should be able to unintentionally stumble into) to be intentionally scraped, on purpose, ideally with fair compensation for participation.
while doing what you say and using such a clause to keep scrapers out of everywhere else, especially anywhere that could expose vulnerable people, sensitive content, etc.
(I say this because machine learning is already biased hard toward cishet white normie sources at best, and outright hate groups at worst. Leftist, or at the very least non-hateful, representative, and diverse content needs to be introduced into datasets at wide scale.)
@mybarkingdogs @drahardja @seb I was thinking about this in recent days when I saw another post talking about stopping scraping of fediverse data. Of course many reasons to not allow it, but based on how much more pleasant the discourse is on here, I’d sure rather ML algos were trained on data from here rather than most other comment-based parts of the internet.
What converted me there was seeing that a horrific stalker site, which I won't name but which is responsible for multiple deaths, swattings, and more, is part of OpenAI's training set. As well as a couple of other awful things.
And as tempting as it is to respond to everything Big Tech does by just saying NO and drawing a deep line in the sand (and sometimes, like with Facebook, that's valid - when there's no way to get any advantage for ourselves, and every way to be exploited and subsumed),
other times it's actually necessary to fight and resist in other ways - whether by crapflooding data that could be used to do harm (such as personally identifying info) to protect people or, in this case, intentionally providing data to pull the Overton window of datasets away from literal harassment collectives.
If only so that someone, somewhere, against all good advice, will ask something running off these datasets about the people the bigots hate.
@Brendanjones I don’t buy this argument. The main beneficiaries of “AI” today are corporations that have huge investments in compute, who will surely hoard all profit that may arise from products trained on our data. Why should we make their products more enriching or pleasant to use? Don’t fall for the “my tech is inevitable” trope that technologists always use; it is not inevitable that AI will permeate into our lives in any more meaningful measure than spam or “smart” TVs do today: that is, as technological means to scam and surveil us to even greater degrees.
I, for one, am not interested in contributing to this technology in its present form.
@drahardja don’t worry, I’m not actually suggesting to offer up our data. You don’t need to convince me. It was merely a think out loud hypothetical of “how different would ML algos be if they were trained on the (more polite and left-leaning) fediverse content?”
@drahardja @seb
I don't think such a clause will technically work.
Not only are all the readers machines (so what does "learning" or "training" mean, exactly?), but using machine learning to classify toots as spam or not is very likely to become a necessity in the future. Saying "no machine learning" would cut that off at the knees.
The best way to prevent others from using my writing is to keep it private -- email, or closed forums. Public writing is, by necessity, public.
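For a sense of what even the simplest version of the spam classification mentioned above involves, here is a toy Laplace-smoothed naive Bayes sketch. The training phrases are invented placeholders; a real moderation tool would need far more data, and, per the rest of this thread, consent from the users whose posts it learns from:

```python
import math
from collections import Counter

# Toy, hand-invented training examples -- purely illustrative.
spam = ["buy followers now", "cheap followers buy now"]
ham = ["lovely sunset photo today", "great discussion about federation"]

def count_words(docs):
    words = Counter(w for d in docs for w in d.split())
    return words, sum(words.values())

spam_words, spam_total = count_words(spam)
ham_words, ham_total = count_words(ham)
vocab = set(spam_words) | set(ham_words)

def log_likelihood(words, total, doc):
    # Laplace-smoothed log-likelihood of the document under one class.
    return sum(
        math.log((words[w] + 1) / (total + len(vocab)))
        for w in doc.split()
    )

def looks_like_spam(doc):
    return log_likelihood(spam_words, spam_total, doc) > \
           log_likelihood(ham_words, ham_total, doc)

print(looks_like_spam("buy cheap followers"))  # → True
print(looks_like_spam("sunset photo"))         # → False
```

The point made above stands: a blanket "no machine learning" clause would arguably forbid even a filter this simple, which is why a carve-out for moderation (as suggested downthread) matters.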
@supernovae @StompyRobot @drahardja @seb they could probably carefully word exceptions, like that it's not allowed unless for moderation purposes and only from posts that have been reported (just an example, not the words I think should actually be put there lol I have no idea)
@dozymoe @seb I’m not convinced that ML is that useful for moderation or spam patrol (see the mess that is Facebook automoderation), but that is a good point. What’s important, I think, is that the admin and users explicitly consent to any training use. A per-user setting (just like opt-in searchability) might be the right balance to strike.
@drahardja @seb on this, what I'm most worried about is post expiry time. If my post is set to delete after a month, that should be mandated in the protocol. It's not meant to be copied and kept indefinitely.
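For context on the protocol side of this: ActivityPub does already define a Delete activity, and the ActivityStreams vocabulary defines a Tombstone object that a server can leave in place of a deleted post. A federated deletion looks roughly like this (the actor and object URLs are placeholder examples):

```json
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Delete",
  "actor": "https://example.social/users/alice",
  "object": {
    "type": "Tombstone",
    "id": "https://example.social/users/alice/statuses/123",
    "deleted": "2024-07-01T00:00:00Z"
  }
}
```

The gap is exactly the one raised above: honoring a Delete is voluntary for remote servers, and a scraper that ignores the protocol entirely is never bound by it, which is why the ToS/legal angle comes up.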
@drahardja @seb Unless the instance's ToS require you to license them to do stuff with your posts including sublicense to scrapers, the default is that any such use is infringement. Making it explicitly against ToS could be nice, but should not be necessary.