drahardja,
@drahardja@sfba.social avatar

I am once again encouraging server admins to add a clause to their Terms of Use that prohibits using their server’s data for machine training, because at some point in the near future there is likely going to be a lawsuit against some major company for scraping and exploiting users’ data, and we should make sure we have a legal leg to stand on.

Please ask your server’s admin to do this.

@seb please consider doing this. https://mastodon.world/@Chimaera/110652906429977656

neuralgraffiti,
@neuralgraffiti@sfba.social avatar

@drahardja @seb Hi Dave! This is something the admin team is already working on, along with some other policy updates.
I'll push this to the top of the queue.

We have a lawyer on the team (that's me), so no need to collect money for the updates; I do SFBA work on a pro bono basis.

drahardja,
@drahardja@sfba.social avatar

@neuralgraffiti @seb I’m super happy to hear that!

integerpoet,
@integerpoet@sfba.social avatar

@neuralgraffiti @drahardja Decades ago, I — obviously — decided not to pursue a law degree, but since then every time I hear about pro bono work I question that decision. 😀

neuralgraffiti,
@neuralgraffiti@sfba.social avatar

@integerpoet @drahardja It’s a mixed bag. I like the work I do, but I miss being a developer sometimes.

integerpoet,
@integerpoet@sfba.social avatar

@neuralgraffiti @drahardja I don’t wish I were a lawyer in general, but it would be cool if there were the equivalent of pro bono work for a software engineer with my specialties (and NDAs).

drahardja,
@drahardja@sfba.social avatar

@integerpoet @neuralgraffiti It’s called a pseudonym (aka alt account)

pmonks,
@pmonks@sfba.social avatar

@neuralgraffiti @drahardja @seb Possibly dumb/RTFM question, but how is content that’s been posted to sfba.social licensed downstream? And if there isn’t a default, might I suggest CC-BY-NC-SA-4.0, with the ability for individual users to choose their own license if they wish?

neuralgraffiti,
@neuralgraffiti@sfba.social avatar

@pmonks @drahardja @seb Not a dumb question at all. We don't apply a particular license to the content right now, which means the copyright belongs to the poster (the legal default).

ActivityPub doesn't currently support applying varying licenses to content, although @timbray has proposed a protocol enhancement to support it: https://codeberg.org/fediverse/fep/src/branch/main/fep/c118/fep-c118.md
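To make the idea concrete, here is a minimal sketch of what a per-post license annotation could look like on an ActivityStreams Note. The `license` property name is an assumption for illustration only; the actual vocabulary is whatever FEP-c118 specifies, which is not quoted here.

```python
# Hypothetical sketch only: the real FEP-c118 property names may differ.
# It shows the general idea of attaching a license URI to an
# ActivityStreams Note so consumers can discover it mechanically.
import json

def make_licensed_note(content: str, license_uri: str) -> dict:
    """Build an ActivityStreams Note carrying a (hypothetical) license field."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Note",
        "content": content,
        "license": license_uri,  # assumed property name, not confirmed by the FEP
    }

note = make_licensed_note(
    "Hello, fediverse!",
    "https://creativecommons.org/licenses/by-nc-sa/4.0/",
)
print(json.dumps(note, indent=2))
```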

pmonks,
@pmonks@sfba.social avatar

@neuralgraffiti @drahardja @seb @timbray While a protocol enhancement for this is definitely valuable, is it legally necessary? Wouldn’t a notice on the server’s “about” page (or similar) legally serve the same purpose (albeit without any machine discoverability)?

neuralgraffiti,
@neuralgraffiti@sfba.social avatar

@pmonks @drahardja @seb @timbray Nope, definitely not required. I was more addressing the second part of your question re users choosing their own licenses.

pmonks,
@pmonks@sfba.social avatar

@neuralgraffiti @drahardja @seb @timbray Right, but isn’t a server default necessary anyway, for the vast majority of users who don’t bother explicitly licensing their own content?

And isn’t a protocol enhancement also nice but unnecessary for the per-user case? For example I explicitly license my content via a simple textual statement in my profile (though again, not easily machine discoverable, unlike a protocol enhancement).

neuralgraffiti,
@neuralgraffiti@sfba.social avatar

@pmonks @drahardja @seb @timbray Yes to the per-user case. You could put a license in your profile or something and license your posts however you want.

As for the need for a default, there already is a server-wide default as far as copyright licensing goes. Any post eligible for copyright in the first place is automatically protected as soon as it posted.* So applying, e.g., a Creative Commons license by default would essentially take some rights away from our users by making their posts available for non-commercial use by others or whatever other permissions the license grants. (1/2)

neuralgraffiti,
@neuralgraffiti@sfba.social avatar

@pmonks @drahardja @seb @timbray Circling back to the initial reason for this thread, where things get a bit more complicated is the interaction between contract law and copyright. In broad strokes, you can apply contract law to protect things that aren’t necessarily covered by copyright. The arguable reason to put something in our Terms of Service is that some pro-ML folks argue that using data for training a model is a fair use. If they are right, then it’s not a copyright violation to use the data, regardless of what license you’ve granted to the public. But it could be a violation of a provision in a ToS, giving you a breach of contract claim against someone improperly using the data.

This is putting aside some other complications, like implied licenses and such, but this is already a long post for my phone…

* The legal standard is that a work is protected when it is “fixed in a tangible medium.”
(2/2)
Frances_Larina,

    @drahardja @seb

    How about an even more general clause about not collecting users’ data for profit or politics, period?

    dascritch,
    @dascritch@mast.eu.org avatar

    @drahardja @seb There is a standard for that: the TDM Reservation Protocol, which is today in the final days of its community review @w3c

    mybarkingdogs,

    @drahardja @seb I agree, and yet at the same time, unlike FB federation (which is exclusively harmful and should never be allowed) - I think that we need to address machine-learning scraping by creating marked, flagged, opt-in consent areas (which no one should be able to unintentionally stumble into) to be intentionally scraped, on purpose, ideally with fair compensation for participation.

    while doing what you say, and using such a clause to keep scrapers out of most other places, particularly anywhere that could involve exploited people, sensitive content, etc.

    (I say this because machine learning is already biased hard toward cishet white normie sources at best, and outright hate groups at worst. Leftist, or at the very least non-hateful, representative, and diverse content needs to be introduced into datasets at wide scale.)

    Brendanjones,
    @Brendanjones@fosstodon.org avatar

    @mybarkingdogs @drahardja @seb I was thinking about this in recent days when I saw another post talking about stopping scraping of fediverse data. Of course many reasons to not allow it, but based on how much more pleasant the discourse is on here, I’d sure rather ML algos were trained on data from here rather than most other comment-based parts of the internet.

    mybarkingdogs,

    @Brendanjones @drahardja @seb

    What converted me there was when I saw that a horrific stalker site which I won't name but that's responsible for multiple deaths, swattings, more - is part of OpenAI's set. As well as a couple of other awful things.

    And as tempting as the response to everything Big Tech does is to just say NO and draw a deep line in the sand (and sometimes, like with Facebook, that's valid - when there's no way to get any advantage for ourselves, and every way to be exploited and subsumed)

    other times, it's actually necessary to fight and resist in other ways - whether crapflooding data that could be used to do harm such as personally identifying info to protect people or, in this sense, intentionally providing data to pull the Overton window of datasets away from literal harassment collectives.

    If only that someone, somewhere, against all good advice, will ask something running off these datasets about the people the bigots hate.

    drahardja,
    @drahardja@sfba.social avatar

    @Brendanjones I don’t buy this argument. The main beneficiaries of “AI” today are corporations that have huge investments in compute, who will surely hoard all profit that may arise from products trained on our data. Why should we make their products more enriching or pleasant to use? Don’t fall for the “my tech is inevitable” trope that technologists always use; it is not inevitable that AI will permeate into our lives in any more meaningful measure than spam or “smart” TVs do today: that is, as technological means to scam and surveil us to even greater degrees.

    I, for one, am not interested in contributing to this technology in its present form.

    Brendanjones,
    @Brendanjones@fosstodon.org avatar

    @drahardja don’t worry, I’m not actually suggesting to offer up our data. You don’t need to convince me. It was merely a think out loud hypothetical of “how different would ML algos be if they were trained on the (more polite and left-leaning) fediverse content?”

    drahardja,
    @drahardja@sfba.social avatar

    @Brendanjones I’ve also thought “what would it look like if generative AI were developed as a public good instead of for corporate profit?”

    And then I realize that the end result would still be spam and surveillance, and I shake myself awake.

    DataDrivenMD,
    @DataDrivenMD@fedified.com avatar

    @drahardja @seb Meant to respond to your earlier post (a couple weeks ago IIRC) to strongly endorse this suggestion and to encourage developers, especially those of us who offer open APIs, to do the same. (We already have thanks to your suggestion)

    drahardja,
    @drahardja@sfba.social avatar

    @DataDrivenMD @seb Awesome!

    EricErack,
    @EricErack@masto.bike avatar

    @drahardja
    Hi @alter_unicorn and @cyrille, what do you think?

    harkank,
    @harkank@chaos.social avatar

    @drahardja @seb
    Here is the lawsuit running against M$ & OpenAI. 3 billion dollars. More tomorrow in German.

    https://storage.courtlistener.com/recap/gov.uscourts.cand.414754/gov.uscourts.cand.414754.1.0.pdf

    Gemma,
    @Gemma@zeroes.ca avatar

    @drahardja @trendless Another idea ☺️

    polychromata,

    @drahardja Do you have drop-in text for this? I tried to just make something up and realized that it would also disallow federation, which is obviously not what I want.

    drahardja,
    @drahardja@sfba.social avatar

    @polychromata I don’t. There are lawyers working on this so I’m hopeful we’ll have something.

    roryh,

    @leo sounds like something you might get behind.

    ZeirosLion,

    @drahardja
    @seb

    Just gonna slide this to @self.

    danciruli,
    @danciruli@hachyderm.io avatar

    @drahardja @seb I'd love to hear some IP lawyers weigh in. Certainly it is technically possible to read everything in a feed without ever agreeing to any terms of service (just browse to https://sfba.social/@drahardja, or heck, subscribe to https://sfba.social/@drahardja.rss to get it programmatically).

    So there's no technical way to get a company to agree before reading. Not sure if you could include copyright text in every post somehow...

    pilhofer,
    @pilhofer@journa.host avatar

    @seb @drahardja Sorry, isn’t this basically settled case law? Any lawyers handy to address this? I’m super curious.

    https://www.eff.org/deeplinks/2022/04/scraping-public-websites-still-isnt-crime-court-appeals-declares

    elmerot,
    @elmerot@mastodon.nu avatar

    @drahardja @seb
    Good idea! Ping
    @admin

    meowryveilles,

    @mods could be relevant?

    amart,
    @amart@hachyderm.io avatar

    @drahardja @seb It’s clear, large corporations don’t care about what’s legally permissible in whatever jurisdictions they’re operating in. I don’t understand the ActivityPub API, but it would seem more effective to cripple the ability to mass scrape by throttling access in a fashion that wouldn’t impact typical usage patterns of users. I trust big corporations to not scrape data if they can’t scrape data.
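The throttling idea above is commonly implemented as a per-client token bucket: human-scale reading passes through, while bulk scraping quickly runs out of tokens. This is a generic sketch, not Mastodon's actual rate limiter; the rate and burst numbers are illustrative only.

```python
# Sketch of per-client throttling via a token bucket. Tokens refill at
# `rate` per second up to `burst`; each request spends one token, and
# requests with no token available are rejected.
import time

class TokenBucket:
    def __init__(self, rate: float, burst: float):
        self.rate = rate          # tokens added per second
        self.capacity = burst     # maximum burst size
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=1.0, burst=5)        # ~1 request/s, bursts of 5
results = [bucket.allow() for _ in range(10)]  # 10 back-to-back requests
print(results)  # first 5 pass; the rest are throttled until tokens refill
```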

    simon_lucy,
    @simon_lucy@mastodon.social avatar

    @drahardja @seb

    If you can't enforce it, it's not worth the effort.

    What would be worth the effort is a treaty between as many governments as possible that enforced disclosure of the sources of data used for machine learning, and that granted individual and group rights to exclude data, whether before or after it is used.

    Along with that, a requirement for all models to provide an EXPLAIN function that delivered both the sources used and the inferences made.

    yannickdoteu,

    @drahardja @seb

    @JohanEmpa, is this something you consider?

    bok_bok_ba_gok,

    @drahardja @seb @trumpet please consider doing this! 😘

    essjay,

    @drahardja

    @lily
    What do you think?

    @seb
