drahardja,
@drahardja@sfba.social avatar

I am once again encouraging server admins to add a clause to their Terms of Use that prohibits using their server’s data for machine training, because at some point in the near future there is likely going to be a lawsuit against some major company for scraping and exploiting users’ data, and we should make sure we have a legal leg to stand on.

Please ask your server’s admin to do this.

@seb please consider doing this. https://mastodon.world/@Chimaera/110652906429977656

neuralgraffiti,
@neuralgraffiti@sfba.social avatar

@drahardja @seb Hi Dave! This is something the admin team is already working on, along with some other policy updates.
I'll push this to the top of the queue.

We have a lawyer on the team (that's me), so no need to collect money for the updates; I do SFBA work on a pro bono basis.

drahardja,
@drahardja@sfba.social avatar

@neuralgraffiti @seb I’m super happy to hear that!

integerpoet,
@integerpoet@sfba.social avatar

@neuralgraffiti @drahardja Decades ago, I — obviously — decided not to pursue a law degree, but since then every time I hear about pro bono work I question that decision. 😀

neuralgraffiti,
@neuralgraffiti@sfba.social avatar

@integerpoet @drahardja It’s a mixed bag. I like the work I do, but I miss being a developer sometimes.

integerpoet,
@integerpoet@sfba.social avatar

@neuralgraffiti @drahardja I don’t wish I were a lawyer in general, but it would be cool if there were the equivalent of pro bono work for a software engineer with my specialties (and NDAs).

drahardja,
@drahardja@sfba.social avatar

@integerpoet @neuralgraffiti It’s called a pseudonym (aka alt account)

pmonks,
@pmonks@sfba.social avatar

@neuralgraffiti @drahardja @seb Possibly dumb/RTFM question, but how is content that’s been posted to sfba.social licensed downstream? And if there isn’t a default, might I suggest CC-BY-NC-SA-4.0, with the ability for individual users to choose their own license if they wish?

neuralgraffiti,
@neuralgraffiti@sfba.social avatar

@pmonks @drahardja @seb Not a dumb question at all. We don't apply a particular license to the content right now, which means the copyright belongs to the poster (the legal default).

ActivityPub doesn't currently support applying varying licenses to content, although @timbray has proposed a protocol enhancement to support it: https://codeberg.org/fediverse/fep/src/branch/main/fep/c118/fep-c118.md
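To make the idea concrete, here is a minimal sketch of what a per-post license annotation could look like on an ActivityStreams Note. The `license` property name is an assumption for illustration only; the actual vocabulary is whatever FEP-c118 specifies, which is not quoted here.

```python
# Hypothetical sketch only: the real FEP-c118 property names may differ.
# It shows the general idea of attaching a license URI to an
# ActivityStreams Note so consumers can discover it mechanically.
import json

def make_licensed_note(content: str, license_uri: str) -> dict:
    """Build an ActivityStreams Note carrying a (hypothetical) license field."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Note",
        "content": content,
        "license": license_uri,  # assumed property name, not confirmed by the FEP
    }

note = make_licensed_note(
    "Hello, fediverse!",
    "https://creativecommons.org/licenses/by-nc-sa/4.0/",
)
print(json.dumps(note, indent=2))
```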

pmonks,
@pmonks@sfba.social avatar

@neuralgraffiti @drahardja @seb @timbray While a protocol enhancement for this is definitely valuable, is it legally necessary? Wouldn’t a notice on the server’s “about” page (or similar) legally serve the same purpose (albeit without any machine discoverability)?

neuralgraffiti,
@neuralgraffiti@sfba.social avatar

@pmonks @drahardja @seb @timbray Nope, definitely not required. I was more addressing the second part of your question re users choosing their own licenses.

pmonks,
@pmonks@sfba.social avatar

@neuralgraffiti @drahardja @seb @timbray Right, but isn’t a server default necessary anyway, for the vast majority of users who don’t bother explicitly licensing their own content?

And isn’t a protocol enhancement also nice but unnecessary for the per-user case? For example I explicitly license my content via a simple textual statement in my profile (though again, not easily machine discoverable, unlike a protocol enhancement).

neuralgraffiti,
@neuralgraffiti@sfba.social avatar

@pmonks @drahardja @seb @timbray Yes to the per-user case. You could put a license in your profile or something and license your posts however you want.

As for the need for a default, there already is a server-wide default as far as copyright licensing goes. Any post eligible for copyright in the first place is automatically protected as soon as it posted.* So applying, e.g., a Creative Commons license by default would essentially take some rights away from our users by making their posts available for non-commercial use by others or whatever other permissions the license grants. (1/2)

neuralgraffiti,
@neuralgraffiti@sfba.social avatar

@pmonks @drahardja @seb @timbray Circling back to the initial reason for this thread, where things get a bit more complicated is the interaction between contract law and copyright. In broad strokes, you can apply contract law to protect things that aren’t necessarily covered by copyright. The arguable reason to put something in our Terms of Service is that some pro-ML folks argue that using data for training a model is a fair use. If they are right, then it’s not a copyright violation to use the data, regardless of what license you’ve granted to the public. But it could be a violation of a provision in a ToS, giving you a breach of contract claim against someone improperly using the data.

This is putting aside some other complications, like implied licenses and such, but this is already a long post for my phone…

* The legal standard is that a work is protected when it is “fixed in a tangible medium.”
(2/2)
Frances_Larina,

    @drahardja @seb

    How about an even more general clause about not collecting users’ data for profit or politics, period?

    dascritch,
    @dascritch@mast.eu.org avatar

    @drahardja @seb There is a standard for that: the TDM Reservation Protocol, which is today in the final days of its community review @w3c

    mybarkingdogs,

    @drahardja @seb I agree, and yet at the same time, unlike FB federation (which is exclusively harmful and should never be allowed) - I think that we need to address machine-learning scraping by creating marked, flagged, opt-in consent areas (which no one should be able to unintentionally stumble into) to be intentionally scraped, on purpose, ideally with fair compensation for participation.

    while doing what you say, and using such a clause to keep scrapers out of most other places, particularly anywhere that could involve exploited people, sensitive content, etc.

    (I say this because machine learning is already biased hard toward cishet white normie sources at best, and outright hate groups at worst. Leftist, or at the very least non-hateful, representative, and diverse content needs to be introduced into datasets at wide scale.)

    Brendanjones,
    @Brendanjones@fosstodon.org avatar

    @mybarkingdogs @drahardja @seb I was thinking about this in recent days when I saw another post talking about stopping scraping of fediverse data. Of course many reasons to not allow it, but based on how much more pleasant the discourse is on here, I’d sure rather ML algos were trained on data from here rather than most other comment-based parts of the internet.

    mybarkingdogs,

    @Brendanjones @drahardja @seb

    What converted me there was when I saw that a horrific stalker site which I won't name but that's responsible for multiple deaths, swattings, more - is part of OpenAI's set. As well as a couple of other awful things.

    And as tempting as the response to everything Big Tech does is to just say NO and draw a deep line in the sand (and sometimes, like with Facebook, that's valid - when there's no way to get any advantage for ourselves, and every way to be exploited and subsumed)

    other times, it's actually necessary to fight and resist in other ways - whether crapflooding data that could be used to do harm such as personally identifying info to protect people or, in this sense, intentionally providing data to pull the Overton window of datasets away from literal harassment collectives.

    If only that someone, somewhere, against all good advice, will ask something running off these datasets about the people the bigots hate.

    drahardja,
    @drahardja@sfba.social avatar

    @Brendanjones I don’t buy this argument. The main beneficiaries of “AI” today are corporations that have huge investments in compute, who will surely hoard all profit that may arise from products trained on our data. Why should we make their products more enriching or pleasant to use? Don’t fall for the “my tech is inevitable” trope that technologists always use; it is not inevitable that AI will permeate into our lives in any more meaningful measure than spam or “smart” TVs do today: that is, as technological means to scam and surveil us to even greater degrees.

    I, for one, am not interested in contributing to this technology in its present form.

    Brendanjones,
    @Brendanjones@fosstodon.org avatar

    @drahardja don’t worry, I’m not actually suggesting to offer up our data. You don’t need to convince me. It was merely a think out loud hypothetical of “how different would ML algos be if they were trained on the (more polite and left-leaning) fediverse content?”

    drahardja,
    @drahardja@sfba.social avatar

    @Brendanjones I’ve also thought “what would it look like if generative AI were developed as a public good instead of for corporate profit?”

    And then I realize that the end result would still be spam and surveillance, and I shake myself awake.

    DataDrivenMD,
    @DataDrivenMD@fedified.com avatar

    @drahardja @seb Meant to respond to your earlier post (a couple weeks ago IIRC) to strongly endorse this suggestion and to encourage developers, especially those of us who offer open APIs, to do the same. (We already have thanks to your suggestion)

    drahardja,
    @drahardja@sfba.social avatar

    @DataDrivenMD @seb Awesome!

    EricErack,
    @EricErack@masto.bike avatar

    @drahardja
    Hi @alter_unicorn and @cyrille, what do you think?

    harkank,
    @harkank@chaos.social avatar

    @drahardja @seb
    Here is the lawsuit running against M$ & OpenAI. 3 billion dollars. More tomorrow in German.

    https://storage.courtlistener.com/recap/gov.uscourts.cand.414754/gov.uscourts.cand.414754.1.0.pdf

    Gemma,
    @Gemma@zeroes.ca avatar

    @drahardja @trendless Another idea ☺️

    polychromata,

    @drahardja Do you have drop-in text for this? I tried to just make something up and realized that it would also disallow federation, which is obviously not what I want.

    drahardja,
    @drahardja@sfba.social avatar

    @polychromata I don’t. There are lawyers working on this so I’m hopeful we’ll have something.

    roryh,

    @leo sounds like something you might get behind.

    ZeirosLion,

    @drahardja
    @seb

    Just gonna slide this to @self.

    danciruli,
    @danciruli@hachyderm.io avatar

    @drahardja @seb I'd love to hear some IP lawyers weigh in. Certainly it is technically possible to read everything in a feed without ever agreeing to any terms of service (just browse to https://sfba.social/@drahardja, or heck, subscribe to https://sfba.social/@drahardja.rss to get it programmatically).

    So there's no technical way to get a company to agree before reading. Not sure if you could include copyright text in every post somehow...

    pilhofer,
    @pilhofer@journa.host avatar

    @seb @drahardja Sorry, isn’t this basically settled case law? Any lawyers handy to address this? I’m super curious.

    https://www.eff.org/deeplinks/2022/04/scraping-public-websites-still-isnt-crime-court-appeals-declares

    elmerot,
    @elmerot@mastodon.nu avatar

    @drahardja @seb
    Good idea! Ping
    @admin

    meowryveilles,

    @mods could be relevant?

    amart,
    @amart@hachyderm.io avatar

    @drahardja @seb It’s clear, large corporations don’t care about what’s legally permissible in whatever jurisdictions they’re operating in. I don’t understand the ActivityPub API, but it would seem more effective to cripple the ability to mass scrape by throttling access in a fashion that wouldn’t impact typical usage patterns of users. I trust big corporations to not scrape data if they can’t scrape data.
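The throttling idea above is commonly implemented as a per-client token bucket: human-scale reading passes through, while bulk scraping quickly runs out of tokens. This is a generic sketch, not Mastodon's actual rate limiter; the rate and burst numbers are illustrative only.

```python
# Sketch of per-client throttling via a token bucket. Tokens refill at
# `rate` per second up to `burst`; each request spends one token, and
# requests with no token available are rejected.
import time

class TokenBucket:
    def __init__(self, rate: float, burst: float):
        self.rate = rate          # tokens added per second
        self.capacity = burst     # maximum burst size
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=1.0, burst=5)        # ~1 request/s, bursts of 5
results = [bucket.allow() for _ in range(10)]  # 10 back-to-back requests
print(results)  # first 5 pass; the rest are throttled until tokens refill
```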

    simon_lucy,
    @simon_lucy@mastodon.social avatar

    @drahardja @seb

    If you can't enforce it, it's not worth the effort.

    What would be worth the effort is a treaty between as many governments as possible that enforced disclosure of the sources of data used for machine learning, and that granted individual and group rights to exclude data, whether before or after it is used.

    Along with that, a requirement for all models to provide an EXPLAIN function that delivered both the sources used and the inferences made.

    yannickdoteu,

    @drahardja @seb

    @JohanEmpa, is this something you consider?

    bok_bok_ba_gok,

    @drahardja @seb @trumpet please consider doing this! 😘

    essjay,

    @drahardja

    @lily
    What do you think?

    @seb
