drahardja,
@drahardja@sfba.social avatar

I am once again encouraging server admins to add a clause to their Terms of Use that prohibits using their server’s data for machine-learning training, because at some point in the near future there is likely going to be a lawsuit against some major company for scraping and exploiting users’ data, and we should make sure we have a legal leg to stand on.

Please ask your server’s admin to do this.

@seb please consider doing this. https://mastodon.world/@Chimaera/110652906429977656

victor,

deleted_by_author

    drahardja,
    @drahardja@sfba.social avatar

    @victor I’m willing to chip in money to have a lawyer write something up that admins can copy/paste easily. Let me know what you think.

    LALegault,
    @LALegault@newsie.social avatar

    deleted_by_author

    victor,

    deleted_by_author

    LALegault,
    @LALegault@newsie.social avatar

    deleted_by_author

    EverydayMoggie,
    @EverydayMoggie@sfba.social avatar

    That's almost certainly going to require a lawyer to write it, if you want to word it in such a way that it has any hope of being enforceable in a suit at some future time.

    @drahardja @seb

    drahardja,
    @drahardja@sfba.social avatar

    @EverydayMoggie @seb I’m willing to chip in money to pay a lawyer to write it.

    DataDrivenMD,

    @drahardja @seb Meant to respond to your earlier post (a couple weeks ago IIRC) to strongly endorse this suggestion and to encourage developers, especially those of us who offer open APIs, to do the same. (We already have thanks to your suggestion)

    drahardja,
    @drahardja@sfba.social avatar

    @DataDrivenMD @seb Awesome!

    tehstu,
    @tehstu@hachyderm.io avatar

    @drahardja @seb Feels like this might need a collective effort, perhaps per jurisdiction. Something an admin can just roll out, like a code license, without it becoming an exercise in legalese. Great suggestion to have everyone start thinking about this.

    franktaber,
    @franktaber@mas.to avatar

    @drahardja @seb Also may be useful to add a noncommercial use policy. Here is some discussion.

    https://mas.to/@franktaber/110602489997086618

    danmcd,
    @danmcd@hostux.social avatar

    @drahardja @seb

    @alarig 👆 (Unless you've done it already...)

    SecureInStyle,

    @jquillin Perhaps something to consider?

    feld,
    @feld@bikeshed.party avatar

    deleted_by_author

    drahardja,
    @drahardja@sfba.social avatar

    @feld Are you suggesting that consuming data served by a privately-owned server is the same thing as taking a picture from a public space? Because it’s not.

    feld,
    @feld@bikeshed.party avatar

    deleted_by_author

    villares,
    @villares@ciberlandia.pt avatar

    @feld @drahardja in Europe some publicly viewable places are not fair game, crazy stuff "Freedom of Panorama": https://en.m.wikipedia.org/wiki/Freedom_of_panorama

    squeevening,

    @drahardja @seb ooh, excellent idea.

    victor,

    deleted_by_author

    squeevening,

    @victor I'd be glad to. I spent four miserable days writing my own TOS, and then I made them Creative Commons, or whatever that one is that says, "Take what you need, Fam." 🥰🥰🥰

    victor,

    deleted_by_author

    squeevening,

    @victor *but not til July 6th for me. Gonna attempt this "relax" thing ppl without ADHD can do tomorrow, and then the 5th is the mister's birthday and we are gonna watch a whole movie. 😂😂😂🥂

    victor,

    deleted_by_author

    squeevening,

    @victor I better write this down. 😂😂😂💪

    skastodon,

    @drahardja This is a good idea. Is there a boilerplate piece of text that can be used?

    drahardja,
    @drahardja@sfba.social avatar

    @skastodon No, but I think that would be a great idea.

    interstellarenigma,

    @drahardja I think one of the biggest challenges is how you can provide evidence that they have used such user data for machine learning training.

    drahardja,
    @drahardja@sfba.social avatar

    @interstellarenigma By the time a lawsuit emerges, I hope it would be clear how we can prove it. My guess is there will likely be some range of IP addresses that have been used exclusively for grabbing data for training, and we can show a pattern of access from that address.
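A pattern like the one described, bulk access concentrated in a narrow IP range, could in principle be surfaced from ordinary access logs. A minimal sketch, assuming log entries have already been parsed into (IP, path) pairs; the addresses and paths are made-up examples, not real scrapers:

```python
from collections import Counter
from ipaddress import ip_network

# Hypothetical parsed access-log entries: (client IP, requested path).
log = [
    ("203.0.113.5", "/api/v1/statuses/1"),
    ("203.0.113.9", "/api/v1/statuses/2"),
    ("203.0.113.5", "/api/v1/statuses/3"),
    ("198.51.100.7", "/@alice/with_replies"),
]

def requests_per_prefix(entries, prefix_len=24):
    """Count requests per IP block (/24 by default), a crude way to
    spot bulk access concentrated in one provider's address range."""
    counts = Counter()
    for ip, _path in entries:
        block = ip_network(f"{ip}/{prefix_len}", strict=False)
        counts[str(block)] += 1
    return counts

print(requests_per_prefix(log).most_common(1))  # busiest /24 block
```

In practice this would run over real server logs and be combined with other signals (user agents, request rates), but the idea of demonstrating a pattern of access from one range is the same.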

    linkeddev,

    @drahardja @seb I really would like to do this but I honestly am not sure how to add that in a way that would hold up in a court. I'm not a lawyer :blobNervous:

    Is there an example, or better yet, a pre-written clause that admins can use that has been vetted?

    drahardja,
    @drahardja@sfba.social avatar

    @seb I’m willing to chip in to pay lawyers to write something up, or at least consult as to the viability of such a clause. If you know of a lawyer who’s familiar with this space of IP law (especially in your part of the world) please recommend them here.

    mybarkingdogs,

    @drahardja @seb I agree, and yet at the same time, unlike FB federation (which is exclusively harmful and should never be allowed) - I think that we need to address machine-learning scraping by creating marked, flagged, opt-in consent areas (which no one should be able to unintentionally stumble into) to be intentionally scraped, on purpose, ideally with fair compensation for participation.

    while doing what you say and keeping scrapers out of most places with such a clause on anywhere that could be exploiting people/sensitive/etc.

    (I say this because machine learning is already biased hard toward cishet white normie sources at best, and outright hate groups at worst. Leftist, or at the very least non-hateful, representative, and diverse content needs to be introduced into datasets at wide scale.)

    Brendanjones,
    @Brendanjones@fosstodon.org avatar

    @mybarkingdogs @drahardja @seb I was thinking about this in recent days when I saw another post talking about stopping scraping of fediverse data. Of course many reasons to not allow it, but based on how much more pleasant the discourse is on here, I’d sure rather ML algos were trained on data from here rather than most other comment-based parts of the internet.

    mybarkingdogs,

    @Brendanjones @drahardja @seb

    What converted me there was when I saw that a horrific stalker site which I won't name but that's responsible for multiple deaths, swattings, more - is part of OpenAI's set. As well as a couple of other awful things.

    And as tempting as it is to respond to everything Big Tech does by just saying NO and drawing a deep line in the sand (and sometimes, like with Facebook, that's valid: when there's no way to get any advantage for ourselves, and every way to be exploited and subsumed), other times it's actually necessary to fight and resist in other ways: whether crapflooding data that could be used to do harm, such as personally identifying info, to protect people, or, in this sense, intentionally providing data to pull the Overton window of datasets away from literal harassment collectives.

    If only so that someone, somewhere, against all good advice, will ask something running off these datasets about the people the bigots hate.

    drahardja,
    @drahardja@sfba.social avatar

    @Brendanjones I don’t buy this argument. The main beneficiaries of “AI” today are corporations that have huge investments in compute, who will surely hoard all profit that may arise from products trained on our data. Why should we make their products more enriching or pleasant to use? Don’t fall for the “my tech is inevitable” trope that technologists always use; it is not inevitable that AI will permeate into our lives in any more meaningful measure than spam or “smart” TVs do today: that is, as technological means to scam and surveil us to even greater degrees.

    I, for one, am not interested in contributing to this technology in its present form.

    Brendanjones,
    @Brendanjones@fosstodon.org avatar

    @drahardja don’t worry, I’m not actually suggesting to offer up our data. You don’t need to convince me. It was merely a think out loud hypothetical of “how different would ML algos be if they were trained on the (more polite and left-leaning) fediverse content?”

    drahardja,
    @drahardja@sfba.social avatar

    @Brendanjones I’ve also thought “what would it look like if generative AI were developed as a public good instead of for corporate profit?”

    And then I realize that the end result would still be spam and surveillance, and I shake myself awake.

    StompyRobot,
    @StompyRobot@mastodon.gamedev.place avatar

    @drahardja @seb
    I don't think such a clause will technically work.

    Not only are all the readers machines (so what does "learning" or "training" mean, exactly?), but using machine learning to classify toots as spam or not is very likely to become a necessity in the future. Saying "no machine learning" would cut that off at the knees.

    The best way to prevent others from using my writing is to keep it private: email, or closed forums. Public writing is, by necessity, public.

    supernovae,

    deleted_by_author

    raphaelmorgan,
    @raphaelmorgan@disabled.social avatar

    @supernovae @StompyRobot @drahardja @seb they could probably carefully word exceptions, like that it's not allowed unless for moderation purposes and only from posts that have been reported (just an example, not the words I think should actually be put there lol I have no idea)

    allo,
    @allo@chaos.social avatar

    deleted_by_author

    allo,
    @allo@chaos.social avatar

    deleted_by_author

    dozymoe,
    @dozymoe@mastodon.social avatar

    @drahardja @seb What if it were home-baked for moderation purposes?

    drahardja,
    @drahardja@sfba.social avatar

    @dozymoe @seb I’m not convinced that ML is that useful for moderation or spam patrol (see the mess that is Facebook automoderation), but that is a good point. What’s important, I think, is that the admin and users explicitly consent to any training use. A per-user setting (just like opt-in searchability) might be the right balance to strike.
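A per-user setting like the one suggested could be checked wherever posts are exported. A minimal sketch with hypothetical names (`allow_ml_training` is modeled on the opt-in searchability flag; it is not a real Mastodon field):

```python
from dataclasses import dataclass

@dataclass
class Account:
    # Hypothetical opt-in flag, defaulting to "no", like opt-in search.
    allow_ml_training: bool = False

@dataclass
class Post:
    author: Account
    public: bool
    text: str

def usable_for_training(post: Post) -> bool:
    """A post is fair game for training only if it is public AND its
    author has explicitly opted in."""
    return post.public and post.author.allow_ml_training

opted_in = Account(allow_ml_training=True)
default = Account()
assert usable_for_training(Post(opted_in, True, "hello"))
assert not usable_for_training(Post(default, True, "hello"))
```

The default of `False` is the important design choice: consent is opt-in, so silence never counts as permission.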

    adamshostack,

    @drahardja @brian @seb Cc @jerry

    (No idea how that would work. Are our words here under some specific copyright/license as part of the TOS?)

    dascritch,

    @drahardja @seb There is a standard for that: the TDM Reservation Protocol, which as of today is in the final days of its community review @w3c
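For readers unfamiliar with it: the TDM Reservation Protocol (TDMRep) lets a site declare that text-and-data-mining rights are reserved, for example via a `/.well-known/tdmrep.json` file. A minimal sketch of such a declaration, built in Python; the key names follow the W3C draft, while the policy URL is a placeholder:

```python
import json

# Sketch of a TDMRep /.well-known/tdmrep.json declaration: reserve
# text-and-data-mining rights for the whole site and point at a policy.
tdmrep = [
    {
        "location": "/",       # applies to every path on this server
        "tdm-reservation": 1,  # 1 = TDM rights are reserved
        "tdm-policy": "https://example.social/tdm-policy.json",  # placeholder
    }
]

print(json.dumps(tdmrep, indent=2))
```

Whether scrapers honor the declaration is another matter, but it gives a machine-readable counterpart to a ToS clause.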

    cooopsspace,

    @drahardja @seb On this, what I'm most worried about is post expiry time. If my post is set to delete after a month, that should be mandated in the protocol. It's not meant to be copied and kept indefinitely.

    dalias,
    @dalias@hachyderm.io avatar

    @drahardja @seb Unless the instance's ToS require you to license them to do stuff with your posts including sublicense to scrapers, the default is that any such use is infringement. Making it explicitly against ToS could be nice, but should not be necessary.

    yoasif,
    @yoasif@mastodon.social avatar

    @drahardja @seb @jerry can you do this for Fedia Kbin?

    Frances_Larina,

    @drahardja @seb

    How about an even more general clause about not collecting users' data for profit or politics, period?

    irisvirus,

    @BenjaminHimes for your consideration

    ottocrat,

    @drahardja I would guess that @PaulNemitz is all over it
