mcc,
@mcc@mastodon.social avatar

Hard to imagine a signal that a website is a rugpull more intense than banning users for trying to delete their own posts

https://www.tomshardware.com/tech-industry/artificial-intelligence/stack-overflow-bans-users-en-masse-for-rebelling-against-openai-partnership-users-banned-for-deleting-answers-to-prevent-them-being-used-to-train-chatgpt

Like just incredible "burning the future to power the present" energy here

jonathankoren,
@jonathankoren@sfba.social avatar

@mcc I find it weird that people are waking up to the fact that they were doing free labor for a for profit company. Did they think SO was a charity or something?

The real lesson is to never license. It’s just a shakedown by middle men trying to get free money. Scrape everything.
Death to copyright.

chris,
@chris@strafpla.net avatar

@mcc So developers will stop sharing information on #StackOverflow and future #Copilot and friends will be forever stuck in the past, answering questions about historically relevant frameworks and languages.
#LLM #StuckOverflow

mcc,
@mcc@mastodon.social avatar

@chris Yeah. But for this to be true, we need a Stack Overflow replacement. And when Reddit went evil, the move to Lemmy doesn't seem to have succeeded as well as the move from Twitter to Mastodon.

a2_4am,
@a2_4am@mastodon.social avatar

@mcc an article went around recently about "rewilding" the Internet that made the analogy to clear cutting an old growth forest. You get incredible wood, but you can only do it once.

chris,
@chris@strafpla.net avatar

@mcc IIRC Mastodon is older than Lemmy and the current move to Mastodon/Fedi happened in multiple waves, so it may be too early for higher expectations.
For stackoverflow I expect some degradation of quality since they accept “AI” generated content. This may additionally frustrate high quality authors and motivate them to leave. We’ll see.
What would a federated stack overflow look like if we were to invent it?

mcc,
@mcc@mastodon.social avatar

@chris I don't know. It's an interesting question because Stack Overflow is inherently more search-focused than Lemmy or Mastodon.

A good model for a distributed/ownerless SO might wind up looking more like bluesky than mastodon.

mcc,
@mcc@mastodon.social avatar

@chris And, of course, there's the weird element that the SO license already does not permit AI on a facial reading, and a distributed SO would probably be easier to scrape than the centralized one. So you're not actually preventing AI exploitation, you're only punishing one corporation (SO) for the AI bait-and-switch.

BillySmith,
@BillySmith@social.coop avatar

@mcc @chris

Or the move from SlashDot to SoylentNews.

Simplest way: If you see a service that has hints of this, warn your friends, and, get a large bucket of popcorn.

mcc,
@mcc@mastodon.social avatar

@chris … which is enough for ME to do a bunch of work and change my usage patterns, but may not be for other people.

mcc,
@mcc@mastodon.social avatar

@BillySmith @chris Don't look at me. I was part of the exodus from SlashDot to Kuro5hin. Which I thought went pretty well, actually

heretohinder,
@heretohinder@mastodon.sdf.org avatar

@mcc @marmarta looks like you were entirely right (that was quick), apparently not only is ChatGPT content allowed on Stack Overflow, but users are getting blocks for trying to remove their content.

chris,
@chris@strafpla.net avatar

@mcc @BillySmith Good times. Though the husk of slashdot is still around but Kuro5hin is not :-/

mcc,
@mcc@mastodon.social avatar

@chris @BillySmith Yes, which is real unfortunate because some of my best writing is now offline !!! :(

chris,
@chris@strafpla.net avatar

@mcc I personally see less problem in scraping a federated pool of knowledge but I absolutely hate that stackoverflow now owns this knowledge and can keep people from using it but sell “AI” as a service to them.

mcc,
@mcc@mastodon.social avatar

@chris I suppose one thing to consider is if a federated pool of knowledge is CC-BY-SA, then we only need a court ruling that OpenAI violates CC-BY-SA and the federated pool becomes AI-safe. Whereas SO can change (or already has changed) the TOS so they own the rights to relicense all content.

…but of course, CC-BY-SA is also incredibly inconvenient for a SO clone because everyone will generally want to copypaste sample code!

inkican,
@inkican@mastodon.social avatar

@mcc "We have met the enemy, and he is us."

tuban_muzuru,
@tuban_muzuru@ohai.social avatar

@mcc

I have dealt with novice coders for 40 years. I've told them to steer clear of SO - ask me if you've got a question about anything.

#StuckOverflow isn't entirely bad, but there's enough rat shit stirred into that pudding I would never trust anything trained on its data.

mcc,
@mcc@mastodon.social avatar

@tuban_muzuru I like Stack Overflow, but the problem with it is that it's so old that many of the questions are from like 2008-2015 and that means that often it gives you an answer that was correct ten years ago but is wrong now. (Sometimes they exacerbate this by closing a new question because it's a duplicate of a 2010 one full of outdated answers!)

So… the new "don't forget, you're here forever" policies will probably exacerbate this problem, if fewer high-quality answers come in after 2024.

trochee,
@trochee@dair-community.social avatar

@mcc @tuban_muzuru

Fukuyama's got nothing on this End of History

kboyd,
@kboyd@phpc.social avatar

@mcc @tuban_muzuru I was never much of a participant, something about the participatory experience soured me right off the bat.

Glad I stayed away.

chris,
@chris@strafpla.net avatar

@mcc So we’d be looking for Schrödingers license, allowing and forbidding closed derivative works at the same time :-)

(I have a feeling that a lot of licenses only work because nobody has a close look at how their objects are used.)

RaePatterson,
@RaePatterson@mastodon.social avatar

@mcc

It's funny how a company's attitude towards LLMs depends on whether it thinks it can make money with them or lose money with them.

Very puzzling . :thaenkin:

gpshewan,
@gpshewan@mastodon.social avatar

@mcc Web 3.0, 4.0 or whatever they think this is…is going to push us back to a Web 1.5

And I’m not mad tbh 🤷‍♂️

mcc,
@mcc@mastodon.social avatar

@chris If I were actually trying to create a stackoverflow clone, I'd have the default license be something like "all code blocks are CC0 but all human text outside the code blocks is CC-BY-SA". That would I think match the unspoken expectations both contributors and readers have.
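
[Editor's illustration] The split-license idea above could be sketched roughly as follows. This is only a minimal sketch, assuming posts are stored as Markdown with triple-backtick fences; the function name `split_by_license` and the license labels are hypothetical and not part of any real Stack Overflow or fediverse API.

```python
import re
from typing import List, Tuple

# Hypothetical labels for the scheme proposed in the post above (assumption).
CODE_LICENSE = "CC0-1.0"
PROSE_LICENSE = "CC-BY-SA-4.0"

# Build the fence marker programmatically so this example can itself sit
# inside a fenced block without confusing Markdown renderers.
FENCE_MARK = "`" * 3

# An opening fence line (with optional info string), the body, a closing fence line.
FENCE = re.compile(
    r"^" + re.escape(FENCE_MARK) + r"[^\n]*\n(.*?)^" + re.escape(FENCE_MARK) + r"[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def split_by_license(post: str) -> List[Tuple[str, str]]:
    """Return (license, text) segments: fenced code is CC0, all other text is CC-BY-SA."""
    segments: List[Tuple[str, str]] = []
    pos = 0
    for match in FENCE.finditer(post):
        prose = post[pos:match.start()].strip()
        if prose:
            segments.append((PROSE_LICENSE, prose))
        # group(1) is the code body without the fence lines themselves.
        segments.append((CODE_LICENSE, match.group(1).rstrip("\n")))
        pos = match.end()
    tail = post[pos:].strip()
    if tail:
        segments.append((PROSE_LICENSE, tail))
    return segments
```

Inline code spans and indented code blocks are deliberately ignored here; a real implementation would have to decide which license those fall under.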

mcc,
@mcc@mastodon.social avatar

@chris I am worried about the effect "AI" scraping is gonna have on copyleft in general, tho. I think people have for many years released copyleft on the rule of "hey, why not" and now the answer is "bc AI". (More thoughts: https://mastodon.social/@mcc/112209121196262534 ) Like, my proposed license in the last post would be very AI-friendly.

woltiv,
@woltiv@mastodon.social avatar

@mcc I think the banned people are in the right, but how did OpenAI not already scrape stack overflow? I've seen samples of ChatGPT's slop that told me the exact same wrong answers that were available on stack overflow (and its sister sites).

I wanted to test it, and one of my protest edits was reverted almost immediately by Progman

gsuberland,
@gsuberland@chaos.social avatar

@mcc there's some nuance to the story that isn't necessarily captured in this article. I talked about it more here:

https://chaos.social/@gsuberland/112401284014892261

TL;DR it's important to distinguish between StackExchange Inc as a company and community moderators who are being put in a super awkward spot here, and the "I was banned for trying to delete my posts!" thing isn't quite as plain and simple as has been presented in a lot of places. users & mods are being left with little to no recourse due to SEI's actions.

javiervg,
@javiervg@mastodon.social avatar

@mcc it is incredible that these network effect companies (Stack Overflow, Reddit, Twitter, etc) have not figured out that their value is due to the people that input their ideas into them. Preventing people from deleting their data on these platforms is unethical.

Have been hoping that the government gets smart enough soon and gives people the right to their data no matter where it is stored.

gsuberland,
@gsuberland@chaos.social avatar

@mcc part of me wonders whether SEI already knew this would put users and community moderators at odds with each other, in the hopes that it would prevent organised collaborative protest action that had been effective against SEI in the past (e.g. on the "no LLM answers" issue).

caitp,
@caitp@mstdn.social avatar

@chris @mcc don't worry, they'll probably just stick bots in every matrix/gitter/slack/discord/zulip they can find and train models on that instead

penryu,
@penryu@hachyderm.io avatar

@mcc I can't wait till big tech gets together and convinces the power companies to restructure billing so they can pass the extra cost onto the people.

ClutchAbuse,
@ClutchAbuse@mastodon.gamedev.place avatar

@mcc it's just a cash grab plain and simple. The talented contributors will stop and the site will get overloaded with out of date AI crap. But the owners won't care, they got the big payday they really wanted.

nightclaw,

@mcc every couple years I bother Rusty to post the K5 archive. Nothing yet!

mcc,
@mcc@mastodon.social avatar

@ClutchAbuse I suspect it won't even be that big a payday

mcc,
@mcc@mastodon.social avatar

@nightclaw it would be great even if just individual users could get a data takeout of our posted content tbh

mcc,
@mcc@mastodon.social avatar

Earlier today I edited my (small) set of Stack Overflow posts to add the sentence "I do not consent to my words being used to train OpenAI" to the end. Within hours, all these edits were reversed and I got a warning email for "removing or defacing content". I did not remove any content. If this small sentence is "defacing", it is a very minor defacement. In no way was the experience of other users made worse by me adding one sentence.

To Stack Overflow, you are not a person. You are "content".

mcc,
@mcc@mastodon.social avatar

Not only does Stack Overflow say you don't have a right to remove your words from Stack Overflow, according to Stack Overflow, you don't even have the right to decide what words Stack Overflow publishes under your name.

demofox,
@demofox@mastodon.gamedev.place avatar

@mcc that's so gross.

mcc, (edited)
@mcc@mastodon.social avatar

In the meantime, I have been suspended for 17 hours to "cool down". OpenAI is so, so offended by me saying I don't want them to train on my content. Clearly I am very angry and need to sit in time out.

Noticed this last detail only when I tried to edit my profile and discovered you can't edit your profile while "suspended".

ChateauErin,
@ChateauErin@mastodon.social avatar

@mcc oof. their boilerplate message to you isn't even relevant. This is going to be one of those things like "Reinstate Monica" isn't it

glennf,
@glennf@twit.social avatar

@mcc This is a very interesting strategy they are pursuing to ruin their business.

mcc,
@mcc@mastodon.social avatar

@glennf I don't think they view themselves as a "business" anymore. I think they now view themselves as an asset that can be sold to AI generation companies.

WomanCorn,
@WomanCorn@schelling.pt avatar

@mcc

huh. I thought the LLMs were already trained on StackOverflow.

It's available under some kind of public license, I think. There are a bunch of clone pages out there, anyway.

glennf,
@glennf@twit.social avatar

@mcc AI sort of ate their business, so I guess they are returning the favor by feeding themselves to it. (I don't think LLMs provide an accurate alternative to Stack Overflow, but I think Code Pilot and other stuff shunted a lot of traffic.)

mcc,
@mcc@mastodon.social avatar

@WomanCorn If the point of Stack Overflow is to be a block of programming-related text to sell to LLM companies, then it would actually be rational to ban LLM text, as it would poison the LLM inputs.

ocdtrekkie,
@ocdtrekkie@mastodon.social avatar

@mcc I can't wait until StackOverflow learns about GDPR.

ackack,
@ackack@mastodon.gamedev.place avatar

@mcc I did the same thing (edit my posts) and had the same response so I deleted whatever I could and deleted my account.

mcc,
@mcc@mastodon.social avatar

@ackack What did they allow you to delete, and where is the delete account button? Is it under edit profile? I've heard they're reinstating deleted content.

dr_a,
@dr_a@mastodon.social avatar

@mcc @tuban_muzuru I just got upvoted on an answer from 2010 which is kind of shocking, and downvoted on an answer from 2013 because someone apparently didn't like the placement of a period. That place has turned into a dumpster fire.

ackack,
@ackack@mastodon.gamedev.place avatar

@mcc I deleted one that didn't have any answers - just comments under my question. I didn't get a notice about that one being reverted. At least not yet. The other ones which I edited I received notice that they were reinstated.

The account deletion was under the edit profile... somewhere. I'm afraid I can't think of the right section, but it was there a couple of days ago. Took 24 hours for them to confirm the account was deleted.

mcc,
@mcc@mastodon.social avatar

@ackack I think there's something in the rules about questions with an accepted answer

prestontumber,

@mcc Every day my belief in the value of the GPL and libre software rises, it seems. I've never thought of myself as a zealot to take the stances the FSF does but damn the alternative world just looks...well.

ISibboI,
@ISibboI@mastodon.online avatar

@mcc Can I use against them in Europe?

mcc,
@mcc@mastodon.social avatar

@ISibboI I don't know.

mcc,
@mcc@mastodon.social avatar

@prestontumber The content on Stack Overflow is CC-BY-SA, which is basically the GPL but for the written word. Somehow this is happening anyway.

mcc,
@mcc@mastodon.social avatar

@ackack Also thanks

oblomov,
@oblomov@sociale.network avatar

@mcc @prestontumber it happens because nobody has yet brought them to court to have case law in whether LLMs violate CC-BY-SA or not

Ertain,
@Ertain@mast.linuxgamecast.com avatar

@mcc That's some bullshit.

GhostOnTheHalfShell, (edited)
@GhostOnTheHalfShell@masto.ai avatar

@mcc

Gee it would be too bad if users started posting chat gpt garbage on the site and upvoting it,

mcc,
@mcc@mastodon.social avatar

@Ertain IMO

stephen,
@stephen@lyingvoid.social avatar

@mcc Perhaps time for some Malicious Compliance. You could edit your answers so they're not wrong, but just really, really bad code. N+1 queries, potentially infinite recursion, O(N) of infinity kind of fun.
Let the AI eat that all day.

mcc,
@mcc@mastodon.social avatar

@stephen The fact they noticed my edits so fast implies they're currently watching for anyone editing multiple old posts, specifically with a goal of catching protests.

Unless they're doing a query for edits containing the string "OpenAI".

tdietterich,

@mcc A sufficiently large group of stack overflow users could start posting and upvoting answers that insert errors into the site (and then into the LLMs trained on it). If Stackoverflow wants to destroy the site, they could certainly achieve that goal.

martin_piper,
@martin_piper@mastodon.social avatar

@mcc according to the creative commons agreement you don't have the right to remove content which would cause others to lose access to what you wrote.

mcc,
@mcc@mastodon.social avatar

@martin_piper If Stack Overflow were obeying the creative commons agreement I would not be trying to remove my content.

martin_piper,
@martin_piper@mastodon.social avatar

@mcc they are following creative commons. Nothing in there prohibits the use of AI.

martin_piper,
@martin_piper@mastodon.social avatar

"OpenAI said ChatGPT would attribute its answers when they’re sourced from the platform."

FrauZeitlos,

@mcc If something is free on the Internet, you are not the user, you are the product.

Craftycat,
@Craftycat@mastodon.social avatar

@mcc the main problem is that users seem to be unaware of the fact that the posts aren't actually their own. Once it's posted, it belongs to them. I suspect something similar applies to wikipedia. Contributing means you donate your work, that's why they can and will ban you for trying to destroy it.

un_ouragan,
@un_ouragan@mastodon.social avatar

@mcc @WomanCorn can they detect it reliably enough, though? If one can't delete answers, one can always poison the well with LLM generated answers.

sashin,
@sashin@veganism.social avatar

@mcc this email makes me so pissed off, it's a for profit fucking enterprise, that content, which was posted for free, leads to the profits of the shareholders, it's unpaid labour!!!!!!

datarama,
@datarama@hachyderm.io avatar

@mcc @chris I've said it before and I'm sorry if I sound like a broken record:

Then they'll just scrape from the Stack Overflow replacement. Any creative work any human ever puts on the internet again is just training data now. There is no way we can share code with each other anymore without also giving it as a free gift to Sam fucking Altman and his ilk.

clacke,

@sashin @mcc User-Generated Content, baby.

It's been a goldmine for a bit longer than the term "Web 2.0" has been around, but until recently we have been taking it as a social contract that we give it to the corporation and they give it to the world for some ad revenue.

That social contract is rapidly coming apart as investors see more profit potential in newly enabled modes of exploitation.

mcc,
@mcc@mastodon.social avatar

@datarama @chris If it is really the case I cannot prevent Altman from creating derivative works of anything I make, then I at least want to create the maximum possible financial consequences for any company which intentionally helps him. Stack Overflow may not have been able to prevent Altman from scraping their site. But they didn't have to accept his money.

kerfuffle,
@kerfuffle@mastodon.online avatar

@mcc
I remember reading @codinghorror 's blog https://blog.codinghorror.com/what-does-stack-overflow-want-to-be-when-it-grows-up/ about SO's future challenges and I really liked how he described its premise (he co-founded SO but left over a decade ago). That future is now here, and it's a sudden and far cry from the community of peers that we all respected: Programmers don't want to be associated with it anymore, and are finding they have no say over their own content and attribution.

kyonshi,
@kyonshi@dice.camp avatar

@caitp @chris @mcc "so, why exactly do I have to wear a fursuit to fix my issues with systemd?"

wbezs,
@wbezs@mastodon.social avatar

@mcc ‪I understand this phenomenon as the deprivation and abuse of our community aligned intelligence for economic purposes. ‬

noondlyt,
@noondlyt@mastodon.social avatar

@mcc tsk tsk

tanepiper,
@tanepiper@tane.codes avatar

@martin_piper Spirit of the Law Vs The Letter of the Law.

This has absolutely screwed up SO's chances of them selling me their enterprise solution

rainynight65,
@rainynight65@mastodon.social avatar

@mcc it's horribly grating that the websites that are most eager to make their content available for AI training are the ones that are almost entirely reliant on user-generated content.

simon_lucy,
@simon_lucy@mastodon.social avatar

@martin_piper

They don't acknowledge actual sources in generated content, largely because they have no clue. A general acknowledgement of an entire publisher is worthless.

martin_piper,
@martin_piper@mastodon.social avatar

@simon_lucy they do provide actual sources.

Kierkegaanks,
@Kierkegaanks@beige.party avatar

@mcc I don’t know what stack overflow is, but isn’t your content legally yours?

rrwo,
@rrwo@floss.social avatar

@ocdtrekkie @mcc

They claim posts are not "personal information" and therefore not covered under GDPR.

pemensik,
@pemensik@fosstodon.org avatar

@mcc I have just read the legal terms. Unfortunately, I think the Subscriber Content part applies and does not grant you the right to refuse A.I. learning. Meaning you cannot publish your content under a different license than CC BY-SA 4.0. On the other hand, that means they are not allowed to keep content you originated under an anonymous user. Attribution must be kept together with no additional restrictions. You can keep your own license by posting links to content hosted elsewhere. 🤷

richlv,
@richlv@mastodon.social avatar

@mcc …what if we would instead add “OpenAI was not used in preparing this answer as it is considered unreliable” and similar, varying lines?
It’s just stating the fact…

TheIneQuation,
@TheIneQuation@mastodon.gamedev.place avatar

@mcc while I completely understand and agree with your feelings, I don't think this is necessarily malicious counteraction against your disagreement to feeding OpenAI's LLM. If you edited all your answers, that could've simply triggered an automated defense mechanism without any human action behind it. To such an auto moderation system that does not understand the content, this is consistent with what a troll, or a malicious SEO, would do after taking over someone's account.

osma,
@osma@sigmoid.social avatar

@mcc
@WomanCorn That's exactly what they've done. https://stackoverflow.com/help/gen-ai-policy

As noted above, all content published on SO is available under the CC BY-SA license, which is usually taken to mean that training LLMs is permitted. https://stackoverflow.com/help/licensing

rysiek,
@rysiek@mstdn.social avatar

@mcc so they get to change what your answers are used for, but you don't get to change your answers.

Right.

emmatonkin,
@emmatonkin@mstdn.social avatar

@rrwo @ocdtrekkie @mcc
While I'm not surprised if they do argue that, posts do often contain personal information. So if they want to argue that, it's on them to demonstrate that they have anonymised the data such that the individual is no longer identifiable. Sounds like a discussion that EU folks affected could usefully hand on to the relevant data protection regulator, along with some examples of posts that clearly don't meet that requirement.

etchedpixels,
@etchedpixels@mastodon.social avatar

@emmatonkin @rrwo @ocdtrekkie @mcc It will no doubt take years to beat the EU into acting but
https://noyb.eu/en/chatgpt-provides-false-information-about-people-and-openai-cant-correct-it

is starting the process already

mcc,
@mcc@mastodon.social avatar

@pemensik AI training is creating derivative works in violation of CC-BY-SA. Neither the model, nor substantive portions of my inputs when reproduced by the AI model, either credit me as author or are released under a sharealike license.

mcpinson,
@mcpinson@mas.to avatar

@osma @mcc @WomanCorn
Under a CC BY-SA license, an LLM that uses your SO posts in its output whether quoted directly, remixed, or adapted has to give you attribution.

Does any LLM provide a list of references with each answer it gives?

mcc,
@mcc@mastodon.social avatar

@Kierkegaanks part of the terms of service is that my content is released under a "copyleft" license. This means anyone — including you, who have never met me before this moment — could take the content and republish it, as long as they follow the license.

The problem is that Stack Overflow/OpenAI are not following the license, per anyone's pre-OpenAI understanding of the license.

ascherbaum,
@ascherbaum@mastodon.social avatar

@martin_piper @simon_lucy Every time I ask for a source, I get an answer like this. I've never seen links to actual sources, just excuses.

mcc,
@mcc@mastodon.social avatar

@TheIneQuation so I guess this could be tested by making the edits one at a time over a span of time and with explanatory messages in the box discouraging looking too closely, and seeing if it's treated differently? I don't think it would be treated differently, though. I've heard many descriptions of SO using human moderators, and this kind of slow roll editing wouldn't fool a Wikipedia editor.

mcc,
@mcc@mastodon.social avatar

@mcpinson @osma @WomanCorn If the LLM were designed this way, no one would use it. LLMs don't produce attractive prose and they don't produce accurate answers. From this, I conclude copyright laundering is the product's primary and maybe sole value proposition.

tieflingdiotima,
@tieflingdiotima@mastodon.social avatar

@mcc not super familiar with Stack Overflow; weren't they founded in response to Experts Exchange being jerks or something?

A shame regardless; the pursuit of a few extra cents has made them a trash company.

cohomologyisFUN,
@cohomologyisFUN@mastodon.sdf.org avatar

@mcpinson @osma @mcc @WomanCorn no, it doesn’t

OpenAI’s argument is that they don’t need your permission to train their LLMs on your content, CC or not, because doing so (they argue) is fair use. We’ll see if the courts agree (a bunch of big companies are suing them).

hypolite,

@mcc Thank you for the edit suggestion, I considered removing my answers but you aren’t allowed to do it when they have been selected by the original author.
