athos77,

For years, the site had a standing policy that prevented the use of generative AI in writing or rewording any questions or answers posted. Moderators were allowed and encouraged to use AI-detection software when reviewing posts. Beginning last week, however, the company began a rapid about-face in its public policy towards AI.

I listened to an episode of The Daily on AI, and the stuff they fed into the engines included the entire Internet. They literally ran out of things to feed it. That's why YouTube created their auto-generated subtitles - literally, so that they would have more material to feed into their LLMs. I fully expect reddit to be bought out/merged within the next six months or so. They are desperate for more material to feed the machine. Everything is going to end up going to an LLM somewhere.

DoctorButts,

Like Homer Simpson eating all the food at the buffet

TubeTalkerX,

Or when he went to Hell

massive_bereavement,

Because the issue with statistically-based LLMs is that they lack precision, so they tend to "hallucinate" or rather let's say, provide inaccurate or completely baloney responses.

Despite it being part of the nature of how these things are designed, the companies involved keep trying to tell everyone that it's a problem of size: "when we've got enough data, the results will be precise enough". However, that's the good ol' "fake it till you make it" tactic.

elgordio,

I think auto-generated subtitles were to fulfil an FCC requirement, some years ago, for content subtitling. They have, however, turned out super useful for LLM feeding.

Tehdastehdas,
stoly,

There really isn’t much in the way of detection. It’s a big problem in schools and universities and the plagiarism detectors can’t sense AI.

kubica,

I'm going to run out of sites at this pace.

herrcaptain,

Right? It seems like the modern internet is made up of like 5 monolithic sites, and unlimited SEO spam.

I know that’s not literally true, but it sure feels like it.

FaceDeer,

Fortunately the AIs are getting quite good at answering technical questions like these.

Daerun,

Good to know that stackoverflow will not be a trustworthy place to find solutions anymore.

FJW,

Frankly, the solution here isn’t vandalism, it’s setting up a competing site and copying the content over. The license of stackoverflow makes that explicitly legal. Anything else is just playing around and hoping that a company acts against its own interests, which has rarely ever worked before.

HelloHotel,

“The license of stackoverflow makes that explicitly legal”

How and why is it illegal? (I will take down my post about vandalism until I discuss this.)

FJW,

I’m not saying vandalism is illegal. I’m saying that it borders on immoral and that there is a better, more radical (and thus effective) alternative that one might expect to be illegal but in fact isn’t.

HelloHotel,

My post was mostly just about inserting invisible marks like   into your answers to screw over any machine that is sensitive to Unicode.
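
To make the idea concrete, here is a minimal TypeScript sketch of what inserting such marks could look like, and how little it takes to undo them. The helper names (`addInvisibleMarks`, `stripInvisibleMarks`, `countInvisibleMarks`) and the list of zero-width code points are assumptions chosen for illustration; nothing here reflects how any particular scraper or training pipeline actually handles Unicode.

```typescript
// Assumed, non-exhaustive set of zero-width code points:
// U+200B, U+200C, U+200D, U+2060, U+FEFF.
const INVISIBLE = /[\u200B\u200C\u200D\u2060\uFEFF]/g;

// Hypothetical helper: put a zero-width space (U+200B) before every space.
function addInvisibleMarks(text: string): string {
  return text.split(" ").join("\u200B ");
}

// Hypothetical helper: remove every code point listed in INVISIBLE.
function stripInvisibleMarks(text: string): string {
  return text.replace(INVISIBLE, "");
}

// Hypothetical helper: count how many listed code points a string contains.
function countInvisibleMarks(text: string): number {
  return (text.match(INVISIBLE) ?? []).length;
}

const original = "use a for loop";
const marked = addInvisibleMarks(original);

// The two strings render the same but no longer compare equal.
console.log(marked === original);
console.log(countInvisibleMarks(marked));
console.log(stripInvisibleMarks(marked) === original);
```

If a pipeline applies a cleanup step along the lines of `stripInvisibleMarks`, the marks would have no effect. Whether real pipelines do so is not something this thread establishes.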

pseudo,

Angry users claim they are enabled to delete their own content from the site through the “right to forget,” a common name for a legal right most effectively codified into law through the EU’s General Data Protection Regulation (GDPR). Among other things, the act protects the ability of the consumer to delete their own data from a website, and to have data about them removed upon request. However, Stack Overflow’s Terms of Service contains a clause carving out Stack Overflow’s irrevocable ownership of all content subscribers provide to the site

It really irritates me when a ToS simply states that it will go against the law.

hikaru755,

It’s not quite that simple, though. GDPR is only concerned with personally identifiable information. Answers and comments on SO rarely contain that kind of information as long as you delete the username on them, so it’s not technically against GDPR if you keep the contents.

windpunch,

You could argue that people can be identified by their writing style. I have no idea how far you’d get with that though.

FJW,

Frankly I don’t see any way whatsoever that this would fly, and that’s a good thing!

Imagine what it would mean for software development if one angry dev could request the deletion of all their contributions at a moment’s notice by pointing to a right to be forgotten. Documentation is really not meaningfully different from that.

merthyr1831,

If I was Stack Overflow I would’ve transferred my backups to OpenAI weeks before the announcement for this very reason.

This is also assuming the LLMs weren’t already fed with scraped SO data years ago.

It’s a small act of rebellion but SO already has your data and they’ll do whatever they want with it, including mine.

trailee,

There’s also the possibility of adding to the wonderful irony of making the AI more useful than the original by having content that’s no longer accessible through the original. It doesn’t get more enshittified than that, even if Prashanth Chandrasekar is too out of touch to ever regret his decision.

Muffi,

I think you’re 100% correct in assuming they’ve already fed it data scraped from SO. I’ve previously gotten code samples from ChatGPT that were clearly from SO, down to the comments in the code. I even reverse searched some of the code and found the question it was from.

mint_tamas,

OpenAI clearly already scraped the pre-LLM (aka actually useful) content from SO, this entire deal is happening after the fact to avoid litigation.

trailee,

It’s true that it’s mostly a symbolic act, but the rebellion matters, especially from old accounts. It’s also a nice way to mark the time after which I never participated in SO again. After my ban expires, I’ll deface my questions again. And again. Until they permaban me.

trailee,

They seem to only be watching the questions right now. You’re automatically prevented from deleting an accepted answer, but if you answered your own question (maybe because SO was useless for certain niche questions a decade ago so you kept digging and found your own solution), you can unaccept your answer first and then delete it.

I got a 30 day ban for “defacing” a few of my 10+ year old questions after moderators promptly reverted the edits. But they seem to have missed where I unaccepted and deleted my answers, even as they hang out in an undeletable state (showing up red for me and hidden for others).

And comments, which are a key part to properly understanding a lot of almost-correct answers, don’t seem to be afforded revision history or to have deletes noticed by moderators.

So it seems like you can still delete a bunch of your content, just not the questions. Do with that what you will.

gravitas_deficiency,

lol wow this is going even more poorly than I thought it would, and I thought my kneejerk reaction to the initial announcement was quite pessimistic.

Churbleyimyam,

At the end of the day, this is just yet another example of how capitalism is an extractive system. Unprotected resources are used not for the benefit of all but to increase and entrench the imbalance of assets. This is why they are so keen on DRM and copyright and why they destroy the environment and social cohesion. The thing is, people want to help each other; not for profit but because we have a natural and healthy imperative to do the most good.

There is a difference between giving someone a present and then them giving it to another person, and giving someone a present and then them selling it. One is kind and helpful and the other is disgusting and produces inequality.

If you’re gonna use something for free then make the product of it free too.

An idea for the fediverse and beyond: maybe we should be setting up instances with copyleft licences for all content posted to them. I actually don’t mind if you wanna use my comments to make an LLM. It could be useful. But give me (and all the other people who contributed to it) the LLM for free, like we gave it to you. And let us use it for our benefit, not just yours.

jnk,

Agreed on that last part, making that the default would be a great solution. I could also use a signature in comments, like that guy who always puts the “Commercial AI thingy” but automatically.

CancerMancer,

“An idea for the fediverse and beyond: maybe we should be setting up instances with copyleft licences for all content posted to them. I actually don’t mind if you wanna use my comments to make an LLM. It could be useful. But give me (and all the other people who contributed to it) the LLM for free, like we gave it to you. And let us use it for our benefit, not just yours.”

This seems like a very fair and reasonable way to deal with the issue.

madis,

Well, supposedly people can use it without paying and without account, though I cannot confirm the last part in the official site.

HelloHotel,

Open access != Copyleft, but it’s a decent start.

Churbleyimyam,

Can you explain?

HelloHotel,

Copyleft licenses are anti-copyright copyright licenses. They guarantee any random person the right to use and (usually) modify and (usually) distribute the work (art, program, etc.) with some noteworthy terms and conditions. Open access is where they provide a good or service for free but are not legally required to do so.

I bitch about it not being open sourced like llama2.

Churbleyimyam,

I think you still have to have an account (last time I used it anyway), but you’re right, there is a tier you don’t have to pay any money for. It’s just an email address but whatever. You can use it via their website but afaik they haven’t released a free model based on the data they’ve scraped off us, so you can’t host it on your own hardware and properly do what you want with it. I have heard though that commercial websites were/are using ChatGPT bots for customer service and you can easily use the customer service chatbots on their website to do other random stuff like writing bash scripts or making yo mama jokes.

bitwolf,

Rather than delete, modify the question so it’s wrong. Then the AI will hallucinate.

Sabata11792,

I just expect it to insult the user while not answering the question.

Zink,

As a large language model, I expect you to use the search function. Asshole.

jnk,

Have you tried to read the fucking manual, you filthy lazy fuck? Marked as solved. Is there anything else I can do to help you? 😊

Sabata11792,

Perfect. I can't tell a difference.

KeenFlame,

I don’t understand what anyone wins from this

Corporations are foundationally evil

And how do they not win more if we poison the entire Internet?

It’s like being in a toxic relationship with kids involved

Set boundaries

Follow rules

Don’t destroy the fucking fruit of your bodies just because you are angry at each other

Fuck those guys, like a lot, for taking your given data and selling

And fuck open ai for trying to make money from scientific discoveries meant for all of humanity

But what the fuck with ruining the entire Internet?

Who gets anything then?

If language models will ruin Internet why be afraid that normal human responses are available? Wut?

steventrouble,

You’re right, but I guess others haven’t yet learned this wisdom. Wisdom cannot be taught, only learned. 🤷

MataVatnik,

Maybe a better act of rebellion would be to scrape the data on stack, self host it, and move to an open source platform. Easy for me to say though, when I only ever coded Hello World

nasduia,

Why does OpenAI want 10 year old answers about using jQuery whenever anyone posts a JavaScript question, followed by aggressive policing of what is and isn’t acceptable to re-ask as technology moves on?

nialv7,

They probably aren’t looking for the factual information, perhaps more the logical thinking abilities.

btaf45,

jQuery is still an excellent JavaScript library

jj4211,

Nice try, ChatGPT

btaf45,

jQuery will still be around after the latest JavaScript framework of the month is long gone.

jj4211,

Maybe, but I wouldn’t say it’s really excellent.

It was basically helping people deal with ancient browsers (particularly IE6) and a JavaScript runtime bereft of convenience features, at a cost of some syntactic awkwardness and performance.

If you are targeting ES2020 and above, as is widely considered a reasonable requirement, you pretty much have the stuff that jQuery brings to the table, but built in without additional download and without an abstraction that costs some cycles.
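
As a rough illustration of that point, the sketch below pairs a few jQuery utility functions with built-in language features that cover similar ground. The `Answer` type and the sample data are invented for the example, and the pairings are approximate (for instance, `$.map` also flattens arrays and drops `null` results, which `Array.prototype.map` does not). The DOM side of jQuery (selection, events, Ajax) has browser built-ins too, such as `document.querySelectorAll` and `fetch`, but those are left out here because they need a browser environment to run.

```typescript
// Hypothetical data shape, used only for this example.
interface Answer {
  id: number;
  score: number;
  title: string;
  tags?: string[];
}

const answers: Answer[] = [
  { id: 1, score: 12, title: "  Use fetch  ", tags: ["javascript"] },
  { id: 2, score: -1, title: "Use jQuery" },
];

// $.grep(answers, fn) is roughly Array.prototype.filter
const upvoted = answers.filter((a) => a.score > 0);

// $.map(answers, fn) is roughly Array.prototype.map;
// $.trim(s) is roughly String.prototype.trim
const titles = answers.map((a) => a.title.trim());

// $.extend({}, defaults, overrides) is roughly object spread (shallow merge)
const defaults = { timeout: 5000, cache: true };
const settings = { ...defaults, timeout: 1000 };

// $.isArray(x) is roughly Array.isArray
const isList = Array.isArray(answers);

// Optional chaining and nullish coalescing arrived with ES2020:
// a safe lookup with a fallback, without nested "if" checks.
const firstTagOfSecond = answers[1]?.tags?.[0] ?? "untagged";

console.log(upvoted, titles, settings, isList, firstTagOfSecond);
```

None of this needs an extra download, which is the trade-off described above; whether it makes jQuery unnecessary for a given project still depends on which browsers that project has to support.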

Snapz,

You can’t quit, you’re fired!!!

sugar_in_your_tea,

Cool, now I can go collect unemployment. :)

Snapz,

You can’t, you’re hired!

stevedidwhat_infosec,

Instead of solely deleting content, what if authors had instead moved their content/answers to something self-owned? Can SO even claim ownership legally of the content on their site? Seems iffy in my own, ignorant take.

matjoeman,

They can. It’s in the TOS when you make your account. They own everything you post to the site.

stevedidwhat_infosec,

Well I suppose in that case, protesting via removal is fine IMO. I think the constructive, next-step would be to create a site where you, the user, own what you post. Does Reddit claim ownership over posts? I wonder what lemmy’s “policies” are and if this would be a good grounds (here) to start building something better than what SO was doing.

Aux,

A SO alternative cannot exist if a user who posted an answer owns it. That defeats the purpose of sharing your knowledge and answering questions as it would mean the person asking the question cannot use your answer.

stevedidwhat_infosec,

“A SO alternative cannot exist if a user who posted an answer owns it. That defeats the purpose of sharing your knowledge and answering questions as it would mean the person asking the question cannot use your answer.”

Couldn’t these owners dictate how their creations are used? If you don’t own it, you don’t even get a say.

Aux,

That’s the point of platforms like SO - you give away your knowledge, for free, for everyone, for any use case. If a user can restrict the use of their answers, then it makes no sense for SO to exist. It’s like donating food to a food bank and saying that your food should only go to white people and not black people.

stevedidwhat_infosec,

I’m not sure I agree with your example - it’s more like giving the owners of the donation the ability to choose WHO they are donating to. That means choosing not to donate to companies that might take your food donation and sell it as damaged goods, for example. I wouldn’t want my donation to be used that way. That’s how I see it anyway.

JackbyDev,

Everything you submit to StackOverflow is licensed under either MIT or CC depending on when you submitted it.

stevedidwhat_infosec,

So does that mean anyone is allowed to use said content for whatever purposes they’d like? That’d include AI stuff too I think? Interesting twist there, hadn’t thought about it like this yet. Essentially posters would be agreeing to share that data/info publicly. No different than someone learning how to code from looking at examples made by their professors or someone else doing the teaching/talking I suppose. Hmm.

repungnant_canary,

CC (not sure about MIT) virtually always requires attribution, but as GitHub Copilot showed, right now open-“media” authors have basically no way of enforcing their rights.

Dkarma,

Probably cuz they gave them away when they open licensed…you know…how it’s supposed to work

repungnant_canary,

In most jurisdictions you can’t give away copyright - that’s why CC0 exists. And again, most open-source and CC licences require attribution; if you use those licences, you have a right to be attributed.

JackbyDev,

For super permissive licenses like MIT, it’s probably fine. Maybe folks would need to list the training data and all the licenses (since a common requirement of many of even the most permissive licenses is to include a copy of the license).

As far as I know, a court hasn’t ruled on whether clauses like “share alike” or “copy left” (think CC BY-SA or GPL) would require anything special or not allow models. Anyone saying otherwise is just making a best guess. My best guess is (pessimistically) that it won’t do any good because things produced by a machine cannot be copyrighted. But I haven’t done much of a deep dive. I got really interested in the differences between many software licenses a few years back and did some reading but I’m far from an expert.

bitwolf,

So they have to carefully only source the MIT data?

JackbyDev,

It hasn’t been tested in court so any answer anyone gives is only a best guess.

lauha,

Regardless of the license (apart perhaps from public domain) it is legally still your copyright, since you produced the content. Pretty sure in the EU they cannot prevent you from deleting your content.

JackbyDev,

But those two licenses give everyone an irrevocable right to do certain things with your content forever and displaying it on a website is one of those things (assuming they follow the other requirements of the license).

pseudo,

If StackOverflow taught me anything, it is that legal jargon about copyright isn’t very effective against Ctrl+C/Ctrl+V.

FJW,

“it is legally still your copyright, since you produced the content. Pretty sure in EU they cannot prevent you from deleting your content.”

They absolutely can, you gave them an explicit (under most circumstances irrevocable) permission to do so. That’s how contracts work.

lauha,

Unlike in the US, and I cannot speak for all of the EU, but at least in Finland a contract cannot take away your legal rights.

FJW,

You can when it comes to copyright. That’s EU-law and anything else would be such a horrible idea that no country would ever set up a law saying otherwise.

If you could simply revoke copyright licenses you would completely kill any practicality of selling your copyrighted works and it would fully undermine any purpose it served in the first place.

old_machine_breaking_apart,

Maybe we need a technical questions and answers site on the fediverse!

kalpol,

Not gonna stop your knowledge being fed to an AI.

ultra,

what about instances that need you to be logged in to view posts and require authorized requests for federation?

kalpol,

All it needs is an account to access troves of training data?

ultra,

That should be manually approved

Saledovil,

How restrictive do you want to be with the accounts? If you’re too restrictive, there won’t be enough users. If you’re not restrictive enough, the data will be used for AI training.

Aux,

That defeats the purpose of a knowledge base. The whole reason why everyone is using SO is that you don’t need an account to access it and it’s fully indexed by Google.

The real question is why the fuck are people ok with Google indexing SO and not OpenAI? Doesn’t make any fucking sense.

irreticent,

“The real question is why the fuck are people ok with Google indexing SO and not OpenAI? Doesn’t make any fucking sense.”

Because Google is free and OpenAI isn’t. It’s one thing to take free content, index it, then allow anyone to access that index. It’s another thing when you take free content, index it, then hide that index behind a paywall.

Aux,

Are you sure? Because Google is not free at all, you’re paying for it through privacy invasion and ads. While ChatGPT is actually free to use for end users - no ads, nothing.

jnk,

The price difference is that google steals your data. That’s it. OpenAI steals data, asks for money to use most of their models, and buys even more data from other companies stealing user data (like google and SO). Also indexing web pages is not even the “stealing” part of google, it’s just not comparable.

Yes, training AI on user data for free then selling the end product is a reasonable thing to be concerned about. It’d be different if the product was free or the data was sold to them with user consent.

SO has announced a subscription-based service trained on user data for free, and not only is there no opt-out, they’re mass-banning users for trying to “opt-out” manually. Tell me one thing here that’s not completely fucked up.

Aux,

But it’s free. Unlike Google.

irreticent,
Aux,

No, it’s free: chatgpt.com

As your link is for custom enterprise solutions, it’s worth noting that Google has the same shit, which also costs money: cloud.google.com/pricing/

Zacryon,

It’s “freemium”, not free. There is a difference. You can’t use ChatGPT 4 or the API without paying. Also, you are limited in the number of prompts you can make per hour before you are put on pause and asked to pay.

Search engines like Ecosia, DuckDuckGo, etc. don’t ask you for money. Regardless how intensively you use it. (They might come with other drawbacks though like Google with privacy, environment, ethical principles, …)

Aux,

It’s more free than Google.

Zacryon,

I’ve never been asked to pay for using one of the aforementioned search engines. I have been asked to pay for OpenAI products.

So I don’t see how you come to that conclusion.

Aux,

Read the comments

Zacryon,

The ones where you just claim that despite it not being true, or which ones do you mean?

Aux,

Not true? Ahaha! Good job spreading misinformation!

Zacryon,

Well… as I said. OpenAI asks for money, search engines usually don’t. Ergo, OpenAI is not free. (But freemium.)

Despite claiming that’s not the case, you lack the necessary proof and don’t seem to care about countering my argument with something of substance.

Such a discussion will not be fruitful if you are unwilling to deliver.

Aux,

It’s free, what else do you want?

Zacryon,

That you deliver reasons for why you claim I’m wrong.

It’s freemium, not free. As I said before, OpenAI limits the number of prompts you can make per hour in case you don’t want to pay. Also, using the API or ChatGPT 4 costs money. Users of search engines are usually not asked for money.

irreticent,

What does Google’s cloud service have to do with what we’re discussing (Google indexing content vs. OpenAI doing it)? They’re not even similar services.

Edit: SO -> OpenAI

Aux,

The fuck are you talking about?

old_machine_breaking_apart,

Is there an actual way to stop it? I don’t think so. At least, moving to the fediverse would stop any particular corporation from having the monopoly of it, prevent reddit-like abuse of power, would give users more power, among a few other things.

bamfic,

Nothing stopping them from scraping that too
