
FaceDeer

@FaceDeer@fedia.io

Basically a deer with a human face. Despite probably being some sort of magical nature spirit, his interests are primarily in technology and politics and science fiction.

Spent many years on Reddit and then some time on kbin.social.


FaceDeer,

Keep making it more expensive to suck on their toes and perhaps eventually they'll stop.

FaceDeer,

And also they're posting about it on a completely open platform that any AI trainer could trivially be "harvesting" as well.

FaceDeer,

Anyone can download a torrent containing historical Reddit comments; Reddit surely has at least that, if not a full edit/delete history of every comment. The only people you're thwarting by deleting your comments are other humans who might stumble across your old threads on Google.

FaceDeer,

Well that seems unlikely, what are the odds that an airplane is going to be chased by a ship?

FaceDeer,

The analogy isn't perfect, no analogy ever is.

In this case the content of the search is all that really matters for the quality of the search. What else would you suggest be recorded, the words-per-minute typing speed, the font size? If they want to improve the search system they need to know how it's working, and that involves recording the searches.

It's anonymized and you can opt out. Go ahead and opt out. There'll still be enough telemetry for them to do their work.
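A rough sketch of what that kind of opt-out-aware, anonymized search telemetry could look like - the function name and the TELEMETRY_OPT_OUT switch here are made up for illustration, not any real browser's API:

```python
# Hypothetical sketch of opt-out-aware, anonymized search telemetry.
# TELEMETRY_OPT_OUT and record_search are illustrative names, not a real API.
import os
import uuid

# A per-session random ID lets the service group queries for quality analysis
# without tying them to a persistent user identity.
SESSION_ID = uuid.uuid4().hex

def record_search(query):
    """Build a telemetry event for one search, or None if the user opted out."""
    if os.environ.get("TELEMETRY_OPT_OUT") == "1":
        return None  # opted out: record nothing at all
    return {
        "session": SESSION_ID,  # random, rotates every session
        "query": query,         # the content that actually matters for search quality
        # deliberately no username, IP address, or hardware fingerprint
    }

event = record_search("how to fix a leaky faucet")
if event is not None:
    print(event)  # a real client would send this to the search service
```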

FaceDeer,

But then you get that awkward situation where you go on vacation, open your luggage to get a fresh pair of socks or whatever, and find that you brought nothing but guns and ammo along with you on your trip.

FaceDeer,

I just did a bit of poking around on the subject of the "right to be forgotten", and it's legally complex. Data without personally identifying information, and data that's been anonymized through statistical analysis (which LLM training is a form of), aren't covered.

FaceDeer,

"Surely the use of user-deleted content as training data carries the same liabilities as reinstating it on the live site?"

Why would that be? It's not the same.

And what liabilities would there be for reinstating it on the live site, for that matter? Have there been any lawsuits?

FaceDeer,

You don't think LLMs are being trained off of this content too? Nobody needs to bother "announcing a deal" for it, it's being freely broadcast.

FaceDeer,

Have to save it up in jars ahead of time.

FaceDeer,

The echo-chamberiness of Lemmy is different from Reddit's, but it's still a thing, unfortunately. It really depends on the community you're in, but since the population of the Fediverse (and especially the Threadiverse) is very small compared to Reddit's, you tend to see the same people cropping up a lot. I haven't been banned from anywhere (that I know of - I don't actually know if I'd be notified), but I find myself hammered with downvotes more frequently here than on Reddit when I say something unpopular.

I'd say, mess around a bit and see.

FaceDeer,

The only way I can imagine this working is by twisting the definition of "search engine" enough that you can claim there aren't any search engines anymore, when really there still are, just under a different name.

Search engines aren't actually the "problem" OP wants to address here, though. He just doesn't like the specific search engines that exist right now. What he should really be asking is how a search engine could be implemented without the particular flaws that bother him.

FaceDeer,

Existing AIs such as ChatGPT were trained in part on that data, so obviously they've got ways to make it work. They filtered out some stuff, for example - the "glitch tokens" such as solidgoldmagikarp were evidence of that filtering.
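A toy illustration of how that kind of filtering can produce "glitch tokens": the vocabulary gets built from the raw corpus, some junk documents then get filtered out before model training, and a few vocabulary entries end up with essentially no training examples. The corpus and filter rule below are invented for the example:

```python
# Rough sketch of how "glitch tokens" can arise: the tokenizer vocabulary is
# built from the raw corpus, but some documents get filtered out before model
# training, leaving a few vocabulary entries the model almost never sees.
from collections import Counter

raw_corpus = [
    "normal sentence about cooking",
    "another normal sentence",
    "solidgoldmagikarp solidgoldmagikarp solidgoldmagikarp",  # junk document
]

def keep(doc):
    # toy quality filter: drop documents that are mostly one repeated word
    return len(set(doc.split())) > 1

vocab = {tok for doc in raw_corpus for tok in doc.split()}        # built pre-filter
training_docs = [doc for doc in raw_corpus if keep(doc)]          # filtered corpus
seen = Counter(tok for doc in training_docs for tok in doc.split())

undertrained = [tok for tok in vocab if seen[tok] == 0]
print(undertrained)  # ['solidgoldmagikarp'] -> a candidate "glitch token"
```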

FaceDeer,

You think they don't have the originals archived?

FaceDeer,

"Model collapse" can be easily avoided by keeping old human data with new synthetic data in the training set. The old archives of Reddit content from before there was AI are still around.

FaceDeer,

There are torrents of complete Reddit comment archives available to any random person who wants them, and I'm sure Reddit itself has a comprehensive edit history of everything.

FaceDeer,

By "old archives" I mean everything from 2022 and earlier.

FaceDeer,

And even if SLS is an example of non-private rocketry, it's hardly something that should be touted as a positive example. Especially not when launch pace is your criterion.

FaceDeer,

Some are, but they still don't build rockets. I think there's some other factor that's important.

FaceDeer,

With comments like this he likely goes through new accounts at a very rapid pace.

FaceDeer, (edited)

It is impossible for them to contain more than just random fragments; the models are too small for the data to be compressed enough to fit. Even the fragments that have been found are not exact - the AI is "lossy" and hallucinates.

The examples that have been found are examples of overfitting, a flaw in training where the same data gets fed into the training process hundreds or thousands of times over. This is something that modern AI training goes to great lengths to avoid.
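One of the simplest of those lengths is deduplicating the training set so that no document gets fed in hundreds of times. A toy exact-match version (real pipelines also do fuzzy near-duplicate matching; the names here are illustrative):

```python
# Toy exact-duplicate removal for a training corpus: keep only the first
# copy of each document so repeated data can't drive overfitting/memorization.
import hashlib

def dedupe(docs):
    seen = set()
    unique = []
    for doc in docs:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest not in seen:  # keep only the first copy of each document
            seen.add(digest)
            unique.append(doc)
    return unique

corpus = ["the same article", "the same article", "a different article"]
print(dedupe(corpus))  # ['the same article', 'a different article']
```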

FaceDeer,

You could say it's to "circumvent" the law, or you could say it's to comply with the law. As long as the PII is gone, what's the problem?

FaceDeer,

"Also pretty sure training LLMs after someone opts out is illegal?"

Why? There have been a couple of lawsuits launched in various jurisdictions claiming that LLM training is copyright violation, but IMO they're pretty weak and none of them have reached a conclusion. The "opting" status of the writer doesn't seem relevant if copyright doesn't apply in the first place.

FaceDeer,

Nor is it up to you. But the fact remains: it's not illegal until there are actually laws against it. The court cases that might determine whether current laws apply to it are still ongoing.
