Announcing OpenLemmyStats.org: Publicly Queryable Vote History + Other Hidden Data for Any Lemmy User!

What’s stopping me from doing this? Here we go:

I’m going to start an instance and federate with everyone who will allow it, which is most instances including this one, I believe.

Then I’m going to feed all that data into my new website, called Open Lemmy Stats, where anyone can query the user data I’ve accumulated. The homepage will be rife with insights, leaderboards and all kinds of data on prolific users.

Additionally, I’ll display a snapshot/profile of a random user, generated by feeding that user’s data to GPT-4 to infer the user’s political affiliations.

Worst of all, I’m not going to out my instance for everyone to know it as the one to defederate. In fact, I’m spinning up a few instances that will host innocuous communities that I plan to mod and support to give my instances cover for their true purpose: redundant fediverse datastreams for my site, Open Lemmy Stats.

I’ll also have a store where anyone can buy my collected fediverse data for a handsome sum.

Just kidding I’m not doing any of this. But someone absolutely will, or is already working on it. They’ll make a good bit of money too, I’d bet.

This is inspired by a recent post on youshouldknow@lemmy.world where someone highlighted what kind of data instance admins have access to, even for users not on their instance.

I wanted to share this to start a discussion that I find interesting. I’m interested in your thoughts, or in hearing more on why this may or may not be possible and, if it is, some ideas for how to fix that. Obviously such a site would be problematic, but no doubt popular for oh so many reasons.

Edit: typo, I called admins adminis. Corrected.

Edit 2: wanted to credit the post I was referencing from YSK, here it is - lemmy.world/post/1033769

FaceDeer,

Just kidding I’m not doing any of this.

Aw, I was looking forward to seeing my profile and having you save me the trouble of compiling some of that data for myself.

I’m not going to out my instance for everyone to know it as the one to defederate.

Should be fairly straightforward to figure out, if I was interested. I'd create an instance of my own and have it present slightly different information to each of the other instances that federate with me, probably creating a different fictional user to send a few votes from to each of them. Then just check to see which of those fictional users shows up in your data and your data-collection instance's identity is revealed.
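A minimal sketch of that trap-user idea, assuming each federated partner is sent votes from its own unique fictional account. The names `make_canaries` and `find_leaks`, the `trap…` username scheme, and the `*.example` hosts are hypothetical, made up for illustration; nothing here is an existing Lemmy or Kbin feature.

```python
import secrets


def make_canaries(partner_instances):
    """Map one unique fictional username to each federated partner."""
    return {
        # The index guarantees uniqueness; the random suffix makes names hard to guess.
        f"trap{i}_{secrets.token_hex(4)}": partner
        for i, partner in enumerate(partner_instances)
    }


def find_leaks(canaries, scraped_usernames):
    """Return the partners whose trap user shows up in the scraped data."""
    seen = set(scraped_usernames)
    return sorted({partner for name, partner in canaries.items() if name in seen})


# Hypothetical usage: usernames pulled from the stats site are checked for traps.
canaries = make_canaries(["a.example", "b.example", "c.example"])
leaked_name = next(n for n, p in canaries.items() if p == "b.example")
print(find_leaks(canaries, ["alice", "bob", leaked_name]))
```

Because each trap user was only ever shown to one partner, any trap name that surfaces in the published data points at the instance that forwarded it.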

booty_flexx, (edited )

Good idea! As a countermeasure, I think I would run multiple instances and double-, triple-, or quadruple-verify the data across them to make sure no one is feeding me fake data. If there are discrepancies, I could average the data, or flag the value(s) with a confidence rating and fuzz the numbers to be safe.

If an instance fakes too much data and doesn’t match what other instances are reporting, I’ll quietly defederate, or stay federated but program my system to ignore data from that instance so as not to tip anyone off.
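A rough sketch of that cross-checking, assuming each watcher instance reports one value (say, a post's score) and the most common value is taken as the consensus. The function name `reconcile` and the instance labels are hypothetical, and majority voting is only one of the options mentioned above (averaging or fuzzing would look different).

```python
from collections import Counter


def reconcile(reports):
    """Cross-check one value as reported by several instances.

    reports: dict mapping instance name -> reported value.
    Returns (consensus value, confidence in [0, 1], instances that disagree).
    """
    if not reports:
        raise ValueError("need at least one report")
    counts = Counter(reports.values())
    consensus, agreeing = counts.most_common(1)[0]
    confidence = agreeing / len(reports)
    outliers = sorted(inst for inst, value in reports.items() if value != consensus)
    return consensus, confidence, outliers


# Hypothetical usage: three watcher instances report the score of the same post.
consensus, confidence, outliers = reconcile({"w1": 42, "w2": 42, "w3": 40})
print(consensus, round(confidence, 2), outliers)
```

An instance that keeps landing in `outliers` is the one to ignore quietly. Note that a per-partner trap user never produces a discrepancy in real users' data, so this check alone would not detect it.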

FaceDeer,

I guess it'll become a standard feature for every default installation of Lemmy or Kbin to create a random "trap user" (analogous to the fake "trap streets" in maps used to detect whether someone copied them) for each federated partner, then. You'll have no idea which ones are actually paying attention to who's harvesting their data, just that everyone potentially could be.

Personally, as I said, I have no particular qualm with a service like this existing. I'd find it handy, and if you really think that you'll be able to sell the data it collects, I expect a dozen competitors would spin up immediately to soak up whatever profit potential it had. But I think the advantage lies with those who are trying to spot your "watcher" instances: the watchers are going to have to federate and subscribe with everyone, so they'll be pretty prominent in the Fediverse.

I think you'd do much better if you dropped the "muahaha, I'm so evil!" act and just provided the service. There are plenty of Reddit analogues and nobody cares about them.

gloriousspearfish,

I was thinking yay that sounds like an awesome data visualization platform, that would be great. Until I got to the “just kidding” part.

You are right, all this information is readily available. And we would be really naive if we thought that no one is collecting this yet.

You, or someone else, should build this, so that it is clearly visible to everyone what data is available, and not just to the select few who build their own closed data-mining systems.

RightHandOfIkaros,

This would be a pretty bad idea. Not only are companies going to steal all the data from that site, but it’s going to lead to people going through every user’s history to block people who don’t have the same “color politics” as them. It’s going to lead to hyper echo chambers, even worse than other social platforms.

I think it would be better if this data were obfuscated even from instance admins. Does this present a bigger challenge in identifying malicious users? Probably, yes. However, it protects the Fediverse first and foremost from the vampire companies stealing consumer data, and protects the Fediverse from becoming the loudest echo chamber on the planet.

Differing opinions, viewpoints, and politics are important to genuine discussion. These “color politics” don’t even have to be part of the discussion to influence what people say. I don’t know about you, but a thread where people only ever agree with me and offer no alternative ideas is not a place I want to spend a lot of time in. Because who knows, maybe my ideas are wrong, and I might (shudder) change my mind.

gloriousspearfish,

Hey, I completely agree with you, in that the most interesting discussions are among groups where I don’t agree with everyone. This is where I learn and grow as a person.

But in saying that, aren’t you also saying that some people, like you and me, would not use such a database to filter out the users we do not agree with?

And would it not be a logical conclusion that people who like to build and stay in their echo chambers would not be more inclined to listen to different opinions just because they lack a more efficient tool to sort out people they disagree with?

What I am saying is, all information that is technically available will be collected and analysed. Better to make a public and open platform showing everything, so that everyone can see exactly what can be collected and surmised from the already public information, than to keep users blind to what information they actually leak publicly.

static,

Useless fearmongering if focussed on Lemmy only.

This could be done to any Twitter, Mastodon or Reddit user.

Horst_Voller,

Could any random Reddit user see what I up or down voted?

static,

So after getting an AI analysis of all your comments, and them selling your data, you only worry about upvotes?
That's a small difference. This example takes a small difference and blows it up to the extreme.

Horst_Voller,

Well yeah. First of all that's another data point that marketing and AI companies can utilise.

Secondly, on Reddit only admins could see how you voted, and it was communicated that that is the case. In a federated network, anyone who is or can pose as an instance can get that data. The problem is that people assume Lemmy and Kbin are more or less just like Reddit and are not considering that everyone can get a voting record they assumed was not public. Better not have liked that kinky adult content or that politically controversial post.

rideranton,

kbin users:

Look at what they need to mimic a fraction of our power

/s

quinten,

You got me at the first half, not gonna lie.

Gnugit,

Nice try Zuckerberg

booty_flexx,

I just want a place for everyone to smoke some meats, yknow, real people stuff

zinklog,

Was with you until the money point. It’s extremely easy to get this data and there will be many open source versions doing this thing.

But I agree that who upvoted a post shouldn’t be federated.

booty_flexx,

I totally get what you’re saying.

I think there is (unfortunately) value to be mined from packaging the data conveniently, or offering a subscription service to make it trivial to query for anyone without sysadmin or database skills. Or just throw porn ads on it or some shady ad network that doesn’t mind being placed on questionable sites.

zalack, (edited )

I really think Lemmy, Kbin, and Mastodon need to figure out a way to have a default terms of service that ships with their product and forbids using the API to collect data for commercial purposes.

Additionally, there should be a way for users to indicate licensing for individual posts, with a default license instance admins can set.

That way for-profit instances could be forced to filter out posts with licenses that do not allow for-profit use. Honestly, even just a simple check box, "[ ] allow for-profit republication", would do, with two licenses that can be attached: one that allows for-profit use and one that does not.
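A minimal sketch of that filter, assuming every post carries a boolean flag set by the check box. The `Post` type, its `allow_for_profit` field, and the `republishable` helper are hypothetical; none of them exist in Lemmy, Kbin, or Mastodon today.

```python
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    body: str
    # Hypothetical per-post flag; the instance-wide default license would set it.
    allow_for_profit: bool = False


def republishable(posts, instance_is_for_profit):
    """Return the posts an instance may republish under the two-license scheme."""
    if not instance_is_for_profit:
        # Non-commercial instances may carry posts under either license.
        return list(posts)
    return [p for p in posts if p.allow_for_profit]


# Hypothetical usage.
posts = [Post("alice", "hello"), Post("bob", "hi", allow_for_profit=True)]
print([p.author for p in republishable(posts, instance_is_for_profit=True)])
```

The filter only works if the for-profit instance chooses to run it, so the license itself would still have to carry the enforcement.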

FaceDeer,

Whoever's doing this wouldn't be using Lemmy, Kbin, or Mastodon code. They'd likely write up some custom ActivityPub service that listened in on that protocol. ActivityPub is an open protocol so trying to put some kind of "no profit" restriction on it at this point would be impossible, and having it on there from the start would have killed its adoption.
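To illustrate how little such a listener would need, here is a rough sketch that pulls the voter out of a single incoming activity. It assumes votes arrive as ActivityPub `Like` or `Dislike` activities whose `actor` and `object` fields are plain URI strings; real traffic may wrap them in other activities or use embedded objects, which this ignores. The `extract_vote` name and the `*.example` URLs are made up for illustration.

```python
import json

# Assumed mapping from ActivityPub activity type to vote direction.
VOTE_TYPES = {"Like": 1, "Dislike": -1}


def extract_vote(raw):
    """Return who voted on what, or None if the activity is not a vote."""
    activity = json.loads(raw)
    score = VOTE_TYPES.get(activity.get("type"))
    if score is None:
        return None
    return {
        "voter": activity["actor"],    # assumed to be a plain URI string
        "target": activity["object"],  # assumed to be a plain URI string
        "score": score,
    }


# Hypothetical payload, shaped like a generic ActivityPub Like.
raw = json.dumps({
    "type": "Like",
    "actor": "https://a.example/u/alice",
    "object": "https://b.example/post/1",
})
print(extract_vote(raw))
```

Nothing in this depends on Lemmy, Kbin, or Mastodon code, which is why a license or terms of service attached to those projects would not reach it.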

Lemmy, Kbin, and Mastodon are all currently licensed under the AGPL, so good luck trying to retroactively put that genie back in the bottle too. The AGPL places no restriction on for-profit companies running the code.

Europe's got the GDPR, if you really want to try some kind of legal route to counter this, but I don't think it's very likely to work well.

OnionFutures,

But I agree that who upvoted a post shouldn’t be federated.

This also surprised me. I wonder whether it is necessary for technical reasons, to prevent repeated upvoting of a submission by the same user.
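As a sketch of why that might be, here is one way a receiving instance could deduplicate votes, assuming it keys each vote on a (voter id, target) pair. The `VoteTally` class is hypothetical and is not taken from Lemmy's code. It shows that some stable per-voter identifier is needed to collapse repeats, but not that the identifier has to be the public username.

```python
class VoteTally:
    """Keep at most one vote per (voter, target) pair."""

    def __init__(self):
        self._votes = {}  # (voter_id, target) -> +1 or -1

    def record(self, voter_id, target, score):
        # A repeated vote from the same voter overwrites the earlier one.
        self._votes[(voter_id, target)] = score

    def score(self, target):
        """Sum the current votes for one target."""
        return sum(s for (_, t), s in self._votes.items() if t == target)


# Hypothetical usage: alice upvotes twice, bob downvotes once.
tally = VoteTally()
tally.record("alice", "post/1", 1)
tally.record("alice", "post/1", 1)
tally.record("bob", "post/1", -1)
print(tally.score("post/1"))  # 0
```

Without any voter identifier, the two votes from alice could not be told apart from votes by two different users.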

sabreW4K3,

Capitalists gonna try and capitalise. I’ve seen lots of people try and create services like this for Mastodon.

Great post BTW.

booty_flexx,

Thank you, I appreciate that!

That’s interesting about Mastodon. I’m not exactly surprised; I feel like it’s merely a question of when, not if, and apparently that time has already passed for Mastodon. I have no doubt folks are already capitalizing or attempting to capitalize on Lemmy data in some way or another, or at least letting the data fill their bucket while they figure out how to monetize it.
