maegul,
@maegul@hachyderm.io avatar

Graphs of the sizes of fediverse instances, how common they are, and where the most people are! 🧵

Data pulled from https://instances.social/ (by @TheKinrar) and excludes pawoo and baraag as they're heavily blocked for good reasons (it seems)

Breaking down instances by the number of users into bins (that are quasi human-friendly logarithmic), we see that the majority (55%) have 2-50 users, ~33% have 1 user, and almost all instances have fewer than 5,000 users.

@fediversenews

1/
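A minimal sketch of the binning described in this post, assuming a plain list of per-instance user counts. The bin edges below are illustrative guesses at "quasi human-friendly logarithmic" bins, not the ones actually used for the graph, and `user_counts` is made-up sample data rather than the instances.social dataset.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical lower bounds for each bin; the thread does not list the real ones.
BIN_EDGES = [1, 2, 50, 100, 1_000, 5_000, 10_000, 50_000, 100_000, 1_000_000]


def bin_label(i):
    """Human-readable label for the bin starting at BIN_EDGES[i]."""
    lo = BIN_EDGES[i]
    if i + 1 < len(BIN_EDGES):
        return f"{lo}-{BIN_EDGES[i + 1] - 1}"
    return f"{lo}+"


def instance_share_by_bin(user_counts):
    """Return {bin label: percentage of instances} for instances with >= 1 user."""
    tally = Counter()
    for n in user_counts:
        if n < BIN_EDGES[0]:
            continue  # skip empty instances
        idx = bisect_right(BIN_EDGES, n) - 1  # last edge that is <= n
        tally[bin_label(idx)] += 1
    total = sum(tally.values())
    return {label: 100 * count / total for label, count in tally.items()}


# Made-up sample data for illustration only.
user_counts = [1, 1, 3, 12, 40, 75, 900, 4_200, 60_000]
shares = instance_share_by_bin(user_counts)
```

Swapping the `+= 1` for `+= n` would give the share of users per bin instead of the share of instances.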

maegul,
@maegul@hachyderm.io avatar

@fediversenews

Though most instances are very small (in user count), the large instances are very large by comparison. The result is that the 20-30 largest instances host around half of all the users of the fediverse.

This graph is a cumulative percentage of all users starting with the largest instance and descending. By the 20th largest, we've got 50% of all users. Mastodon.social hosts ~16%! The top 10 get you ~40%. Note that this includes 2 large Japanese instances (mstdn.jp+mastodon.cloud)
2/
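A rough sketch of how a cumulative curve like this could be computed, assuming only a list of per-instance user counts. The function names and the `sample` numbers are hypothetical and chosen for illustration; they are not taken from the dataset.

```python
from itertools import accumulate


def cumulative_user_share(user_counts):
    """Cumulative % of all users, starting from the largest instance."""
    ranked = sorted(user_counts, reverse=True)
    total = sum(ranked)
    return [100 * running / total for running in accumulate(ranked)]


def instances_needed(user_counts, target_pct=50):
    """Smallest number of top-ranked instances that together reach target_pct of users."""
    for rank, pct in enumerate(cumulative_user_share(user_counts), start=1):
        if pct >= target_pct:
            return rank
    return len(user_counts)


# Made-up sample: one large instance and a tail of smaller ones.
sample = [1_000, 400, 300, 100, 100, 50, 30, 20]
half_rank = instances_needed(sample)       # instances needed for 50% of users
most_rank = instances_needed(sample, 80)   # instances needed for 80% of users
```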

maegul,
@maegul@hachyderm.io avatar

@fediversenews

Which for me leads to the question of how many people are on instances of what size (eg, what percentage of all users are on instances with 10-20K users?)

Well, turns out it's pretty even (using the bins from above), from 1K to 1M users, with ~10% of users in 1K-5K instances and ~13% in 50K-100K instances. Only below 1K user instances do you get a substantial drop off in the number of users on such instances.

Take away for me, plenty of people on 1K to 40K instances!

3/

gorfram,

@maegul @fediversenews The dip between 40K & 50K on the % users vs. instance size chart is interesting to me. My takeaway is that many/most >50K instance users came to Mastodon, found an easy-to-find instance, and have seen no reason to change instances.
While <40K instance users found the “mega-instances” unwieldy, too much like corporate SM, &/or unresponsive to legit concerns; but still wanted the SM experience of a well populated instance.

*based on no specific knowledge or expertise

maegul,
@maegul@hachyderm.io avatar

@gorfram @fediversenews Maybe ... most likely just a kink in the data and the binning I chose.

gorfram,

@maegul @fediversenews Entirely likely. Reading too much into a set of numbers is one of my failings/talents.

gpollara,

@maegul @fediversenews my real concern, which you cannot address from these analyses, is the interoperability & discoverability across these instances. I don't think I fully understand it, but I'm not sure it's 2 way across all instances (even without blocking). i.e. are all instances truly connected and discoverable to each other?

maegul,
@maegul@hachyderm.io avatar

@gpollara @fediversenews Yea that's a totally different question (the federation network graph between instances).

As for discoverability, that's a general issue on the fediverse. Like, is anything really discoverable here? Hashtags help, and then you follow people and the people they follow/boost. But overall, I'm not sure any instances are more/less discoverable to/from each other than the others ... unless local timelines are being used (which I don't, so 🤷 ).

trisweb,
@trisweb@m.trisweb.com avatar

@maegul @fediversenews Genuinely cool that the biggest instance hosts "only" 16% of users. I would have thought higher. Good sign.

maegul,
@maegul@hachyderm.io avatar

@fediversenews

Taking the data from above, we can make a cumulative percentage graph (line chart) over the same bins as above.

We see that the halfway mark is ~50K users. So half of the fediverse are on instances with 50K or more users, half on instances with less.

Slightly more technically, this line is pretty straight (as users are roughly evenly spread out, highlighted above). Given that the bins are roughly logarithmic-ish, this hints that the distribution is a power law.

end/
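A rough sketch of why an even spread of users over logarithmic bins hints at a power law. Here $n(x)$ is an assumed smooth density of instances with $x$ users; it is an idealisation introduced for this note, not something stated in the thread, and the actual bins are only roughly logarithmic.

```latex
% U(a,b): number of users hosted on instances with between a and b users
\begin{align}
U(a,b) &= \int_a^b x\, n(x)\, dx \\
n(x) = C x^{-2} \;&\Longrightarrow\; U(a,b) = C \int_a^b \frac{dx}{x} = C \ln\frac{b}{a}
\end{align}
```

Under that assumption the user count in a bin depends only on the ratio $b/a$, so every bin of equal logarithmic width holds the same number of users and the cumulative curve over such bins is a straight line.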

maegul,
@maegul@hachyderm.io avatar

@fediversenews

Simple extra without a graph, on the power law thing. A log-log histogram by user count does show a very linear relationship. Graph not included because it'd probably be confusing and I don't know anything about logarithmic binning.

All I'll say is that on the log-log plot there was a small but clear bulge in the mid-sized instance range (1K-100K users), which may represent a certain "sweet spot" of instance size that people are attracted to ??
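For anyone curious about the logarithmic binning mentioned here, a minimal sketch under stated assumptions: bins are equal-width in log10 of the user count, each tally is divided by the bin's linear width to give a density, and a straight-line fit on the log-log points estimates the exponent. The function names are hypothetical and the sample data is synthetic, built so the density falls off as 1/x²; this is not necessarily how the original plot was made.

```python
import math


def log_binned_density(user_counts, bins_per_decade=2):
    """Histogram with logarithmically spaced bins, normalised by bin width.

    Returns (bin centre, instances per unit of user count) for each non-empty bin.
    """
    tallies = {}
    for n in user_counts:
        if n < 1:
            continue  # log10 is undefined for empty instances
        k = math.floor(math.log10(n) * bins_per_decade)
        tallies[k] = tallies.get(k, 0) + 1
    points = []
    for k in sorted(tallies):
        lo = 10 ** (k / bins_per_decade)
        hi = 10 ** ((k + 1) / bins_per_decade)
        centre = math.sqrt(lo * hi)  # geometric midpoint of the bin
        points.append((centre, tallies[k] / (hi - lo)))
    return points


def loglog_slope(points):
    """Least-squares slope of log10(density) against log10(size); needs >= 2 bins."""
    xs = [math.log10(x) for x, _ in points]
    ys = [math.log10(y) for _, y in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic data: 900 instances in [1, 10), 90 in [10, 100), 9 in [100, 1000).
sample = [2] * 900 + [20] * 90 + [200] * 9
slope = loglog_slope(log_binned_density(sample, bins_per_decade=1))
```

A bulge like the one described would show up as points sitting above the fitted line in the mid-sized range.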

gpollara,

@maegul @fediversenews lol, I was about to ask about the log-log plot, and as you say, that shows the bulge / exception where observed > expected.

Interesting data. Question for me is how user experience matches to size of instance, i.e. where is the most enjoyable / rewarding home in the Fediverse? 🤷‍♂️

maegul,
@maegul@hachyderm.io avatar

@gpollara @fediversenews

The secret might be that people vary in what they want and are looking for, thus the distribution we've got where any user is as likely to be on a large instance as they are to be on a small instance.

gpollara,

@maegul @fediversenews yes, i think that's right. In that respect, interesting that Fediverse may be disproportionately attracting people with niche communities / interests in the middle of distribution, though I may be over-speculating!

maegul,
@maegul@hachyderm.io avatar

@gpollara @fediversenews I don't think so, there are probably very real reasons why people would end up on a 100-1K user instance rather than a 50 user instance, and for similar reasons, prefer a 50K instance over a 200K instance.

Homebrewandhacking,

@maegul @fediversenews

Thank you for the fascinating thread on sizes of server. 50% of Mastodon being on centralised servers is a thing alright.

redegelde,
@redegelde@mastodon.education avatar

@Homebrewandhacking @maegul @fediversenews
and for the big instances, how active are the users? If I look at my server stats, I see that about a third of them are active.

maegul,
@maegul@hachyderm.io avatar

@redegelde @Homebrewandhacking @fediversenews

Yes that’s also relevant. I’m unclear on the data I got on activity and how it compared to that in fedidb. So I didn’t do any analysis on that. Maybe another time or someone else would be keen.

Homebrewandhacking,

@maegul @redegelde @fediversenews

What are those activity numbers? Peak users? Median, modal, mean average?

Either way. Shockingly low to see 4 digits there.

redegelde,
@redegelde@mastodon.education avatar

@maegul @Homebrewandhacking @fediversenews
It is the data I see on my admin site. I cannot put my finger on how it looks on other servers, or on what "active" means.

Lapineige,
@Lapineige@mamot.fr avatar

@maegul can I suggest you do the same analysis but for monthly active users?
Total accounts is interesting but doesn't represent that much how "real" accounts are spread.
From what I estimated here, https://mamot.fr/@Lapineige/110254339887773340, MAU are not spread evenly across different instance sizes. The biggest ones have a lower relative active user count, which can give a pretty different result.
(I forgot to remove pawoo and baraag)

I did a quick&dirty graph in this discussion : https://mamot.fr/@Lapineige/110254602914029771

Lapineige,
@Lapineige@mamot.fr avatar

@maegul if my numbers are roughly correct, the final picture is a little less bad than with total accounts, with the 10 biggest hosting "only" 25% of monthly active users (and 40% of all accounts according to your findings) and a few hundred hosting the vast majority.

Note: You could take half-year active users, but I don't think it's really relevant, and it's biased because the late 2022 migrations generated a big spike of "activity" that lasts 6 months, even if people never connected again.

maegul,
@maegul@hachyderm.io avatar

@Lapineige

Good work! You’re right, MAU are important. I’ve come to the same conclusion as you by using a sub sample of data from fedidb.org. That is, only smaller instances have higher active user rates.

Do you know the best way to get active user numbers? I’m unclear on the technical details and not sure I want to write a crawler.
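One possible route, sketched under assumptions rather than as a definitive answer: many fediverse servers publish a NodeInfo document whose `usage.users` object carries `total` and `activeMonth` fields. The helper names below are hypothetical, not every server fills in the active counts, and how "active" is measured varies by software, so the numbers are only loosely comparable across instances.

```python
import json
from urllib.request import urlopen


def fetch_nodeinfo(domain, timeout=10):
    """Follow the NodeInfo discovery document and return the parsed NodeInfo JSON."""
    with urlopen(f"https://{domain}/.well-known/nodeinfo", timeout=timeout) as resp:
        links = json.load(resp)["links"]
    # Assumption: any advertised schema version is good enough; take the last one listed.
    with urlopen(links[-1]["href"], timeout=timeout) as resp:
        return json.load(resp)


def user_stats(nodeinfo):
    """Pull total and monthly-active user counts out of a NodeInfo document."""
    users = nodeinfo.get("usage", {}).get("users", {})
    total = users.get("total")
    active = users.get("activeMonth")
    ratio = active / total if total and active is not None else None
    return {"total": total, "active_month": active, "active_ratio": ratio}


# Hypothetical usage (performs network requests, so left commented out):
# stats = user_stats(fetch_nodeinfo("example.social"))
```

This still means one request pair per instance, so it is a small crawler in practice, just a very simple one.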

Lapineige,
@Lapineige@mamot.fr avatar

@maegul from my test and the link I shared above, yes, very small instances have a high active user ratio (that's expected, they invest time and/or money to have their own instance), but among bigger ones there is also a range that is more active, between around 10,000 and 100,000 active accounts.
Note that I made the ranges out of MAU, not total accounts.

gorfram,

@maegul @fediversenews When I was instance-shopping upon people being asked to leave mastodon.social (my first instance) during the most frenzied part of the Twitter exodus, I chose 3K-30K users as my sweet spot. Less than 30K because I wanted to be able to go knock the moderators’ virtual door down if they proved unresponsive to things in need of moderation; & more than 3K because my experience with bulletin boards, etc, was they could be claustrophobic or moribund at <1K users or so.

maegul,
@maegul@hachyderm.io avatar

@gorfram @fediversenews

Yep, I take your point and I think you are right! See, eg, my extra post here: https://hachyderm.io/@maegul/110331536984068521 where I comment on how the data might indicate that people are indeed gravitating to instances in the size range you highlight.

So maybe the 40K-50K bin is just a kink, but yea, maybe you're totally right, and it's a valley between two kinds of users/instances.

Cities and their populations probably demonstrate a similar pattern??

gorfram,

@maegul @fediversenews Do you know if there’s data like that for cities? There’s gotta be, somewhere.

I wonder if that statistical valley was where Yogi Berra was standing when he reportedly said, “No one goes to Coney Island anymore: it’s too crowded.”

maegul,
@maegul@hachyderm.io avatar

@gorfram @fediversenews

Don't know exactly, but it is a sub-field of science to some extent. Geoffrey West is a former physicist known for studying such things ... look him up and you might find some resources.

His general finding was that power laws (which seem to accurately describe the relationship between the number of instances and their size) pop up all over the place in biology and cities.

joeventures,

@maegul @gorfram @fediversenews I believe they refer to it as a Zipf distribution.

marjolica,

@maegul @gorfram @fediversenews most of us are sticky and don't change servers unless we get our choice completely wrong.
Some servers are also open to growing and others are not; scaling moderation is an issue here for some.
So some users in a server of a particular size were in a smaller server when they first joined, so what we see now may not fully reflect initial user choice.

NicholasLaney,

@maegul
Hi, thank you for the analysis.
Is the user count you are referring to active users or total users?

maegul,
@maegul@hachyderm.io avatar

@NicholasLaney these are total users only. Data on active users was unclear in the source I was using. But it should be revisited for active users for sure!

NicholasLaney,

@maegul
Thank you. :)

blackburied,

@maegul @fediversenews You'd think users should self-load-balance and join instances that are not heavily loaded?

BenjaminNelan,
@BenjaminNelan@mastodon.social avatar

@blackburied @maegul @fediversenews possible solution: fediverse plinko. migrate everybody into random instances to balance things out. 😀

luca,

@maegul @fediversenews

isn’t counter.social fully defederated ?

maegul,
@maegul@hachyderm.io avatar

@luca @fediversenews Huh ... don't know.

How would one find out? Is there a single or good source of truth on these things?

Its numbers aren't large enough to change the shape of the data though.

More broadly though, I have no idea how federated any of the instances in the dataset are (apart from pawoo and baraag, which are very large and heavily blocked, which is why they were excluded).

pevohr,

@maegul @luca @fediversenews For any given instance, you can ask two different questions:

federation ... How many other instances has this one ever connected with?

defederation ... Who's currently blocking whom?

To the extent that apps + instances on the Fediverse support the relevant Mastodon APIs (many apparently do), it's possible to get decent answers to both.
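A hedged sketch of what asking those two questions could look like. It assumes the server exposes Mastodon's public `/api/v1/instance/peers` and `/api/v1/instance/domain_blocks` endpoints; admins can disable either one, and published block lists may show partly obfuscated domain names. The helper names and sample data are hypothetical.

```python
import json
from urllib.request import urlopen

ENDPOINTS = {
    "peers": "/api/v1/instance/peers",           # domains this server has seen
    "blocks": "/api/v1/instance/domain_blocks",  # public block list, if enabled
}


def endpoint_url(domain, kind):
    """Build the URL for one of the two endpoints above."""
    return f"https://{domain}{ENDPOINTS[kind]}"


def fetch_json(url, timeout=10):
    """Return parsed JSON, or None when the endpoint is disabled or unreachable."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):  # network/HTTP errors, or a non-JSON body
        return None


def federation_summary(peers, blocks):
    """Summarise the two responses; None means the data was not available."""
    return {
        "known_peers": len(peers) if peers is not None else None,
        "blocked_domains": sorted(b["domain"] for b in blocks) if blocks is not None else None,
    }


# Hypothetical usage (performs network requests, so left commented out):
# peers = fetch_json(endpoint_url("example.social", "peers"))
# blocks = fetch_json(endpoint_url("example.social", "blocks"))
# print(federation_summary(peers, blocks))
```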

pevohr,

@maegul @luca @fediversenews To get a rough sense of how widely federated a particular instance is, check out the connections column in instances.social's advanced view:

counter.social 0
gc2.jp 0
pravda.me 31

This sloppy metric counts the number of unique domains ever seen from that instance, so it tends to overcount current connectivity. (Also, some Masto admins use tootctl to manually prune spam domains after forkbomb attacks, but others don't.)

pevohr,

@maegul @luca @fediversenews As for blocks, see the top 50 lists here.

https://fba.ryona.agency/

For all the dramatic talk about defederation, it seems like most instances don't actually do that to each other. (However, I haven't been able to assess how complete the coverage of this dataset is, so YMMV.)

objectinspace,

@pevohr @maegul @luca @fediversenews source code: git.kiwifarms.net/mint/fedi-block-api

Might not wanna use, or link to that tool ma dude.

objectinspace,

@pevohr @maegul @luca @fediversenews Possible yes; however I've seen a lot of communities go to a shields up defcon2 situation pretty fast when you start building tools to really dig into who is blocking who and why. I also haven't seen much to compare the degree to which an instance is federated. Though in fairness, I have not really looked!

alesssia,

@maegul @fediversenews are these active users?

My belief is that most of the users in very large instances are those that created an account to try out the Fediverse/Mastodon but then abandoned it, while smaller instances have a much higher retention rate -- so not strictly true that half the users are in the top 20 instances, but you know, no data to prove it 😬

ZoDoneRightNow,

@maegul @fediversenews Does the Pareto principle apply here? Are 20% of instances home to 80% of users?
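The question can be checked directly from a list of per-instance user counts. A minimal sketch, with a hypothetical function name and made-up sample numbers that happen to give an exact 80/20 split; it says nothing about what the real data shows.

```python
import math


def top_share(user_counts, top_fraction=0.2):
    """Percentage of all users hosted by the largest `top_fraction` of instances."""
    ranked = sorted(user_counts, reverse=True)
    k = max(1, math.ceil(top_fraction * len(ranked)))  # at least one instance
    return 100 * sum(ranked[:k]) / sum(ranked)


# Made-up sample: five instances, one of which is much larger than the rest.
sample = [800, 50, 50, 50, 50]
share = top_share(sample)
```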

spaetz,

@maegul @fediversenews Zipf's law at play. Who would have thought :-) https://en.m.wikipedia.org/wiki/Zipf%27s_law

JorgeStolfi,
@JorgeStolfi@mas.to avatar

@spaetz @maegul @fediversenews

Awful article.

Where did the laws with s other than 1 come from?

JorgeStolfi,
@JorgeStolfi@mas.to avatar

@spaetz @maegul @fediversenews

In case you care, I did a general revision on that article. But there are still plenty of holes and rough spots...

paoloredaelli,
@paoloredaelli@mastodon.uno avatar

@maegul
The Pareto principle in all its glory: "roughly 80% of consequences come from 20% of causes" https://en.m.wikipedia.org/wiki/Pareto_principle 😀
@fediversenews @aral

erpuchi,

@maegul
So the normal is to have negligible communities
@TheKinrar @fediversenews

kkarhan,

@maegul @TheKinrar @fediversenews that makes the :fediverse: more decentralized than even 25 years ago.

I hope this will remain the same...

enmodo,
@enmodo@mastodon.social avatar

@maegul @TheKinrar @fediversenews The graph x axis labels are confusing. I'm assuming 0 to 2 is actually 0-1 and the next bin is 2-49?

22,

@maegul ooh, it'd be cool to see the y-axis on a log scale also to better see the tail!
