@wimpy True story, I submitted a project to https://nlnet.nl/ whose goal is to fix this and provide a service that I can use in @pidgin 3 and they rejected it without asking any questions.
Haven't gotten to the point of needing it yet for Pidgin 3, but that day keeps getting closer and closer.
That said, I've been brainstorming with a friend who did a bunch of work on the openembed stuff in Synapse, the Python-based Matrix homeserver, which as you would have guessed has a similar issue.
They should talk to @troyhunt about how to correctly use caches.
"Yep, we just hit "five nines" of cache hit ratio on Pwned Passwords being 99.999%. ..., let's talk about how we've managed to only have two requests in a million hit the origin..."
@wimpy I wrote about this problem two years ago and there has been no movement toward a fix or even mitigation from the Mastodon developers. https://jwz.org/b/yj6w
BTW, everyone who knee-jerk replies to this with "LOL get a CDN" is saying: "I expect all web sites to be run by dedicated professionals, so that my social network can be run by amateur hobbyists".
This is no different than the slashdot effect. Your content got posted to slashdot and the resulting traffic surge could cripple your webserver for a while because suddenly lots of people were hitting your site. You just trade the slashdot post for a boost and add what should be a lightweight call to generate the link preview in place of a direct visit.
Take it as a badge of honor... enough people are interested in what you have to say that you need to treat your site more seriously, or suffer downtime because of your popularity. Because ultimately, if mastodon generating link previews can cripple your site, so can real traffic. And someone who's angry with you could easily take your site down intentionally.
@wimpy I agree this should be fixed, but what a horribly-written article this is. You have to wade through multiple paragraph blocks just to get to the main technical issue.
@wimpy It also does not speak well of their technical acumen if they don't realise that.
Can you imagine for a moment (and yes, I perfectly understand how unlikely it is on a non-federated social network) if they'd put out an article like that whinging about (pre-space karen) Twitter?
The people at @itsfoss are really nice and friendly. They are a small team, publishing a lot of articles, and only have limited financial resources.
I am not a web professional, but I guess there should be an HTML/CSS feature that lets a website serve a static, prepared image and teaser text, instead of every instance having to pull the entire page (with all the rubbish on it) and generate its own preview. Or is there not?
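For what it's worth, the closest existing thing is probably Open Graph meta tags in the page's `<head>`: the site declares a title, teaser text and a prepared image, and whoever builds the preview reads those instead of rendering the page. Note the consumer still has to fetch the HTML to find the tags, so this alone does not remove the request. A minimal sketch of the reading side, using only the Python standard library; the sample page and the `extract_preview` helper are made up for illustration and are not Mastodon's actual code:

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:*" content="..."> tags from a page."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        if prop.startswith("og:") and attr.get("content") is not None:
            # Keep the first value seen for each property.
            self.tags.setdefault(prop, attr["content"])


def extract_preview(html):
    """Hypothetical helper: return the fields a link preview card needs."""
    parser = OpenGraphParser()
    parser.feed(html)
    return {
        "title": parser.tags.get("og:title"),
        "description": parser.tags.get("og:description"),
        "image": parser.tags.get("og:image"),
    }


# Made-up page head, for illustration only.
SAMPLE_HTML = """
<html><head>
<meta property="og:title" content="Example article">
<meta property="og:description" content="A short teaser.">
<meta property="og:image" content="https://example.com/preview.png">
</head><body>...</body></html>
"""

preview = extract_preview(SAMPLE_HTML)
```

Here `preview` ends up holding just the title, the teaser and the image URL, which is all a preview card needs.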
There are multiple methods they could use on their side to mitigate the issue, including some things akin to what you suggest, instead of complaining about it and blaming others.
That is what makes it rather churlish of them to publish an article, on their ostensibly technical news outlet, blaming others for the problem when it is in line with and entirely a consequence of the normal functioning of the internet.
I'm sure every site can do better (and we will too, with all the constructive suggestions we've been receiving).
But the whole point of the article was to shed light on an issue that has existed for almost 6 years now and keeps getting pushed back.
If you think our cache handling is poor, then as a technically inclined user you should also realize that the issue highlighted for Mastodon is a fundamental one as well?
There is definitely more to the problem than just an unfortunate web server configuration. The underlying issue is also partly caused by the distributed structure of the Fediverse (I tried to shed some light on the issue in an itsfoss community post).
However: Whilst it is true that the issue has already been reported years ago, it seems minor enough (links to websites get shared all the time), to not have found its way to the top of the agenda.
@wimpy I see it as an issue on both sides. A #cdn should handle huge spikes in traffic. I would look at more details of how the CDN is being leveraged. There are some settings to tweak to improve performance.
On the other side, #Mastodon should fix this issue. It sounds like a bad design smell. Each instance should cache the link preview at a minimum. It shouldn't hit the source once per follower. #twocents
@rockmanjoe Mastodon does cache link previews, so one instance generates the preview once; it's not per post and not per user. That still doesn't help much given the number of instances out there, but the alternative isn't nice either: you don't want to trust a random instance to generate a preview for the entire fediverse to propagate.
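To make the numbers concrete, here is a rough sketch of that per-instance caching behaviour. It is a toy model, not Mastodon's implementation: `PreviewCache`, `fake_fetch`, the one-hour TTL and the instance/user counts are all assumptions picked for illustration.

```python
import time


class PreviewCache:
    """Per-instance cache: each URL is fetched at most once per TTL window."""

    def __init__(self, fetch, ttl_seconds=3600.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # url -> (expires_at, preview)

    def get(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the origin is not contacted
        preview = self._fetch(url)  # cache miss: one request to the origin
        self._entries[url] = (now + self._ttl, preview)
        return preview


origin_requests = []  # every call that actually reaches the origin site


def fake_fetch(url):
    """Stand-in for the real HTTP fetch; it only records the call."""
    origin_requests.append(url)
    return {"url": url, "title": "Example"}


# Assumed numbers: 5 instances, each showing the same link to 100 users.
instances = [PreviewCache(fake_fetch) for _ in range(5)]
for cache in instances:
    for _ in range(100):
        cache.get("https://example.com/article")

request_count = len(origin_requests)  # 5: one per instance, not one per user
```

So the load on the origin scales with the number of instances that see the post, not with the number of users, which is exactly why it still adds up when tens of thousands of instances receive the same boost.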
@wimpy The replies to this are obnoxious. As @jwz has pointed out, this is a real problem, but lots of people are calling this person names, saying he should fix his site, to work around a wasteful Mastodon inefficiency. It's a bug and it needs to be fixed; we shouldn't be generating tens of thousands of requests to make the link preview. It could be generated once and the image could be shared, or it could be cached in some other way.
There is a workaround, sort of. When linking to someone's site, if an image is included in the post, a link preview image won't be generated.
@not2b if you generate it once, you will then have one instance generate something obscene, and that'll propagate across the fediverse. See bluesky for example, where the preview is generated and attached to the post on the client, which makes it possible to create fake links and embeds. This is not something you'd want.