
mick

@mick@cosocial.ca

Average middle-aged guy from various places in Ontario.

Collector of various nerdly interests.

Into #running, #books, #chess, #videogames.

At least a little bit #buddhist. Trying to lead an examined life.

I believe that we have the right to safe digital spaces where we can build communities free from corrosive capitalist influence.

Working to build a sustainable, non-corporate web as a volunteer with #CoSocialCa

Posts auto-delete after 2 months.


bynkii, to random
@bynkii@mastodon.social avatar

“I’ve been running email servers for 30 years, they’re trivial to manage for anyone with a pulse”

Fuck I hate people.

mick,
@mick@cosocial.ca avatar

@bynkii mostly those people. 😂

mick, to random
@mick@cosocial.ca avatar

For the first time the Mastodon server has started to struggle just a little bit to keep up with the flow of the Fediverse.

We’ve usually been “push” heavy, but we’ve started to see some spikes in “pull” queue latency. The worst of these spikes was today, when we fell behind by at least a couple of minutes for most of the afternoon.

1/?

A pair of graphs showing sidekiq queue latency and enqueued jobs. The queue latency peaked at around 6 minutes at roughly 6pm this afternoon, but was between 1-2 minutes most of the afternoon. The maximum number of messages in queue at any one time was 12000. The majority of the traffic is pull queue.
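
(For anyone who wants to check the same numbers on their own instance: the Sidekiq dashboard shows queue latency, but you can also pull it from a console. A minimal sketch, assuming the standard install under /home/mastodon/live, run as the mastodon user:)

    cd /home/mastodon/live
    # seconds the oldest job in the pull queue has been waiting
    RAILS_ENV=production bundle exec rails runner 'puts Sidekiq::Queue.new("pull").latency'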

mick,
@mick@cosocial.ca avatar

This is great! It’s exciting to see our community growing.

I’m going to make a simple change to see if we can better keep up.

The system that we’re running on has plenty of headroom for more sidekiq threads.

2/?

mick,
@mick@cosocial.ca avatar

@thisismissem @polotek That would be helpful. I’d love to be better able to interpret weird traffic spikes like this.

Without being creepy about it. 😅

mick,
@mick@cosocial.ca avatar

For anyone interested in understanding the guts of Mastodon, I have found this article from Digital Ocean very helpful: https://www.digitalocean.com/community/tutorials/how-to-scale-your-mastodon-server#perfecting-sidekiq-queues

Eventually we’ll grow so big that we’ll need oodles of Sidekiq queues, we’ll want to customize how many of each type we run, and we’ll spread them across multiple servers, and so on.

But for now I’m just going to make the number of threads slightly bigger and see what happens.

3/?

mick,
@mick@cosocial.ca avatar

We’ll do this in staging first, because I am a responsible sysadmin (and I am only ever half sure I know what I’m doing).

We’re running the default config that came with our DigitalOcean droplet, which has a single Sidekiq service running 25 threads.

4/?
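
(For context, this is roughly what that stock unit file looks like; the exact paths on a DigitalOcean droplet may differ from a from-source install, so treat it as a sketch rather than a copy of our config.)

    # /etc/systemd/system/mastodon-sidekiq.service (excerpt)
    [Service]
    User=mastodon
    WorkingDirectory=/home/mastodon/live
    Environment="RAILS_ENV=production"
    Environment="DB_POOL=25"
    ExecStart=/home/mastodon/.rbenv/shims/bundle exec sidekiq -c 25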

mick,
@mick@cosocial.ca avatar

@thisismissem Yes I caught that. Not going near streaming. It’s about the best description of sidekiq I’ve found anywhere though.

If there are other good resources that are more up-to-date please share.

mick,
@mick@cosocial.ca avatar

That article from DigitalOcean suggests budgeting roughly 1 GB of RAM per 10-15 threads.

We also need to give each thread its own DB connection.

In staging the DB is local, so we don’t need to worry too much about a few extra connections.

In production, we’re connected to a DB pool that funnels the extra connections into a smaller number of connections to the database. Our database server still has oodles of capacity to keep up with all of this.

5/?
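
(If you want to sanity-check that headroom on a box with a local Postgres, a quick way, assuming you can sudo to the postgres user:)

    # how many connections the server allows vs. how many are currently open
    sudo -u postgres psql -c "SHOW max_connections;"
    sudo -u postgres psql -c "SELECT count(*) FROM pg_stat_activity;"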

mick,
@mick@cosocial.ca avatar

The staging server only has 2 GB of RAM, but it also has virtually no queue activity, so let’s give it a shot.

Having confirmed that we have sufficient resources to accommodate the increase, and then picked a number out of a hat, I’m going to increase the number of threads to 40.

6/?

A webpage showing the sidekiq control panel on cosocial.engineering, featuring 40 threads.
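
(The change itself is small: bump the thread count and the matching DB pool in the Sidekiq unit file, then reload and restart. A sketch, using the same paths as the excerpt above:)

    # /etc/systemd/system/mastodon-sidekiq.service (after the edit)
    Environment="DB_POOL=40"
    ExecStart=/home/mastodon/.rbenv/shims/bundle exec sidekiq -c 40

    sudo systemctl daemon-reload
    sudo systemctl restart mastodon-sidekiq
    systemctl status mastodon-sidekiq    # confirm it came back up cleanly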

mick,
@mick@cosocial.ca avatar

No signs of trouble. Everything still hunky-dory in staging.

On to production.

If this is the last post you read from our server then something has gone very wrong. 😅

7/?

mick,
@mick@cosocial.ca avatar

Aaaand we’re good. 🎉

I’ll keep an eye on things over the coming days and weeks and see if this has any measurable impact on performance one way or the other.

And that’s enough recreational server maintenance for one Friday night. 🤓

8/?

mick,
@mick@cosocial.ca avatar

This looks better! The pull queue never fell more than 41 seconds behind, and only briefly.

I’m still not clear on what has contributed to these spikes, so there’s no way of knowing for sure that yesterday’s changes are enough to keep our queues clear and up to date, but this looks promising.

9/?

Graphs showing sidekiq pull queue performance over the past 48 hours. Today’s performance looks very good when compared to yesterday’s.

mick,
@mick@cosocial.ca avatar

Well, we’re not out of the woods yet.

We fell behind by less than a minute for most of the day yesterday, with some brief periods where we were slower still.

The droplet is showing no signs of stress with the increased Sidekiq threads, so I can toss a bit more hardware at the problem and see if we can reach equilibrium again.

Better would be to get a clearer picture of what’s going on here.

Maybe we need to do both of these things!

10/?

A closer view of the hours from 10 am to 5 pm (EDT) yesterday, clearly showing the rise in queued pull jobs (and a rapid clearing of push jobs)

mick,
@mick@cosocial.ca avatar

This strikes me as an issue.

We have the capacity to run 40 workers (following the change I made last week, documented earlier in this thread).

We have a fairly huge backlog of pull queue jobs.

Why aren’t we running every available worker to clear this backlog? 🤔

It might be necessary to designate some threads specifically for the pull queue in order to keep up with whatever is going on here, but I am open to suggestions.

mick,
@mick@cosocial.ca avatar

@michael that’s where I’m headed next I think.

I’d hoped that just increasing the number of threads for the single service would be enough, but it seems like the default queue prioritization results in a backlog and idle workers.

So dedicating a number of threads per queue seems like the next sensible step.

Thanks for the suggestion!
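
(A hedged sketch of what that could look like, loosely following the Mastodon scaling docs and the DigitalOcean article above: a second systemd unit dedicated to the pull queue, alongside the existing service which keeps handling everything else. The thread counts here are illustrative, not our final numbers.)

    # /etc/systemd/system/mastodon-sidekiq-pull.service (excerpt)
    [Service]
    User=mastodon
    WorkingDirectory=/home/mastodon/live
    Environment="RAILS_ENV=production"
    Environment="DB_POOL=15"
    ExecStart=/home/mastodon/.rbenv/shims/bundle exec sidekiq -c 15 -q pull

    # point the original unit at the remaining queues, e.g.
    #   ExecStart=... sidekiq -c 25 -q default -q push -q ingress -q mailers -q scheduler
    sudo systemctl daemon-reload
    sudo systemctl enable --now mastodon-sidekiq-pull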

mick, to random
@mick@cosocial.ca avatar

How soon is too soon to speak ill of the dead?

jan, to mastodon
@jan@kcore.org avatar

Yeah, I'd call this a problem somewhere. Open files of the mastodon user, which just runs Mastodon.
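
(For anyone wanting to reproduce that check, one way to count the file descriptors held by the mastodon user, most of which in a leak like this turn out to be sockets:)

    # total open files across all processes owned by the mastodon user
    sudo lsof -u mastodon | wc -l
    # if the leak is sockets, something like this will show them piling up
    sudo ss -tnp | grep -c sidekiq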

mick,
@mick@cosocial.ca avatar

@jan @paul @derek @michael @haploc there’s a bug in the ES handler: it never closes sockets.

mick,
@mick@cosocial.ca avatar

@jan @paul @derek @michael @haploc there’s a patch for this, I’ll dig it up when I’m back at keyboard.

mick,
@mick@cosocial.ca avatar

@jan @paul @derek @michael @haploc https://github.com/mastodon/mastodon/pull/27138

I think most admins restart Sidekiq regularly and/or don’t connect to Elasticsearch over https so they don’t encounter this one.

We’ve been running with this patch since October.
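
(For admins who want to carry the same fix until it ships in a release: GitHub serves any PR as an mbox-style patch, so one common approach, assuming a git-based install in /home/mastodon/live, is to apply it on top of your checkout and then rebuild/restart as usual. Docker installs would need to bake it into the image instead.)

    cd /home/mastodon/live
    curl -L https://github.com/mastodon/mastodon/pull/27138.patch | git am
    # git am needs user.name/user.email set; use "git am --abort" if it doesn't apply cleanly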

mick,
@mick@cosocial.ca avatar

@paul @jan @derek @michael @haploc It’s unclear to me whether the asset precompilation step is necessary here, but I’m not the most accomplished Rails admin, so I’ll defer on this.

That’s the correct file.

And ya, restart sidekiq, specifically.

mick, to random
@mick@cosocial.ca avatar

This comic has popped into my mind every month or so for the past 30 years.

evan, (edited ) to random
@evan@cosocial.ca avatar

What is your relationship with your instance owner?

mick,
@mick@cosocial.ca avatar

@evan “it’s complicated”

mick, to random
@mick@cosocial.ca avatar

There’s this trope in zombie movies where the outbreak is just starting to take hold and the viewer catches glimpses of news reports about the emerging threat on TVs that are unnoticed in the background.

The characters don’t know that the world is ending just yet, but the viewer has a premonition of the disaster that’s just ahead.

Anyhow, feeling a bit of this about H5N1 at the moment.

mick,
@mick@cosocial.ca avatar

@n3wjack that was last week’s alarming development, ya.

Previously it was “unprecedented global outbreak in chickens,” followed by “outbreak wipes out elephant seal population.”

This week seems to be evidence of spread to… dolphins? Bad vibes!

mick, to gaming
@mick@cosocial.ca avatar

Sopwith is 40 years old! I can’t even guess at how many hours I spent playing this when I was a kid.

https://fragglet.github.io/sdl-sopwith/40years.html
