steve,
@steve@social.technoetic.com avatar

If you were measuring performance of servers (let’s say, delivery latency under load and active user scalability, to start) how would you propose doing it? What else would you measure? Are there any existing benchmarks?

VolatileDream,
@VolatileDream@adulthood.lol avatar

@steve For delivery latency, I'd measure from when the server finishes processing the request that creates the item to when the last follower server has received all the request bytes. I'd look specifically for implementations that don't scale up properly with shared inbox delivery, e.g. with 10,000 followers on the same instance, and for implementations that don't parallelize their deliveries, e.g. serial delivery, where one server can time out and delay the other deliveries.
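
To make the parallel-vs-serial point concrete, here is a minimal sketch of concurrent fan-out with per-inbox timing, assuming you already have the follower inbox URLs and a signed activity payload. The function names and the 10-second timeout are illustrative assumptions, not taken from any real implementation:

```python
# Hypothetical sketch: parallel delivery to follower inboxes with per-inbox timing.
import asyncio
import time
import aiohttp

async def deliver(session: aiohttp.ClientSession, inbox: str, payload: bytes) -> float:
    """POST the activity to one inbox and return the wall-clock latency in seconds."""
    start = time.monotonic()
    try:
        async with session.post(inbox, data=payload,
                                headers={"Content-Type": "application/activity+json"},
                                timeout=aiohttp.ClientTimeout(total=10)) as resp:
            await resp.read()  # wait until all response bytes arrive
    except (aiohttp.ClientError, asyncio.TimeoutError):
        pass  # a slow or dead peer costs only its own slot, not the whole batch
    return time.monotonic() - start

async def deliver_all(inboxes: list[str], payload: bytes) -> list[float]:
    """Fan out all deliveries concurrently; one timing-out server can't stall the rest."""
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(deliver(session, u, payload) for u in inboxes))
```

With serial delivery, the same timeout would instead add up to 10 seconds of delay for every follower queued behind the misbehaving server.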

nikclayton,
@nikclayton@mastodon.social avatar

@VolatileDream @steve "Time to last follower server receiving the bytes" puts you at the mercy of their availability.

Time-to-first-delivery-attempt might be better, coupled with time-between-retry-attempts.

Better still, start at the other end of the problem and write user-centric SLAs focused on the experience you want users to have, and derive metrics that can capture that.
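
For instance, a user-centric SLO might be "99% of posts reach followers within 30 seconds of being created", and the metric falls out of timestamps the server already has. A hypothetical sketch (the record fields and the 30-second threshold are assumptions for illustration):

```python
# Illustrative sketch of deriving a metric from a user-centric SLO.
from dataclasses import dataclass

@dataclass
class DeliveryRecord:
    created_at: float            # when the user's post was accepted
    first_attempt_at: float      # time-to-first-delivery-attempt starts here
    delivered_at: float | None   # None if no successful delivery yet

def slo_compliance(records: list[DeliveryRecord], threshold_s: float = 30.0) -> float:
    """Fraction of deliveries that landed within the threshold; compare against 0.99."""
    if not records:
        return 1.0
    ok = sum(1 for r in records
             if r.delivered_at is not None
             and r.delivered_at - r.created_at <= threshold_s)
    return ok / len(records)
```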

VolatileDream,
@VolatileDream@adulthood.lol avatar

@nikclayton @steve Framing it as user-centric SLAs/SLOs ++

I interpreted the request as a protocol-level test, where an implementation can be scored on how it handles another implementation's unavailability or latency, with the goal being to handle it in a reliable and performant manner. I oversimplified in communicating the idea the first time. Oops.

steve,
@steve@social.technoetic.com avatar

@VolatileDream @nikclayton Yes, I think both incoming delivery latency (from request to the post being available to view) and outgoing delivery performance could be interesting. Outgoing delivery failure handling isn't exactly performance per se, but it could be important: I think some Mastodon attacks have exploited a server's inability to handle (intentionally) misbehaving peers.

nikclayton,
@nikclayton@mastodon.social avatar

@steve @VolatileDream that's why time-between-retry-attempts is important.

Say you set the goal to be "a failed delivery is retried at least every 60 minutes". That's the metric you alert on, not, e.g., size-of-outbound-queue: the outbound queue might be large, but if you're still processing it frequently enough to meet the every-60-minutes goal, that doesn't matter (from a user performance perspective).
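
Concretely, the alerting check could be as simple as the following sketch. The queue entry fields and the "failed" status value are assumptions for illustration, since real implementations store this state differently:

```python
# Minimal sketch: flag failed deliveries whose last retry is older than the SLO,
# regardless of how large the outbound queue is.
import time

RETRY_SLO_SECONDS = 60 * 60  # "a failed delivery is retried at least every 60 minutes"

def stale_deliveries(queue: list[dict]) -> list[dict]:
    """Return queue entries violating the retry SLO; alert if this is non-empty."""
    now = time.time()
    return [item for item in queue
            if item["status"] == "failed"
            and now - item["last_attempt_at"] > RETRY_SLO_SECONDS]
```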

nikclayton,
@nikclayton@mastodon.social avatar

@steve @VolatileDream To put it another way, "internal" metrics like queue size, database IOPS, memory use, etc., can help with troubleshooting when you know there's an active problem impacting the quality of service users are experiencing.

But they don't tell you whether there's a user facing problem in the first place. For that you need metrics that capture the aspects of the user experience that you care about.
