/kbin server update - or how the server didn't blow up

Currently, on the main instance, people have created 40,191 accounts (plus 214 marked as deleted). I don't know how many are active because I don't monitor that, but once again, I greet all of you here :) In recent days, the traffic on the website has been overwhelming. It's definitely too much for the basic docker-compose setup, which was primarily designed for development use. I was aware of the possible consequences of the situation happening on Reddit, but I assumed that most people would migrate to one of the Lemmy instances, which already have an established position. I hoped that a few stray enthusiasts would find their way to kbin ;)

The first step was to upgrade the VPS to a higher tier (66.91 EUR). It quickly turned out that this wasn't enough. I had to enable CF protection just to keep the website responsive, but response times were still very slow. At this stage, the instance was practically unusable. The next step was a full migration to a dedicated server (100 EUR, the current hardware). Such a migration can be done relatively quickly, so it resulted in only about a five-minute technical break. Despite the much better specs, things didn't get any better. It became clear that the problem didn't lie there. I get really frustrated when it comes to server administration. That was the moment when I started looking for help. Or rather, it found me.

A couple of days ago I wrote about how kbin qualified for the Fast Forward program. To be honest, I applied out of pure curiosity and then completely forgot about it, because a lot was happening at the time. At the height of the fire, Hannah ( @haubles ) reached out with an offer to help. I outlined the situation (in short: the server is dying, I don't even know what I need, help! ;). She quickly connected me with Vlad ( @vvuksan ) and Renaud ( @renchap ). I was probably too tired to tell whether the whole operation lasted 60 minutes or 6 hours, but after a series of precise questions and getting an understanding of the situation, they arranged the entire job themselves. I love working with experts, and it's not often that you come across people so well-versed in the fediverse. Thanks to Hannah's kindness, we will be staying there a bit longer. Currently, fastly.com handles the caching layer and image processing. Hence those cool moving thumbnails ;)

Things were going well at that point. I could disable Cloudflare protection, and probably thanks to that, many of you are here today and we got to know each other a bit better :) However, even then, whenever I tried to enable federation, the server would stop working.

Around the same time, Piotr ( @piotrsikora ), whom I already knew from the Polish fediverse, contacted me. He is the administrator of the Polish Mastodon instance pol.social, operates within the ftdl.pl foundation, and specializes in administering applications with a very similar tech stack. I decided to grant him server access. It took him only a few moments to come back to me with a few tips that allowed us to enable federation. More tips followed over the next few days, and we managed to reach the current state. I think it's not too bad.

Nevertheless, managing the instance has so far taken up 60% or more of my time, which prevents me from fully focusing on current tasks. That's why I would like to collaborate with Piotr and hand over full care of the server to him. Piotr will also take care of the security side, which I now have to treat much more seriously. We still need to work out the terms of our cooperation, but I want you to know the direction I intend to pursue.

We also need to migrate to a new environment, because one server will sooner or later become insufficient. This time, I want to be prepared for it. This may cause transient issues with the website in the coming days.

The next two updates will cover project funding (I still can't believe what happened) and moderation. The ones after that will be more technical, with descriptions of changes and what contributors are doing on Codeberg. I would like to be here more often, but not as an admin, just as myself.

Thank you all for this.

P.S. In private messages, I also received numerous offers of help that I didn't even have a chance to read and respond to. You are the best!

djwu,

Thank you for the update

adonis,

@ernest Regarding servers... did you have a look at Hetzner's server auctions? They tend to have 8c/16t servers for 40-50 bucks.

Also, I've seen that kbin uses PHP at its core. Would you consider switching to a Go stack, which is known to be more resource-friendly than PHP?

Badabinski,

Methinks that a rewrite from PHP to Go would be a pretty massive undertaking. PHP's performance characteristics have gotten a lot better as the language and various runtimes have improved, although it's not anything like Go. I think the best route would be for someone to implement another federated link aggregation system in Go, so then we'd have a diverse selection to choose from — Lemmy in Rust, kbin in PHP, this hypothetical new platform in Go, along with everything else out there. A heterogeneous system is good for the continued health of the threadiverse IMO.

stevecrox,

That isn't the issue.

A complete rewrite of the application might add capacity, but that capacity is vertical: you keep stacking increasing load on one instance. No matter how much performance you extract, you eventually run out of capacity.

As scale increases, you need to add horizontal capacity. This is the idea of adding 2, 10, or 100 servers. That means breaking services out into stateless parts that can run concurrently (or managing state behaviour explicitly).

This is where something like Kubernetes comes into play, since it's designed to manage Docker images across hundreds of servers. Instead of squeezing every last bit of capacity out of one server, you spread the load.
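To make that concrete, a minimal sketch of a Kubernetes Deployment that runs many identical stateless copies of a web app might look like this (the name and image are hypothetical, not kbin's actual config):

```yaml
# deployment.yaml - illustrative sketch only, not kbin's real manifest
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kbin-web              # hypothetical name
spec:
  replicas: 10                # horizontal capacity: 10 identical stateless pods
  selector:
    matchLabels:
      app: kbin-web
  template:
    metadata:
      labels:
        app: kbin-web
    spec:
      containers:
        - name: web
          image: kbin/web:latest   # hypothetical image
          ports:
            - containerPort: 80
```

Bumping `replicas` (or letting an autoscaler do it) is how you go from 2 to 10 to 100 instances, provided the app itself keeps no local state.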

Similarly, Postgres, like most SQL platforms, doesn't particularly scale beyond one instance.

Facebook created Apache Cassandra for this reason. It was one of the first NoSQL databases, and it's designed to deploy in multiples of its replication factor (3 is the default).

Having data spread over 3, 30, or 300 servers is less efficient, but you now have 3, 30, or 300 servers responding.

The other advantage is that horizontal scaling is fault-tolerant by design.

There is an argument for compiled languages like Go, C#, and Java, but honestly the next big win is making as much as possible scale horizontally.

Pilirin,

@ernest your achievements with this piece of software are going to go down in internet history, I hope you know that. You have already written your name into the history books. You deserve congratulations and very sincere thanks.

huskola,

Keep notes. This will make a great documentary some day!

I am not well versed in how all this works, so this might be a stupid question: is there one central server in one location for this site? Could outside, remote server space be used to share the work?

!deleted110152, (edited )

deleted_by_author

  • huskola,

    Could I put one in my house with a 1Gig fiber connection and let someone else run it remotely? Just curious.

    piotrsikora,

    @digitallyfree
Yes, and we are going to change it in a few days :)
    @haubles @vvuksan @renchap @ernest @huskola

    !deleted110152, (edited )

    deleted_by_author

  • piotrsikora,

    @digitallyfree
Right now it's a Core i9 with 64 GB of DDR5 and two Samsung NVMe drives (2 TB each) on software RAID.
But we just ordered a few machines with Xeons ... so we'll try to build a cluster that can grow.
    @haubles @vvuksan @renchap @ernest @huskola

    EROLoLICON,

For 100 euro? I think ernest missed a zero.
And those few Xeon machines sound expensive.

    piotrsikora,

    @EROLoLICON
    Yeah… cost will be much higher than €100 every month ;)

    Stay tuned for updates ;)

    @haubles @vvuksan @renchap @ernest @huskola @digitallyfree

    Dantastic,

    Hello Kbin Gold Premium Unlimited
    :D

    !deleted110152, (edited )

    deleted_by_author

  • piotrsikora,

    @digitallyfree
Yes, but this is a normal desktop CPU, not a server one; that's why we're now waiting for server-class CPUs. This one is running at almost 100% load all day and night :) kbin.social has massive traffic, but we're trying to keep it alive ;)

    @haubles @vvuksan @renchap @ernest @huskola

    babelspace,

    I had a feeling that the silence over the last few days was a sign that a huge amount of work was happening behind the scenes. I also think you’re doing an excellent job of communicating with us. Thank you for all your effort that’s allowed this community to grow.

    VulcanSphere,

    Great update!

    Keep up the good work.

    Badabinski,

    Hey @Ernest and @piotrsikora,

I haven't looked too closely at how kbin is architected yet, but would it benefit from horizontal scaling? I do full-time development of tooling to administer very large k8s clusters for a company that you've probably interacted with today without knowing it. Not sure if k8s is the right orchestration system for you, but I'd be more than happy to provide some input on a potential migration to k8s (if kbin is a good fit there). I know there's a community on Matrix as well; I'll try to reach out there too, although it may be a bit.

    bumbly,

I was thinking the same thing. Shouldn't this be one of the cases where k8s shines, with a horizontal autoscaler? You wouldn't want to manage your own k8s though, so I imagine managed k8s is the best option. Whether it's also the cheapest option is another question.

@Badabinski do you know if there are other horizontal autoscaling options besides k8s?
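For illustration, a minimal HorizontalPodAutoscaler manifest might look like the sketch below. It assumes a metrics server is running and a Deployment named `kbin-web` exists; the names and thresholds are hypothetical:

```yaml
# hpa.yaml - illustrative sketch only
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: kbin-web-hpa          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kbin-web            # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add pods when average CPU goes above 70%
```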

    BiggestBulb,

I'm not familiar with the architecture of the app (nor where it's hosted), but if it happens to be on AWS then you should be able to spin up an ECS cluster (especially since it's already containerized) and load-balance it with an ALB configured during setup. Imo that would be the fastest way to do it (again, assuming this is on AWS).

    Trebach,

    I read somewhere that it's on Hetzner.

    Badabinski,

As @BiggestBulb said, most cloud providers have container platforms that support horizontal scaling, although generally not as elegantly as k8s (imo, others may disagree). Also, totally agree about managed providers. EKS, AKS, and GKE weren't suitable for what we use k8s for (very large shared clusters) until recently, so we've been administering our own custom k8s distro. The managed stuff has gotten a lot better, and I'd definitely recommend it for running kbin. Running k8s yourself is hard; etcd is an evil bastard. I've had plenty of chances to see what works and what doesn't in my role, however. There are some development/deployment patterns that are robust, and there are many that are not.

    VerifiablyMrWonka,

If the post is anything to go by, it's using the included "mostly for dev work only, mostly" docker-compose files. It could absolutely be scaled out, since at its core it's just a webapp with workers. The app is already configured to use Redis for session storage, so it should be able to go super wide.

The only limitation is how performant you could make your Postgres cluster.
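Since kbin is a Symfony app, the Redis session setup presumably follows the standard Symfony pattern, roughly like the sketch below (the env variable names are assumptions, not kbin's actual config):

```yaml
# config/services.yaml - a sketch of the standard Symfony approach
services:
    Redis:
        class: Redis
        calls:
            - connect:
                - '%env(REDIS_HOST)%'       # assumed env var names
                - '%env(int:REDIS_PORT)%'
    Symfony\Component\HttpFoundation\Session\Storage\Handler\RedisSessionHandler:
        arguments:
            - '@Redis'

# config/packages/framework.yaml - point sessions at the Redis handler
framework:
    session:
        handler_id: Symfony\Component\HttpFoundation\Session\Storage\Handler\RedisSessionHandler
```

With sessions kept out of the web containers, any number of identical copies can sit behind a load balancer.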

    ernest,

    Bullseye

    piotrsikora,

    Hi @Badabinski

K8s is one option, but we decided to use a mix of bare metal and Docker Swarm.

Almost everything is prepared to grow horizontally. The only problem (as always) is the database, and we also want flexible software that runs on a big cluster or a small node without changes in the code.

Give us a few days, and after that we will show something ;)

    @haubles @vvuksan @renchap @ernest
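For a rough idea of what a Swarm-based setup looks like, here is a minimal stack sketch (service names, image, and replica counts are hypothetical, not the real kbin configuration):

```yaml
# docker-stack.yml - illustrative sketch only
version: "3.8"
services:
  web:
    image: kbin/web:latest            # hypothetical image name
    environment:
      REDIS_URL: redis://redis:6379   # shared sessions keep web containers stateless
    deploy:
      replicas: 4                     # scale the stateless web tier horizontally
      update_config:
        parallelism: 1                # roll updates one container at a time
  redis:
    image: redis:7
    deploy:
      replicas: 1                     # state lives here and in Postgres, not in web nodes
```

A stack like this would be deployed with `docker stack deploy -c docker-stack.yml kbin` and scaled later with `docker service scale kbin_web=10`.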

    Juliie, (edited )

It would be really helpful to have a website listing which servers are overloaded and which are encouraging new members. And yeah, uptime too.

    haubles,

    @ernest @vvuksan @renchap @piotrsikora Thank you, Ernest 💞 we're so glad we can help and support you, and the project.

    catarina,

    Looking forward to the follow-up posts with technical details, if you do find the time to write them up ofc! As a new kbin user, my thanks for all the hard work and for welcoming us here <3

    thesanewriter,

You're doing great, @ernest. I may be on Lemmy, but kbin.social has become a great community and is dominating my federated feed, which tells me other people agree.

    st3ph3n,

    Your work is appreciated, @ernest !

    chillicampari,
    chillicampari avatar

@ernest this was an extraordinary situation, with several crisis points on a platform still in early development (as far as mass usage is concerned), and you were able to keep a cool head, keep things going, and keep people informed as the situation developed. I am happy to read that you have help now (and it looks like really great expert help) and that you can take a break/retirement from admin duties and enjoy it with the rest of us. Bravo!

    FrostBolt,

    Thank you for the update

It would be nice, when you get a chance, to see some sort of meter or graph showing whether the buymeacoffee donations are falling short of, meeting, or exceeding the costs of running the server and your time/effort spent (which should count too).

    normarcl,

    Thank you for everything you and the contributors are doing!
