wyri,
@wyri@haxim.us avatar

First part of a new long-term home project coming in: a PoE+ switch to power a small cluster built using nodes. Going to blog about every step once it has been completed. But it is going to be a few-quarters-long project, doing it bit by bit

wyri,
@wyri@haxim.us avatar

spaced out with weeks/months in between. Rather excited, but lots of questions so will probably poke with lots of questions 😎.

wyri,
@wyri@haxim.us avatar

Ow noes what am I getting myself into!

wyri,
@wyri@haxim.us avatar

Hah, the initial master node has arrived! Still going to take a few weeks before the PoE+ hats will arrive, plus I still need to arrange storage that doesn't nuke itself the way SDcards do under heavy usage. Will start prototyping and designing an initial enclosure

wyri,
@wyri@haxim.us avatar

Probably the most boring part of the cluster arrived for the first node, SATA to USB for the SSD connector:

wyri,
@wyri@haxim.us avatar

Looking at ' and this looks like a good fit for my cluster 😱. The storage is also worth considering.

wyri,
@wyri@haxim.us avatar

Wohooo the for the first node just arrived! Will build a casing around it and the Pi tomorrow.

wyri,
@wyri@haxim.us avatar

The only thing that's left are the PoE+ hats, which should arrive next week.

wyri,
@wyri@haxim.us avatar

Ow and pro-tip: don't send orders from your webshop with what's in them written on the packaging/envelope

wyri,
@wyri@haxim.us avatar

The holder was very straightforward and even lets me lock it in tighter than the RPI

wyri,
@wyri@haxim.us avatar

Adding the RPI on top wasn't much of a hassle either. But the tension on the cable is something I would rather avoid

This image is part of an import from Twitter involving about 150 tweets. My apologies for this one not having it. Updates to the thread will have a description on each image.

wyri,
@wyri@haxim.us avatar

Preferably I'll have them at a ninety-degree angle from each other. That should also help with keeping them cool. But in order to do that I need Technic, and I don't know if LEGO sells build-your-own sets like they do with classic lego

wyri,
@wyri@haxim.us avatar

Reference shot with the switch. Building a temporary spot on the board for the master node tomorrow

wyri,
@wyri@haxim.us avatar

Which was fine until I realised I also needed to put the storage closer due to cable length. So I moved it a bit closer and also moved the storage a bit more forward:

wyri,
@wyri@haxim.us avatar

It arrived, and isn't this the tiniest network cable you've ever seen?!

wyri,
@wyri@haxim.us avatar

It turned on! Except it didn't like the image on the SD card (probably)

wyri,
@wyri@haxim.us avatar

Well doh!

wyri,
@wyri@haxim.us avatar

For those interested, the set up to see the error (this is Ubuntu 21.04 server, which is either stuck or very slowly booting)

wyri,
@wyri@haxim.us avatar

arm64 image is looking promising!

wyri,
@wyri@haxim.us avatar

🎉🎉🎉🎉!

wyri,
@wyri@haxim.us avatar

Kubernetes home lab using lego thread imported from Bird site

wyri,
@wyri@haxim.us avatar

And barely using power! (Ok ok it's idle and not using the SSD yet.)

wyri,
@wyri@haxim.us avatar

Ow and of course the power usage is graphed:

wyri,
@wyri@haxim.us avatar

Alright! Take over cluster from @ocramius just came in from Germany. This should speed up the project, plus it makes slightly repurposing those nodes in the future easier. (Yes there is a plan beyond just the cluster and its housing.)

wyri,
@wyri@haxim.us avatar

And exactly this is why I want to use Technic for the skeleton. (Moved it from the box it was on to the yellow thing it's on right now. The grey underplate is now fully resting on the surface it's on.)

wyri,
@wyri@haxim.us avatar

Temporary set up, next up is setting up Tinkerbell

wyri,
@wyri@haxim.us avatar

😱😱😱😱

wyri,
@wyri@haxim.us avatar

Ok, been digging a bit deeper: is 64bit hardware required? Given the missing symbol naming:

wyri,
@wyri@haxim.us avatar

Hah success! Had to do the cert steps on a 64 bit machine tho

wyri,
@wyri@haxim.us avatar

Figured out today what is causing this error. TL;DR: I need a custom kernel/bootloader to be able to run workflows. To be continued:

wyri,
@wyri@haxim.us avatar

Decided to skip auto provisioning for now. Not because of but due to the fact that the RPI4 set up I've been following requires a custom kernel and initramfs to be able to run workflows from netboot: https://t.co/VejBOwwTP0

wyri,
@wyri@haxim.us avatar

Still figuring out how to create that kernel and initramfs and that's going to take a while. And I really wanted to have at least SOMETHING running. So this is my MVP home cluster running . Didn't bother with the SSD yet, running purely from SDcards for now.

wyri,
@wyri@haxim.us avatar

Got something else to fix for booting from SSD, but will also fix that later.

For now, the next step is getting GitHub Action runners on it to start building applications for it and have a way to deploy directly to it. There are several solutions for that, should be fun :D

wyri,
@wyri@haxim.us avatar

If anything, I learned that arm and arm64 support in many Helm charts/Docker images out there isn't as good as I hoped for.

This is partially why I'm doing this project, aside from having some use cases in the house

wyri,
@wyri@haxim.us avatar

This literally sums up my day: https://t.co/szbGzcHxiu

wyri,
@wyri@haxim.us avatar

Essentially lots of Docker images only have an amd64 version, maybe an arm64 but rarely an arm(7) image so running anything on the RPI3's in the cluster is unlikely unless I start building images for it.
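
To make that concrete, here is a minimal sketch of how you could check which architectures an image supports, assuming you already have its OCI image index (the multi-arch manifest list) as a parsed dictionary. The function name and the example index are hypothetical and for illustration only; they are not taken from any image used in this thread.

```python
from typing import Any, Dict, Optional


def supports(image_index: Dict[str, Any], architecture: str,
             variant: Optional[str] = None) -> bool:
    """Return True if the image index lists a Linux manifest for the given
    architecture (and variant, when one is requested)."""
    for manifest in image_index.get("manifests", []):
        platform = manifest.get("platform", {})
        if platform.get("os") != "linux":
            continue
        if platform.get("architecture") != architecture:
            continue
        if variant is None or platform.get("variant") == variant:
            return True
    return False


# Hypothetical index for an image that only ships amd64 and arm64 builds.
EXAMPLE_INDEX = {
    "manifests": [
        {"platform": {"os": "linux", "architecture": "amd64"}},
        {"platform": {"os": "linux", "architecture": "arm64", "variant": "v8"}},
    ]
}

if __name__ == "__main__":
    print("arm64:", supports(EXAMPLE_INDEX, "arm64"))
    print("arm/v7:", supports(EXAMPLE_INDEX, "arm", "v7"))
```

An image like the hypothetical one above would schedule fine on 64-bit nodes, but not on a node running a 32-bit arm/v7 OS.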

Now the GitHub Actions Runner Helm chart I'm using also only has amd64 and

wyri,
@wyri@haxim.us avatar

arm64 versions.

So that is going to be fun. The cool thing is setting that up is really really easy, like scary easy. Writing a Helm chart to add all of the runner deployment and autoscaling definitions for that. Also considering putting them directly in a project's

wyri,
@wyri@haxim.us avatar

deployment. But that results in the chicken and egg problem, so either the first deployment to the cluster has to be done manually or I'll have to store them at a central location.

However the first thing on the menu is getting Helm to work and be able to deploy from within the

wyri,
@wyri@haxim.us avatar

cluster using a GitHub Actions Runner.

When that works, I'm locking down all network access and permissions within the cluster and the network as much as possible.

wyri,
@wyri@haxim.us avatar

Securing the cluster is high on the list. Today was RBAC for deployments, tomorrow it's locking down the network: https://t.co/ED4GeOwmRB

wyri,
@wyri@haxim.us avatar

Locking down the network didn't fully go as planned as I kept fully isolating the cluster from accessing anything in the network. This sucks if you want to host HTTP based services for inside your own network 🤣. Today I did something different however

wyri,
@wyri@haxim.us avatar

It currently also has a "swing" mode as I'm still learning all the parts I need for this:

wyri,
@wyri@haxim.us avatar

It's in its temporary position, let's turn it on!

wyri,
@wyri@haxim.us avatar

It's alive! Well only the body, there is no SDcard in it yet as I need to first update the network provisioning for this VLAN. But now I have a node to experiment with that on while the master node does all the being a k8s alone (:P)

wyri,
@wyri@haxim.us avatar

If there is anything I've learned so far, it is that building a solid enclosure for the node and SSD is harder than it looks. Partially because I don't have all the required parts, because I'm ordering as I'm learning. So I have a couple of hundred parts already, but every time

wyri,
@wyri@haxim.us avatar

I order it's mainly parts I didn't have enough of the previous time. So it's a slow process, but even though I'm ordering way more than I need per node, it also means I have a whole bunch of parts I need anyway to build the enclosures for the other nodes.

wyri,
@wyri@haxim.us avatar

And sometimes you get interruptions like this cute little kitten:

wyri,
@wyri@haxim.us avatar

Dark shot of the cluster in progress. This thing will become a light show when done 🤣

wyri,
@wyri@haxim.us avatar

Argh, one step forward, two steps backward. I really love the idea of and everything cool that you can do with it. But I haven't even gotten around to getting the workflows to run and do their job. (It's also not that is to blame here for the record!)

wyri,
@wyri@haxim.us avatar

It's that the RPIs make you jump through all kinds of hoops with PXE and net booting. I'm probably better off building my own image that streams the k3os ISO to the SSD and kexec's into that or something.

wyri,
@wyri@haxim.us avatar

Because all I want is a fresh node when it comes up, no reuse of whatever was previously on that node. It's maybe not what you'd normally do for a "home lab" but I'd like it because there is no litter left behind.

wyri,
@wyri@haxim.us avatar

So my afternoon on this project started pretty well, with booting from SDcard. Next step was booting it from the SSD. Should be easy right?

wyri,
@wyri@haxim.us avatar

That p in sdap1 shouldn't be there when using an SSD over USB, but it has to be there when doing this from an SDcard. The script I'm using has this somewhat hardcoded, and it took me long enough to realise that the "fix" pointed out in this issue: https://t.co/RWujfwXkFF

wyri,
@wyri@haxim.us avatar

Solves it, and makes the whole thing boot and work without a hitch. Next up is making sure I'm using the latest version, as for some reason the script doesn't pick up the latest version as provided. (Or I can just let it upgrade itself until the latest version.)
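
For anyone hitting the same thing, the naming rule behind that stray p can be sketched as follows. This is a minimal illustration of the Linux convention (a "p" separator is used only when the disk name ends in a digit), not the actual fix from the linked issue; the function name is hypothetical.

```python
def partition_path(disk: str, number: int) -> str:
    """Build the device path of a partition on a Linux block device.

    Disks whose name ends in a digit (mmcblk0, nvme0n1) get a "p" between
    the disk name and the partition number; disks like sda do not.
    """
    separator = "p" if disk[-1].isdigit() else ""
    return f"{disk}{separator}{number}"


if __name__ == "__main__":
    print(partition_path("/dev/sda", 1))      # SSD over USB
    print(partition_path("/dev/mmcblk0", 1))  # SDcard
```

Deriving the separator from the disk name avoids hardcoding one of the two forms in the install script.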

wyri,
@wyri@haxim.us avatar

Had to disable a few features, but it's up and running!

wyri,
@wyri@haxim.us avatar

There is nothing running on it yet obviously, but it is up and running:

wyri,
@wyri@haxim.us avatar

That also means there are now two Kubernetes clusters up and running in our house

wyri,
@wyri@haxim.us avatar

One of the things I wanted SSDs for is A) SDcards wear out fast under high I/O, B) speed, and C) https://t.co/0nq5EhdEMv for persistent volumes. (With an S3-based backup/restore for real persistence.)

wyri,
@wyri@haxim.us avatar

One of the things I want to try, now knowing how that script works, is to hardcode sda in it and boot from SDcard when the SSD doesn't have an MBR. Now when booting from SDcard it will install k3os on the SSD, and upon reboot k3os supports scripts and I'm looking into removing the

wyri,
@wyri@haxim.us avatar

MBR after it has booted from the SSD. So that the next time it is powered on again it will reinstall just as if it's a fresh node.
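
A rough sketch of that idea, assuming the decision is made by looking at the first 512-byte sector of the SSD: a classic MBR ends with the boot signature bytes 0x55 0xAA at offsets 510 and 511. The function names and action labels below are hypothetical, and the sketch works on bytes in memory rather than touching a real disk.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"  # last two bytes of a valid MBR sector


def has_mbr(first_sector: bytes) -> bool:
    """Return True if the sector carries the MBR boot signature."""
    return (len(first_sector) >= SECTOR_SIZE
            and first_sector[510:512] == BOOT_SIGNATURE)


def choose_boot_action(ssd_first_sector: bytes) -> str:
    """Decide what the node should do on power-on (hypothetical labels)."""
    if has_mbr(ssd_first_sector):
        return "boot-from-ssd"
    return "install-from-sdcard"


def wiped_sector() -> bytes:
    """What the first sector looks like after the MBR has been removed."""
    return bytes(SECTOR_SIZE)


if __name__ == "__main__":
    installed = bytes(510) + BOOT_SIGNATURE
    print(choose_boot_action(installed))       # SSD has an MBR
    print(choose_boot_action(wiped_sector()))  # MBR was wiped after boot
```

Wiping the first sector once the node has booted is what would make the next power-on fall back to the SDcard and reinstall.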

wyri,
@wyri@haxim.us avatar

This cluster will be a beacon of light in the darkness 🤣

wyri,
@wyri@haxim.us avatar

And combined both nodes into a single new cluster. With nothing on it yet, but will apply in the morning to load some of the basics onto it:

wyri,
@wyri@haxim.us avatar

And yes, a bare / cluster looks really boring :D

wyri,
@wyri@haxim.us avatar

Installed https://t.co/0nq5EgW3UX just now (through terraform through a GitHub Actions self-hosted runner on the cluster (yes it's a bit meta)). And due to the amount of pods (24!!!!), it took the cluster a while to download all OCI images, extract them, and spin the pods up

wyri,
@wyri@haxim.us avatar

using default settings (so 3 replicas for most of the things).

wyri,
@wyri@haxim.us avatar

Alright, so with the latest and firmware the PoE+ fans are kicking in. The downside: they are audible when they ramp up to cool, which happens every 1-20 seconds pretty much. Need to tweak them so they are spinning 10 RPM higher by default, I think

wyri,
@wyri@haxim.us avatar

Ow the yellow/green lines are the fans, and the blue/orange is the CPU temp on the nodes

wyri,
@wyri@haxim.us avatar

3rd node is incoming soon.

wyri,
@wyri@haxim.us avatar

What to name the third node (the theme is infinity stones):

wyri,
@wyri@haxim.us avatar

Since the previous poll resulted in a tie, let's have round two a.k.a. the finals (the theme is still infinity stones):

wyri,
@wyri@haxim.us avatar

The hardware for (what looks like) Reality is in 🎉

wyri,
@wyri@haxim.us avatar

Waiting for a Pick a Brick order with "some" parts for the nodes' housing. One of the major lessons from the last few days was that the USB <-> SATA adapter blinking during the night can affect our sleep. And I prefer a good night's rest, so will attempt to build a less

wyri,
@wyri@haxim.us avatar

light leaking housing for it. And those parts were missing. (Also hoarding for future and current nodes once I've settled on a design.)

wyri,
@wyri@haxim.us avatar

This is what 300 pieces look like. Let the building begin!

wyri,
@wyri@haxim.us avatar

Overview shot with the previous iterations:

wyri,
@wyri@haxim.us avatar

Aside from the need to block the SSD USB <-> SATA adapter's LED, another thing is the sound from the PoE+ hat's cooling fan. It's switching between 64 and 128 RPM a lot to cool the CPU off by a few degrees Celsius. Rather noisy, especially if you can hear it in the bedroom at night.

wyri,
@wyri@haxim.us avatar

So one of the things I took time for today was to tweak when it switches from 64 to 128 RPM. It's currently set to 55 degrees Celsius, meaning it gets about 5 degrees hotter than when it would previously kick in, and instead of every few seconds it now only kicks in a

wyri,
@wyri@haxim.us avatar

handful of times an hour.
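
To illustrate why moving the threshold helps so much, here is a small simulation. It is only a sketch: the 64 and 128 fan levels and the 55 degree threshold come from the posts above, while the 2 degree hysteresis, the 50 degree "before" threshold and the temperature trace are assumptions made up for the example.

```python
from typing import Iterable, List

LOW_SPEED = 64
HIGH_SPEED = 128


def fan_speeds(temps_c: Iterable[float], threshold_c: float,
               hysteresis_c: float = 2.0) -> List[int]:
    """Simulate a two-level fan: go high at the threshold, drop back to low
    once the temperature is hysteresis_c below the threshold."""
    speed = LOW_SPEED
    speeds = []
    for temp in temps_c:
        if temp >= threshold_c:
            speed = HIGH_SPEED
        elif temp <= threshold_c - hysteresis_c:
            speed = LOW_SPEED
        speeds.append(speed)
    return speeds


def count_ramp_ups(speeds: List[int]) -> int:
    """Count how often the fan goes from the low to the high level."""
    return sum(1 for prev, cur in zip(speeds, speeds[1:])
               if prev == LOW_SPEED and cur == HIGH_SPEED)


# Made-up CPU temperatures hovering around 50 degrees Celsius.
TRACE = [48, 50, 47, 51, 48, 50, 47, 52, 49, 51]

if __name__ == "__main__":
    for threshold in (50, 55):
        ramps = count_ramp_ups(fan_speeds(TRACE, threshold))
        print(f"threshold {threshold}: {ramps} ramp-ups")
```

With a CPU idling right around the old threshold the fan keeps toggling, while a threshold a few degrees above the idle temperature leaves it alone.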

On the other side of that coin, I don't want it to get hot at all because it's still held together by LEGO. Which can handle a maximum of 80 degrees Celsius according to: https://t.co/bP2FxZX8hx

wyri,
@wyri@haxim.us avatar

But judging by the mention of polycarbonate, transparent bricks could be interesting as well, at least for the contact points with the Pi. Runs off to the Pick a Brick page

wyri,
@wyri@haxim.us avatar

Ow, another neat detail is that before letting this node join the cluster, I only had to turn it on once to get the MAC address of the board. It booted straight from USB after that 😍

wyri,
@wyri@haxim.us avatar

Ow before I forget, here is the article by that taught me how to tweak the PoE+ fans: https://t.co/IorNUhLt0F

wyri,
@wyri@haxim.us avatar

So far, so good (Also installed LongHorn again since the cluster now has 3 nodes. (Hence the spike in temperature in the middle.)):

wyri,
@wyri@haxim.us avatar

After this wipe of the cluster was done, and all nodes were back up. It took 15 minutes for to reprovision every service running on the cluster: https://t.co/cfnhZvXfAo

wyri,
@wyri@haxim.us avatar

And the fan speed tweak really worked out. It gets a tad hotter, but no more annoying spin ups of the cooling fan all the time anymore:

wyri,
@wyri@haxim.us avatar

Ok, another big milestone reached: deploying a project to both my clusters at the same time. This one builds a VPN between both clusters:

wyri,
@wyri@haxim.us avatar

Ok, this might look like exactly the same thing as the previous tweet. But this time for my current cluster crafted the kubeconfig that was used to do the deployment. Refs: https://t.co/CYBqnPqQFb

wyri,
@wyri@haxim.us avatar

Also terraforming my current cluster will secure it more, plus make a lift and shift, or booting up a clone in another region a lot simpler. (Application specifics excluded.)

wyri,
@wyri@haxim.us avatar

Another few hours of , can now back up to (self hosted on my NAS):

wyri,
@wyri@haxim.us avatar

Did the boring thing today and added as ingress, showing my default backend here using a global ingress:

wyri,
@wyri@haxim.us avatar

Was hoping to use 's Gloo instead of , but no arm64 images being available by default made it, for now, an easy call to go with . (Nothing against FYI, just want to experiment with Gloo more on this cluster.)

wyri,
@wyri@haxim.us avatar

Another important milestone today. Started preparing to move an existing project over to use on the cluster instead of running on my NAS. Went all in and set up a 3 node cluster. Up next is configuring the ingress for AMQP

wyri,
@wyri@haxim.us avatar

Been iterating over that thing, and have been duplicating traffic from the one running on my NAS to see how it works with more than one node: https://t.co/mDtpYwdG6n

wyri,
@wyri@haxim.us avatar

And so far has been invaluable for (insights about) the persistent storage for each pod:

wyri,
@wyri@haxim.us avatar

And one of the cats somehow managed to race through the cluster and somehow take out the master node. (It's still running but all network is down.)

wyri,
@wyri@haxim.us avatar

Found the cause of the master outage. Deathwing somehow managed to disconnect the SATA to USB adapter from the SSD used for storage so the node didn't come back up after powercycling it.

wyri,
@wyri@haxim.us avatar

It's noticeably catching up with the rest of the cluster.

wyri,
@wyri@haxim.us avatar

Here is something interesting I didn't notice yet. The spikes you see are the other two nodes frantically attempting to reconnect with the master node:

wyri,
@wyri@haxim.us avatar

Had a chat with today after I caught him attempting to be a chaoscat again:

wyri,
@wyri@haxim.us avatar

My lego order with special parts for a, hopefully, more stable construction also finally shipped yesterday. So early next week more experiments are starting :D!

wyri,
@wyri@haxim.us avatar

Right, so the lego parts haven't arrived yet. But the next 3 PoE+ hats for the next 3 nodes did!

wyri,
@wyri@haxim.us avatar

Another thing that came in are two USB extension cables. Mainly got those to experiment with less tension on the USB <=> SATA cable

wyri,
@wyri@haxim.us avatar

Yay! The parts for experimenting just came in!

wyri,
@wyri@haxim.us avatar

And the big frames to the right might become a key part in making node blades:

wyri,
@wyri@haxim.us avatar

Even though this first attempt failed, it does make it look promising:

wyri,
@wyri@haxim.us avatar

Made scale model pillars with diagonal beams for sturdiness but still some movement possible (had bigger ones with the green bars but didn't take a photo), for connecting the blades to:

wyri,
@wyri@haxim.us avatar

Had another go at this tonight, and managed to get the RPI and SSD into the cube. Not perfect but getting there 😎

wyri,
@wyri@haxim.us avatar

It was interesting to see the amount of traffic flowing through the cluster: https://t.co/IqhxR1XEwE

wyri,
@wyri@haxim.us avatar

But sadly had to declare it dead after ETCD decided it ran out of storage: https://t.co/LEmDZxPyGw

wyri,
@wyri@haxim.us avatar

So preparing (with k3d) for a triple master cluster, S3 backup and snapshot and automatic recovery before starting it again. Will then do a series of chaos engineering tests to make sure it's resilient against power outages etc etc.
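
Some quick arithmetic on why three masters, assuming the control plane datastore is etcd (or something else using majority quorum): the cluster stays writable only while more than half of the members are up. The helper names are hypothetical.

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be down while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    for members in (1, 2, 3):
        print(f"{members} master(s): quorum {quorum(members)}, "
              f"can lose {tolerated_failures(members)}")
```

Note that going from one to two members adds no fault tolerance at all; the third member is the one that lets the cluster survive a single node going down.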

wyri,
@wyri@haxim.us avatar

The triple master set up will have each master node on a different switch (!!!) (or access point). Mainly because one of the masters will replace my Raspberry Pi 1 that reads information from the smart meter in this house. The other extra master will have a special purpose.

wyri,
@wyri@haxim.us avatar

The meterkast (utility closet) node casing and storage just came in. Never realised M.2 SSD's are so, tiny 😱

wyri,
@wyri@haxim.us avatar

Yay! The fine folks at LEGO seem to have caught up with the custom bricks orders after the holidays zerg 🎉. So I can put together the node enclosures in a few days and start the cluster back up.

wyri,
@wyri@haxim.us avatar

Still going for the 3 master setup in the long term, but since I can already start it up with a single master, I'm going to focus on building it and on power outage and recovery testing. Hopefully including having a fresh node every time it starts, including the master 🤐

wyri,
@wyri@haxim.us avatar

This is a first for me! LEGO put the custom bricks order in a box 😱. Building the node kube's tonight 🎉. Probably powering the cluster back up next week, and hopefully the extra space between components should make it require less active cooling

wyri,
@wyri@haxim.us avatar

Hah, progress! But it seems I'm missing parts, connector pegs to be specific, so I can't do anything at the moment. Good thing I ordered 200 of those the other day. Might need a few more parts, so another order will go out soon 🤣. But really loving the progress here 😍

wyri,
@wyri@haxim.us avatar

Started prototyping a node with a screen. Next steps are starting X and a browser in kiosk mode

wyri,
@wyri@haxim.us avatar

Another big milestone. A new RPI 4 8GB came in, so I can start assembling the utility closet master node:

wyri,
@wyri@haxim.us avatar

It's been a decade or two since I had to do configuration through jumpers

wyri,
@wyri@haxim.us avatar

There is only one issue. Doing rapid rebuilding of the cluster to test things out is going to be a lot harder because this one won't be as easy to access as the others currently are...

wyri,
@wyri@haxim.us avatar

Got it up and running at least https://t.co/0s4snsqtBh and while I love the casing I still think defaulting to using the USB3 instead of the USB2 port will leave plenty of users with I/O issues.

wyri,
@wyri@haxim.us avatar

All the nodes in my cluster run on SSD's on the v2 port because the v3 port has stability issues. Couldn't even write the image to the M.2 using the v3 port. Hence the extension cables to hook it to the v2 port. So it has a tail:

wyri,
@wyri@haxim.us avatar

But at least I now have two leader nodes, and I'm getting errors like this during some calls:

wyri,
@wyri@haxim.us avatar

Whoop whoop, the parts I ordered 6 weeks ago just came in! Time to finish the current enclosures and start with building the tower to put them in

wyri,
@wyri@haxim.us avatar

The end result for tonight. It is not perfect, and there should be another set of parts arriving in two weeks, but we're getting there. Going to have to order more baseplates tho, those things are awesome and will make the whole project stable af.

wyri,
@wyri@haxim.us avatar

the enclosure tower. Because maintenance with everything fixed in place will be messy.

wyri,
@wyri@haxim.us avatar

Alright, the new leader node was put into position earlier today:

wyri,
@wyri@haxim.us avatar

Time to start terraforming this thing!

wyri,
@wyri@haxim.us avatar

TF apply 1 done:

wyri,
@wyri@haxim.us avatar

And that is TF apply 2 done:

wyri,
@wyri@haxim.us avatar

And that is the 3rd TF apply done. The (almost) full cluster back up in 3 commands and about 30 minutes of waiting for everything to be installed and back to running:

wyri,
@wyri@haxim.us avatar

Next up is rook as a potential replacement for longhorn, to use NFS/iSCSI for storage instead. Longhorn is really cool, but with my current number of cluster reinstalls I'm looking for something easier to recover data with

wyri,
@wyri@haxim.us avatar

Today I started with all 4 nodes running

wyri,
@wyri@haxim.us avatar

The day ends with only two running because I don't need more, so why would I

blieb,
@blieb@blieb.net avatar

@wyri when will the first rpi5 node join the cluster? 😆

wyri,
@wyri@haxim.us avatar

@blieb Depends on when I put SFP into the switch 😅

wyri,
@wyri@haxim.us avatar

Realised today that I could just stack all 3 into one tower above each other. Like the intended result:

wyri,
@wyri@haxim.us avatar

Another win for the cluster today is the set up of NFS as PVC storage backend. The SSDs are useful and Longhorn did what it is intended to do. But NFS survives cluster reinstalls. And trying to KISS it here: https://t.co/AwqmBDit7w

wyri,
@wyri@haxim.us avatar

Installed 's Heimdall on the cluster today. A bit sad the app isn't an enhanced app, but I'm pretty sure I can muster up writing some to turn it into an enchanted app 🤔

wyri,
@wyri@haxim.us avatar

Had lots of fun this weekend building a platform above the switch. Starting out with these half perfect towers (the top still needs diagonal support before adding more nodes):

wyri,
@wyri@haxim.us avatar

The switch is placed in the middle, on top of the middle blue stack for additional support:

wyri,
@wyri@haxim.us avatar

It looks big, and it is big but this finally means I can start working on a long term cabling plan and have plenty of space underneath the switch to do so and for cooling:

wyri,
@wyri@haxim.us avatar

During the placement (without anything going down FYI), I found a slight miscalculation:

wyri,
@wyri@haxim.us avatar

After fixing that it has been standing fine like this for a day now. Just need to make the space between the top of the switch and the node platform smaller:

wyri,
@wyri@haxim.us avatar

Really happy with the results of this weekend. Swapping all of this while all the nodes were up and running and my wife was playing online games that went through this switch made it "fun", as at some point I had the live running nodes on my lap while trying to get the switch in

wyri,
@wyri@haxim.us avatar

That thing looks like an abstract painting at night 😍

wyri,
@wyri@haxim.us avatar

Had some fun yesterday prototyping an easy way to support maintenance/swap nodes out without having to take the rack apart:

This image is part of an import from Twitter involving about 230 tweets. My apologies for this one not having it. Updates to the thread will have a description on each image.

wyri,
@wyri@haxim.us avatar

Think I found a good way to try and keep the cats off the platform, or at least detect when they get up it:

wyri,
@wyri@haxim.us avatar

Worked on a PoC to make adding and removing node enclosures easier by not directly making them part of the structure. But instead by putting them on a cart you can take out:

wyri,
@wyri@haxim.us avatar

First "big" success of the cluster: https://twitter.com/WyriHaximus/status/1534893994731352066 (sorry no Toot)

wyri,
@wyri@haxim.us avatar

Also, it looks like I missed the first birthday of the project (and thread): https://toot-toot.wyrihaxim.us/@wyri/109858081844775179

wyri,
@wyri@haxim.us avatar

Made them in other colours as well, only purple is left to be built:

wyri,
@wyri@haxim.us avatar

Came upstairs this morning and found out a certain cat has been on the cluster during the night and made a mess. Now the cool part is that this shows the latest iteration of node enclosure kept the SSD in place for those two nodes:

wyri,
@wyri@haxim.us avatar

Been doing maintenance on the cables in the home office today, and as such 3/4 of the cluster was down for a few hours today:

wyri,
@wyri@haxim.us avatar

It came back up with the switch in its new raised position after I took the old raise down for redesigning. And you might notice that the cabinet it was on is no longer standing, but is now lying on its side, providing double the space and a lot more height to work with:

wyri,
@wyri@haxim.us avatar

Now I can put all the nodes directly on the cabinet instead of stacking them on unstable towers. There will be a better more epic to it all once I've figured it all out. But for today I'm happy with the progress:

wyri,
@wyri@haxim.us avatar

New bricks came up so some new updates https://twitter.com/WyriHaximus/status/1555962791785385985

wyri,
@wyri@haxim.us avatar

Also experimented with a new rear closure for cable management, but folding it behind put a lot of strain on the cable. Might need to do a few variations:

wyri,
@wyri@haxim.us avatar

Node colour wise the cluster is shaping up

wyri,
@wyri@haxim.us avatar

Another important part of this project is cable management. So I started creating a hole through which different groups of cables can go into the structure. This was the first attempt

wyri,
@wyri@haxim.us avatar

Nice little milestone, serving HTTPS from the cluster internally on the home network only. Without port forwarding 😎:

wyri,
@wyri@haxim.us avatar

Part of getting this to work was to put all my DNS records for that domain in 's nameserver service and let do a DNS01 challenge.

So I made the call to put all my DNS records in making managing it easier and more atomic. This makes migrating

wyri,
@wyri@haxim.us avatar

away in the future also easier. And I can have a set of defaults on all my domains.
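
For the curious, this is roughly what a DNS01 challenge boils down to on the DNS side, sketched from the ACME specification (RFC 8555): the ACME client publishes a TXT record under _acme-challenge.<domain> whose value is the base64url-encoded SHA-256 digest of the key authorization. The domain, token and thumbprint below are made-up placeholders, and in practice the ACME client does all of this for you.

```python
import base64
import hashlib
from typing import Tuple


def base64url(data: bytes) -> str:
    """URL-safe base64 without trailing padding, as used by ACME."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_record(domain: str, token: str,
                 account_key_thumbprint: str) -> Tuple[str, str]:
    """Return the (record name, TXT value) pair for a DNS01 challenge."""
    key_authorization = f"{token}.{account_key_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return f"_acme-challenge.{domain}", base64url(digest)


if __name__ == "__main__":
    # Placeholder values, not real challenge data.
    name, value = dns01_record("home.example.test", "some-token",
                               "some-thumbprint")
    print(name, "TXT", value)
```

Because validation happens entirely through that TXT record, the CA never has to reach the cluster, which is why no port forwarding is needed.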

wyri,
@wyri@haxim.us avatar

Minor detail, but colour-coded cables for the nodes:

wyri,
@wyri@haxim.us avatar

Initial prototype to keep an RJ45 patch block in place with just plain and simple standard 😎.

wyri,
@wyri@haxim.us avatar

This will make sure I can keep the cool external aesthetics of the coloured cables from the switch, without having to pull that throughout the MOC to the nodes (it just makes maintenance also a lot easier): https://toot-toot.wyrihaxim.us/@wyri/111193914559469567

wyri,
@wyri@haxim.us avatar

Also lowered the switch, these legs are both more solid and require 100 or 200 connector pegs less to keep it up. The previous design had a flaw where the top was turned/twisted a few degrees out of alignment with the bottom due to the tensions in the design.


wyri,
@wyri@haxim.us avatar

Also with this new design I can put the fan you see to the right inside to cool the switch off extra when needed. (It should only be needed during heatwaves; plus temperature is monitored through several systems.)

wyri,
@wyri@haxim.us avatar

s/right/left

wyri,
@wyri@haxim.us avatar

New parts came in while I was away for work on the other side of the pond. All for the base plate, and cable management at the front:


wyri,
@wyri@haxim.us avatar

The front cable management looks like this and will divide different groups of cables into different sections:

wyri,
@wyri@haxim.us avatar

The baseplates, all 18, will replace the current 48x48 grey baseplate and make it much sturdier and expandable in the future:

wyri,
@wyri@haxim.us avatar

It's a bit of work putting them in rows of 3:

wyri,
@wyri@haxim.us avatar

Putting that together gives a nice solid base plate, covering the entire designated space:

wyri,
@wyri@haxim.us avatar

While doing this I learned that 3x6 "BRICK 4/3, 16X16 W/ 4.85 HOLE" (element ID 6302092) matches the size of a KALLAX cabinet (https://t.co/erKDPgpjLT) perfectly:
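The footprint of that 3x6 grid is easy to work out. A small sketch, assuming the standard 8 mm stud pitch and that each of these bricks is 16 by 16 studs; the result can be held against the cabinet dimensions on the product page:

```python
STUD_PITCH_MM = 8  # standard LEGO stud-to-stud spacing


def footprint_mm(plates_x: int, plates_y: int, studs_per_plate: int = 16) -> tuple:
    """Width and depth in millimetres of a grid of square plates."""
    side_mm = studs_per_plate * STUD_PITCH_MM
    return plates_x * side_mm, plates_y * side_mm


print(footprint_mm(3, 6))  # (384, 768)
```

So the 18 plates cover roughly 38.4 cm by 76.8 cm.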

wyri,
@wyri@haxim.us avatar

It took me a bit of timing, and cable patching, to get all cables through the cable management holes. The cluster never went down as it was shifting workloads between nodes as the cables were patched:

wyri,
@wyri@haxim.us avatar

Table of contents for the blog series of this project incoming in a week or two: https://twitter.com/WyriHaximus/status/1580295782783016960

wyri,
@wyri@haxim.us avatar

Ok the ToC is up a bit earlier: https://blog.wyrihaximus.net/2022/10/building-a-kubernetes-homelab-with-raspberry-pies-and-lego-table-of-contents/

Partly because it was quick and easy to write. Onto a post with more insights

wyri,
@wyri@haxim.us avatar

The parts for the RJ45 plug holders came in today. Here is one of them I built to make sure the parts and idea fully worked. But since we put the parts in a personal advent calendar it's going to take until Christmas before all of them are assembled and in their spot.


wyri,
@wyri@haxim.us avatar

3/6 RJ45 holders picked up from the advent calendar. With 3 to go and 4 days left it should be straightforward getting the last 3 as well, before I can start hooking them up at the rear of this MOC

wyri,
@wyri@haxim.us avatar

Meet , the new node that will trial https://www.talos.dev/

wyri,
@wyri@haxim.us avatar

and the to adapter are coming later this week and early next week. If all goes well with https://www.talos.dev/ it will replace k3os. This is also the first time in 9 months I got my hands on a 4 8GB, thanks to

wyri,
@wyri@haxim.us avatar

Put the blocks in place, it's going to make managing cable a lot easier now. In the end, each node will have matching cables because both sides of the row are on the other side cables 🤣.


wyri,
@wyri@haxim.us avatar

Cables are still somewhat messy but this is a good step in the right direction. Already have some ideas for the next steps, including a nice exit hole for the power cable


wyri,
@wyri@haxim.us avatar

Had another experiment yesterday to be able to slide the node enclosure in and out of the MOC. The bottom is pretty solid. But the higher I got the worse it got. Sliding in and out is great, needs some tweaks but it will work. The network and


wyri,
@wyri@haxim.us avatar

cables I wanted to plug in behind it. But that would mean getting deeper and under the roof (the "roof" shown in the photos for sure won't make it into the final version). So considering making plugs in the side to plug the node into MOC.

wyri,
@wyri@haxim.us avatar

Installed ' on (upper right) as a possible replacement for /.

wyri,
@wyri@haxim.us avatar

So far so good! Going to do some firewall tests to lock down the cluster vlan to only what is needed over the weekend.

wyri,
@wyri@haxim.us avatar

One of the things that is always important with computers is cooling them. So as such, I have a fan from to cool the nodes. And right now it is always on, but I already connected a plug to be able to only turn it on when the nodes get hot enough that they need

wyri,
@wyri@haxim.us avatar

some help cooling down. The automation isn't in place yet, but I already started preparing for it :).

wyri,
@wyri@haxim.us avatar

If the experiments with https://www.talos.dev/ are positive, it will replace k3os. One of the big plusses is that this is #k8s and not #k3s, so I should be able to use the cluster autoscaler for autoscaling. Which needs some code to work with #unifi switches.

wyri,
@wyri@haxim.us avatar

This thread is now fully imported and can be found:

Regular updates will now resume 🎉

wyri,
@wyri@haxim.us avatar

One of the biggest things is that since half a year ago the cluster is now fully on https://www.talos.dev/ and managed through . It's been a huge success and a quick and easy move. (Well it took me a day tho.) It is also a lot easier to rebuild nodes:

  • Write raw image to disk
  • Apply Terraform

wyri,
@wyri@haxim.us avatar

The second biggest thing is that, since a couple of days ago, the cluster has not 1 but 3 control plane nodes. And for some reason hours before I started working on adding the other two the only one running then crapped out: https://toot-toot.wyrihaxim.us/@wyri/111165411454560918

wyri,
@wyri@haxim.us avatar

Meet Deathwing, one of our , he has a medical history and sometimes likes to retreat. So he started claiming the spot next to the switch tucked away in that corner. So we decided to turn it into a catbed for him. Including the roof, which needs to be added again in the current iteration but the places where needed to extend the cluster surface area.

Deathwing in his first generation cat bed on a green blanket. With the nodes stacked to the right of him outside the wall and roof of his corner.
Second generation which was wider but without the roof. He is leaning on the wall pushed against the nodes with his tail hanging off the now fluffier blanket.
Third, and current generation with only one wall and a much bigger sleeping area. Nodes have moved to make space for that. Deathwing happily sleeping (and snoring).

wyri,
@wyri@haxim.us avatar

Here is a reason why I love ; just made replicas run in different zones. Downstairs where 2 of the control plane nodes are is a zone, my desk for the other control plane node is a zone, and the rest of the cluster behind me is a zone. So I now forced a broker in each zone, but also told it not to get on the same node as .

This will get me some more usage out of those nodes. Because honestly 3 nodes as control plane for a home cluster? That is overkill.
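For anyone wanting to do something similar, zone spreading plus "not on the same node as X" maps onto two standard Kubernetes pod spec fields. A minimal sketch that builds that fragment as a plain dict; the labels `broker` and `other-workload` are placeholders, since the thread does not name the workloads, and this is not the actual manifest used on the cluster:

```python
import json


def spread_and_separate(app_label: str, avoid_label: str) -> dict:
    """Pod spec fragment: one replica per zone, never next to another app."""
    return {
        # Keep replicas of this app evenly spread across zones.
        "topologySpreadConstraints": [
            {
                "maxSkew": 1,
                "topologyKey": "topology.kubernetes.io/zone",
                "whenUnsatisfiable": "DoNotSchedule",
                "labelSelector": {"matchLabels": {"app": app_label}},
            }
        ],
        # Refuse to schedule onto a node that already runs the other app.
        "affinity": {
            "podAntiAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": [
                    {
                        "labelSelector": {"matchLabels": {"app": avoid_label}},
                        "topologyKey": "kubernetes.io/hostname",
                    }
                ]
            }
        },
    }


# Hypothetical labels, purely for illustration.
print(json.dumps(spread_and_separate("broker", "other-workload"), indent=2))
```

The nodes need a `topology.kubernetes.io/zone` label for the spread constraint to have anything to work with.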

wyri,
@wyri@haxim.us avatar

Today the long enough cables for the SFP modules came in (the other cables were 4cm shorter than advertised and about 3cm too short). So hooked it up. This now means there is a patch location to the side of the MOC. So now I have the last two ports I need for the last two nodes:

Side patch plugs with cables from the downstairs switch in them
The patch module on my desk with the two short cables sticking out at the rear.
Showing off the glow in the dark effect of the arches facing stairs.

wyri,
@wyri@haxim.us avatar

While yanking on some cables I took out the support tower for the cat bed on top. So no cat bed there for Deathwing tonight. Will do the cables better tomorrow and rebuild it. Might have to take part of the foundation loose, going to be fun.

wyri,
@wyri@haxim.us avatar

Now that all colour matching cables are hooked up, it looks absolutely fabulous! Still needs a lot of work, but it is getting there.

6 lego node enclosures in the colours of the rainbow

wyri,
@wyri@haxim.us avatar

Next up: getting each node its own fan instead of using those that @erikaheidi showcased a year ago on nodes to cool them down. They need some cleaning once in a while tho.

wyri,
@wyri@haxim.us avatar

Today was to rebuild that supporting pillar and restore Deathwing's cat bed in time for tomorrow.

Started with adding a 16x16 in the middle of the pillar for support, connecting it to the plates it supports.

Baroness Draka came by to inspect it and then put it back up after rearranging some cables that pulled it over in the first place.

All 5 pillars put back together and checked for matching length.
Baroness Draka voicing her opinions about my changes with the supporting 16x16 plate in the middle.
Pillar put back in place and this time connected to the baseplate it supports.

wyri,
@wyri@haxim.us avatar

Put air flow directors in place as well for some extra cooling for the switch. It's currently running at a fine 57 degrees (inside; the outside is always touchable but can feel hot), but can go up to 70, especially when it is hot up here.

Monitoring everything, currently building something that queries and tosses it over to @homeassistant so it can turn on the cooling fan and, in the worst-case scenario, cut the power.

A look at the cables under the switch, and it's a wee bit of a twisted rainbow "stream" of data.
The airflow directors, it's not a lot, especially the yellow one but it should help put some cold air against the switch.
A look at the air flow directors with the switch back in position.
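The fan-or-power-cut idea from this post boils down to two thresholds. A minimal sketch of that decision, assuming made-up threshold values of 65 and 80 degrees (the thread only mentions 57 as normal and 70 as a hot-day peak); the real thing would live in Home Assistant, not in a Python script:

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    FAN_ON = "fan_on"
    CUT_POWER = "cut_power"


def decide(temp_c: float, fan_on_at: float = 65.0, cut_power_at: float = 80.0) -> Action:
    """Pick a cooling action for one switch temperature reading."""
    # Check the most severe condition first so it always wins.
    if temp_c >= cut_power_at:
        return Action.CUT_POWER
    if temp_c >= fan_on_at:
        return Action.FAN_ON
    return Action.NONE


for reading in (57.0, 70.0, 85.0):
    print(reading, decide(reading).value)
```

In practice a bit of hysteresis (turning the fan off a few degrees below where it turns on) would stop it from flapping around the threshold.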

wyri,
@wyri@haxim.us avatar

The first attempt to put a roof on: failed.

The roof was technically on, and while it's not intended for the to walk on, they will. Any light force would bend the walls. So going to reinforce those first and come up with a solid skeleton and then give it another go.


wyri,
@wyri@haxim.us avatar

Finished building most of the blade in stud.io, only thing left is the front/rear node suspension (the side with all the USB and Network sockets) and be happy with a final design.

Got several designs in the cluster, from sloppy to tight but not happy with any of them honestly.


wyri,
@wyri@haxim.us avatar

Dust sucks, but have to keep it in mind. This is the controlplane node on my desk, didn't clean it on purpose for a few months to see how bad it would be.

Took less than a minute to clean, and the dust barely affected the cooling. But it was still a lot more than expected.


wyri,
@wyri@haxim.us avatar

Yesterday was a fun day. New cooling fans came in to replace the desk fans I've been using so far.

Set them up so they pull the air through the node instead of pushing it, to force less dust onto the nodes.

Same boxes but lined up and folded open showing the fans in them.
A green lego node enclosure with node and the fan installed from the front
A blue node enclosure with the same parts installed showing the cable work at the rear

wyri,
@wyri@haxim.us avatar

Having these on each node dropped the temperature per node by about 10 to 15 degrees. The reason there is a slight increase on the graph is because my wife and I started playing WoW. So two gaming PCs and two humans affected the temperature in the room by that much.

The temperature graph shows it hovering between 45 and 50 degrees Celsius, at about 47 degrees Celsius. Right after the fan gets installed it drops to about 35 degrees Celsius and then hovers between 30 and 35 degrees Celsius.

wyri,
@wyri@haxim.us avatar

Deathwing helped me out yesterday by inspecting them: https://toot-toot.wyrihaxim.us/@wyri/111512846029202330

wyri,
@wyri@haxim.us avatar

Started building the full cluster in its current state in stud.io. Almost 1100 parts for just the bare basics. Adding the cable management and patch racks next. This thing is going to be massive.

The cluster in real life

wyri,
@wyri@haxim.us avatar

Parts for 3 projects came in today:

  • yellow: new node (mind)
  • red: screen on my desk showing important information (to be determined)
  • blue: cable management (more on that later)
wyri,
@wyri@haxim.us avatar

Those cooling fans do their work pretty well. Recorded the first below 30 degrees today 😱. To be fair, it has been freezing since yesterday 😅

wyri,
@wyri@haxim.us avatar

Learned a few things about how runs as a cluster on the cluster and how it affects performance: https://toot-toot.wyrihaxim.us/@wyri/111766029001237630

wyri,
@wyri@haxim.us avatar

Did some patch panel maintenance aligning it with improvements in stud.io I made yesterday. Also started working on and closing the cable gutter. Not per se the most boring part. The gutter behind the nodes will require patch spots for each node for and :


wyri,
@wyri@haxim.us avatar

Thanks to my amazing wife, one of the cluster control plane nodes now has a skeleton wizard guarding it

wyri,
@wyri@haxim.us avatar

Achievement unlocked: Write a #kubernetes node autoscaler in an afternoon that scales up and down with demand. Completely hacked together in #php to my particular requirements. (Absolutely not built for the long term.) So now I have two #k8s nodes behind me that are off 💙.

Autoscaler logs scaling up one node by turning the power of that node on and waiting for it to be ready
A quick shot overview of my cluster with two of the nodes off.

wyri,
@wyri@haxim.us avatar

One of them just turned on, WTF? Ow, there was a long pending pod that had nowhere to go due to anti affinity 😅.
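That surprise power-on is exactly what a scale-up rule is supposed to do. A rough sketch of such a rule, not the actual PHP autoscaler: the `Pod` type, the 120 second grace period and the node names are all invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pod:
    name: str
    phase: str            # e.g. "Pending" or "Running"
    pending_seconds: int  # how long the pod has been waiting for a node


def nodes_to_power_on(pods: List[Pod], powered_off_nodes: List[str],
                      grace_seconds: int = 120) -> List[str]:
    """Power on one spare node if any pod has been stuck pending too long."""
    stuck = [p for p in pods
             if p.phase == "Pending" and p.pending_seconds >= grace_seconds]
    if stuck and powered_off_nodes:
        # One node at a time; the next pass can add another if still needed.
        return [powered_off_nodes[0]]
    return []


pods = [Pod("web", "Running", 0), Pod("broker-2", "Pending", 600)]
print(nodes_to_power_on(pods, ["node-5", "node-6"]))  # ['node-5']
```

A pod blocked by anti-affinity looks the same as a pod blocked by lack of capacity from this angle, so it triggers a scale-up either way.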

wyri,
@wyri@haxim.us avatar

With some tweaking to the autoscaling, it's pretty clear it only sometimes scales up when needed. Got a big batch of World of #Warcraft datamining scheduled for tomorrow, so will see how that impacts node usage. Overall I'm pleased with the makeshift node auto scaling. #kubernetes #k8s

wyri,
@wyri@haxim.us avatar

Full list of cluster nodes + NAS for storage. Take a guess which are the controlplane:

wyri,
@wyri@haxim.us avatar

Did another node cleaning, first time with the fans on all of them. They work great cooling them, but also at getting dust on them. Judging by where the dust settles it was a good call to pull the air through instead of blowing it into the nodes. But nothing a (paint)brush can't remove. All nodes were as clean as new when done.


wyri,
@wyri@haxim.us avatar

To help combat the dust, and as planned, I've started working on an enclosure for the enclosures. Just a first stab at it, which turned out to have some stability issues to address.

wyri,
@wyri@haxim.us avatar

Aside from roofing up, cable management is the next big topic. Need to take care of those cables so created a few more patch boxes for each node so it can just patch directly into the cable gutter:


wyri,
@wyri@haxim.us avatar

One of the reasons I haven't published a blog post yet on the nodes, even though they are pretty much done, is that I hadn't finished the design yet. Well, earlier today I finished the design. Reinforced it in some locations, and included options to stack them.

The stacking will have to be designed. But finalizing the design is already a big milestone. Designs will be available (for sale) on Bricklink and/or ReBrickable at some point.

wyri,
@wyri@haxim.us avatar

Been updating the node enclosures to this design and still very happy with it. Will design the stacking build soon and then do a blog post on the whole thing. Stacking is going to be fun with the whole big cooling fan in front. Which I'll make optional in the design so people can opt out of it. One major thing I want to do before blogging about it and putting the design up on Bricklink/ReBrickable is to get some photos with a camera to support the reasoning for this design.

wyri,
@wyri@haxim.us avatar

One of the things I want to have once all nodes are in a "rack", so to say, is some lights to indicate which node is on. Got a Sonoff mini wifi for each node so I can turn off the lights when no one is here. For the same reason I have big cooling fans: our sleep at night. Light pollution is an issue with this many LEDs around. So with the Sonoff mini wifi, some Node-RED control through Home Assistant, and data from the Everything Presence One, it will only turn on those lights

wyri,
@wyri@haxim.us avatar

when there is someone near it. And when my gaming PC is on. Which effectively means it lags about a minute.

wyri,
@wyri@haxim.us avatar

Started working on the cable gutter for all the #networking and #light (#USB) cables. Started with the patch panel on the left (home office) side.

Outside view of cables coming out heading towards the home office switch.

wyri,
@wyri@haxim.us avatar

After that, I did the node patch boxes and hooked each node into it. Tomorrow I'm putting a wall around the gutter to keep everything in one place. And after that figure out how to create a patch box for the USB lighting cables.


wyri,
@wyri@haxim.us avatar

This is my current power supply situation for the cluster worker nodes, , and the switch supplemental cooling fan. Not ideal, but perfectly fine if it wasn't in the way of the MOC I've been building around the cluster. So it has to move:

wyri,
@wyri@haxim.us avatar

The initial thought was to use this one and put it under the MOC. Except it has 3 ports ... which is problematic because of the bulkiness of one adapter:

wyri,
@wyri@haxim.us avatar

So instead I went with this tower; we have the same one downstairs powering the living room display. Plenty of space and nothing will block each other. Plus it comes with 4 powers we will utilize for (more) lego display lights and planned sensors on the stairs to the home office. Just need to pick a time and date, as doing this will take that entire switch down and create a network split between two and one control plane nodes.

Power tower on the floor under the MOC, powered and ready to be switched to

wyri,
@wyri@haxim.us avatar

The fun part is that over the past week I've already occasionally, and accidentally, run with 3 nodes while I need 4. Tweaked my home-brew cluster autoscaler a bit too aggressively and it took out a node when it shouldn't have.

wyri,
@wyri@haxim.us avatar

Thing is that I used to have a fixed always-on + node. But I changed it so that all of them can be turned on and off. And because it will always turn the longest-running one off, it's now cycling through all the nodes over time and each one of them gets some downtime.
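The "turn the longest running one off" rule, plus a floor so it never drops below what is needed, can be sketched in a few lines. This is an illustration only, not the actual autoscaler code; the node names, timestamps and the `min_running` parameter are made up.

```python
from datetime import datetime
from typing import Dict, Optional


def node_to_power_off(started_at: Dict[str, datetime], min_running: int) -> Optional[str]:
    """Pick the node with the oldest start time, unless that breaks the floor."""
    # Guard against the too-aggressive case: never go below the minimum.
    if len(started_at) <= min_running:
        return None
    # The earliest start time belongs to the longest-running node.
    return min(started_at, key=lambda name: started_at[name])


running = {
    "node-a": datetime(2024, 3, 1, 8, 0),
    "node-b": datetime(2024, 3, 3, 8, 0),
    "node-c": datetime(2024, 3, 2, 8, 0),
    "node-d": datetime(2024, 3, 4, 8, 0),
    "node-e": datetime(2024, 3, 5, 8, 0),
}
print(node_to_power_off(running, min_running=4))  # node-a
print(node_to_power_off(running, min_running=5))  # None
```

Since a node that comes back on gets a fresh start time, the rule rotates through all nodes on its own.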
