Homeprod Upgrades: Part 1 - Renumbering the server subnet

Today, I started work a couple of hours early to catch an early planning meeting, which meant I would finish early as well. Since I had a couple of hours to kill (and potentially kill our internet connection) until Duncan came home, and Sabriena was reading a book and not using the internet, I decided what the hell, let’s kick this show off early.

In preparation for the weekend, on Thursday morning before work I’d already set the DHCP lease time to 300 seconds (5 minutes, if my maths is correct), so there was no reason not to start now. While I was at it, after reading a thread on a forum someplace, I turned off DHCP guarding as well.
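
For context on why the short lease matters: with the usual DHCP timers, clients attempt to renew at half the lease time, so a 300-second lease means every client should pick up network changes within a few minutes. A rough sketch of the arithmetic (assuming RFC 2131 default timers, which Unifi’s DHCP server may or may not follow exactly):

```python
# DHCP renewal timing, assuming RFC 2131 defaults: clients attempt
# renewal at T1 = 0.5 * lease, and fall back to a full re-DISCOVER
# by the time the lease expires if renewal keeps failing.
lease = 300       # the 5-minute lease set on Thursday morning
t1 = 0.5 * lease  # first renewal attempt, ~150 seconds in
print(f"clients start renewing after ~{t1:.0f}s; full turnover within {lease}s")
```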

Starting out by preparing for the worst-case scenario - a reset of everything - I wrote down all the port configs for each switch. I also noted down all the static IPs for each server, since I figured they’d be reset if I changed the subnet of the network they were on. Finally, to ensure there were no issues with SQLite taking a shit because something evaporated underneath it, I shut down a bunch of services on the Kubernetes cluster by just deleting all the deployments.
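
If you wanted to script that step, a minimal sketch with the official Kubernetes Python client might look like the following - the “services” namespace here is a placeholder, and it assumes plain Deployments with nothing like an operator around to recreate them:

```python
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

# "services" is a placeholder namespace: delete every Deployment in it
# so nothing is writing to SQLite while the network moves underneath it.
for dep in apps.list_namespaced_deployment("services").items:
    apps.delete_namespaced_deployment(dep.metadata.name, "services")
    print(f"deleted {dep.metadata.name}")
```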

I then unplugged the switch cable that went to the server rack, so the servers could talk amongst themselves while I broke everything, and then I hit the button to save the changes with the new subnet on the default VLAN and… nothing. It wouldn’t save; it gave me an error message:

Failed saving network “Default”. {modelType, select, profile {Profile} network {Network} portIpGroup {Port and IP Group} other {}} includes {type, select, User {a Client’s } FirewallRule {a Firewall Rule } other { }}"{name}" configuration. Please remove this first before deleting the {modelType, select, profile {Profile} network {Network} portIpGroup {Port and IP Group} other {}}.

This took a while to figure out, as I’d already removed all the firewall rules that referred to the network (and no, the above is not me filling anything in - that’s verbatim what the error message said)… It turned out I also had to remove the two static routes that pointed to one of the servers in that network, and then I was able to save it.

I then threw up the new network on a VLAN, and added the original subnet back to it, which meant that Unifi would still be responding on the old “inform” host as well, and well… everything just worked. So I set about putting the firewall rules back, re-added the static routes for the old LXD server and the new (currently unused) Incus server, plugged the rack switch back in, and after a few minutes I re-applied all the deployments on the cluster and everything “just worked”.
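
Re-applying the deployments is just as easy to sketch with the same Python client, using its YAML helper (the manifests/ directory is a placeholder - this assumes the deployments all live in plain YAML files):

```python
from pathlib import Path
from kubernetes import client, config, utils

config.load_kube_config()
api = client.ApiClient()

# "manifests/" is a placeholder path: re-create everything that was
# deleted before the renumbering, one YAML file at a time.
for manifest in sorted(Path("manifests").glob("*.yaml")):
    utils.create_from_yaml(api, str(manifest))
    print(f"applied {manifest.name}")
```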

Well, almost everything: I managed to leave out (in trying to simplify things) a rule which allowed the server cluster to speak to the IoT subnet, so the lights etc. did not connect… but once I put that back, everything came back as well (I didn’t even have to restart Home Assistant).

I’m well impressed - it almost went too smoothly. Hopefully that’s an indication (rather than the calm before the storm) of how this weekend’s maintenance will go!
