Kubernetes: Networking with the UDM
While dicking around with things earlier today, I noticed that a couple of services were reporting an unusual source IP address for various requests. I was lazy and didn't set up the X-forwarded-for stuff, so I wasn't expecting a useful IP, but more concerning was that it looked like everything was coming from the router.
After misunderstanding the issue and getting sidetracked checking how to configure MetalLB to use externalTrafficPolicy: Local (it was already set that way), I realized the traffic really was coming from the router itself and set about working out why.
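For reference, checking (or setting) that on the Traefik service is a one-liner - the service name and namespace here are just guesses for illustration:

# show the current policy on the Traefik LoadBalancer service
kubectl -n traefik get svc traefik -o jsonpath='{.spec.externalTrafficPolicy}'
# switch it to Local so the original client IP is preserved end-to-end
kubectl -n traefik patch svc traefik -p '{"spec":{"externalTrafficPolicy":"Local"}}'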
There were two problems here, both Unifi's fault (or possibly mine for the dodgy way things are configured, but it was definitely a violation of POLA). First of all, when you configure a port-forward on a Unifi UDM, even though you can pick "WAN", "WAN2", or "Both" - it applies to all goddamn interfaces, including the LAN and all VLANs, VPNs, etc. I have no idea why this is, but the end result was that my internal traffic was being handled by iptables instead of just being forwarded onwards like it should have been.
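This is visible from a shell on the UDM if you go digging through what the controller writes into iptables (this is just how I'd go looking, not output from my actual box):

# dump the NAT table and look at what the port forward generated
iptables -t nat -S | grep -i DNAT
# any DNAT rule without an input-interface ("-i") match will catch LAN/VLAN traffic too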
Second, and perhaps more worrisome... if you specify a target for the port forward that Unifi doesn't know about, it turns on "IP Masquerade" (i.e. NAT). Even though the underlying Linux knows how to reach it because I run FRR, Unifi knows best and masquerade it is. I confirmed this behaviour by adding a port forward to a machine that Unifi does know about, and there's no ipmasq. I'm guessing this happens because Unifi assumes the network is off in WAN-land, and isn't aware that FRR is actually sending it back out on the LAN via an injected route. I made several attempts at resolving this by adding a new network on the LAN, but nothing worked.
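Again, easy enough to confirm from the shell - the FRR-injected route is sitting right there in the kernel's table, and yet the forward to the "unknown" target grows a masquerade rule (addresses here are made up):

# the route FRR injected for the MetalLB pool is present and usable
ip route show 10.0.10.0/24
# but the port forward pointed at it still gets NATed
iptables -t nat -S | grep -i MASQUERADE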
In the end, since I'm using the on_boot scripts, I just yanked the port forward, added an "allow" rule to the firewall, and made a boot script to inject a port-forward rule for me. On a cursory check it doesn't look like firewall updates clobber it, but worst case it'll come back on a reboot and I can always trigger it manually.
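The script itself is nothing special - a couple of DNAT rules pinned to the WAN interface. Something along these lines (the path, interface name, and addresses are placeholders, not my real config):

#!/bin/sh
# /data/on_boot.d/20-traefik-forward.sh - hypothetical name/location
# Forward HTTP/HTTPS arriving on the WAN interface to the MetalLB VIP for Traefik,
# without Unifi deciding to masquerade it for us.
WAN_IF="eth8"              # whichever port is your WAN uplink
TRAEFIK_VIP="10.0.10.80"   # made-up MetalLB load balancer IP
iptables -t nat -A PREROUTING -i "$WAN_IF" -p tcp --dport 80 \
  -j DNAT --to-destination "${TRAEFIK_VIP}:80"
iptables -t nat -A PREROUTING -i "$WAN_IF" -p tcp --dport 443 \
  -j DNAT --to-destination "${TRAEFIK_VIP}:443"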
Anyway, in the process of this, I'd originally tried adding a static route - since all traffic goes through Traefik, and Traefik is pinned to a specific host that runs the most bandwidth-intensive services for now, I thought this was a liveable compromise. It didn't fix the issue, so I unwound it and let FRR handle it again, but it got me thinking more about this - in particular, that I could run multiple replicas of Traefik on various nodes, then do ECMP to balance traffic amongst them.
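For the record, the static route was just pinning the (made-up) load balancer IP to the node Traefik happens to be scheduled on, and the ECMP version of the same idea is just extra nexthops:

# pin the Traefik LB IP to the one node it currently runs on
ip route add 10.0.10.80/32 via 192.168.1.21
# the ECMP flavour - one nexthop per node running a replica
# (needs CONFIG_IP_ROUTE_MULTIPATH in the kernel, which, well... see below)
ip route add 10.0.10.80/32 nexthop via 192.168.1.21 nexthop via 192.168.1.22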
MetalLB famously doesn't support OSPF yet (and may never); however, one suggestion I saw that seems trivially doable for me is to put FRR on each of the nodes, have MetalLB's speaker speak to it via BGP, then have it speak to the UDM-SE via OSPF.
Alas, this is not to be, because the UDM-SE doesn’t have ECMP enabled:
root@Dreamy-Boi:~# zgrep CONFIG_IP_ROUTE_MULTIPATH /proc/config.gz
# CONFIG_IP_ROUTE_MULTIPATH is not set
Though the idea may still be reasonable for another reason: if each of the nodes has its own FRR daemon, then it'll have immediate routes to the load balancer IPs. That means if for some reason a service connects out to a load balancer IP, instead of the traffic going all the way out to the router and bouncing back, it'll just hop right over. Do I have any services configured this way? I don't believe so - the only culprit that comes to mind is Tailscale in an LXD container, and it currently works but is used so infrequently that frankly who gives a shit.
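If I ever do go down that road, the per-node FRR config would be something in the spirit of this - the ASN, router IDs, and networks are all invented, and it assumes MetalLB's speaker on each node is pointed at the local FRR (same ASN, iBGP over localhost) as its peer:

# rough sketch: per-node FRR that accepts MetalLB's routes over BGP and
# redistributes them into OSPF towards the UDM-SE
# (bgpd and ospfd also need enabling in /etc/frr/daemons)
cat > /etc/frr/frr.conf <<'EOF'
frr defaults traditional
hostname k8s-node-1
!
router bgp 64512
 bgp router-id 192.168.1.21
 ! MetalLB's speaker on this node peers with us from localhost
 neighbor 127.0.0.1 remote-as 64512
!
router ospf
 ospf router-id 192.168.1.21
 network 192.168.1.0/24 area 0
 redistribute bgp
!
EOF
systemctl restart frr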
Anyway, back to the issue at hand - I'm glad I solved this, because it turns out that in the roughly two months since we switched over to the UDM, exactly none of my Traefik middlewares that only allow LAN access to things like HomeAssistant worked: because the router was NATing, everything looked like LAN access. So if someone forged a DNS entry pointing ha.fwaggle.org at my home IP, they'd have had access to the service (it still needs auth, but still). That's terribly annoying.
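A LAN-only middleware in Traefik terms is just an IP allow-list, something like this (names, namespace, and CIDR invented; on Traefik v3 the field is ipAllowList rather than ipWhiteList):

kubectl apply -f - <<'EOF'
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: lan-only
  namespace: traefik
spec:
  ipWhiteList:
    sourceRange:
      - 192.168.1.0/24
EOF

With the router masquerading, every request matched that sourceRange, which is exactly the problem.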
Secondly, now that I have real working X-forwarded-for on all services, I think this may have warded off one or two rate-limiting issues with our home-hosted Mastodon service.