Not🐧A🐧Convicted🐧Felon<p><span class="h-card" translate="no"><a href="https://mastodon.functional.computer/@samir" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>samir</span></a></span> Every single day, a team of 25 people is kept busy running a <a href="https://hachyderm.io/tags/K8s" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>K8s</span></a> cluster with 1200 nodes that could be replaced by fewer than ten 1U machines, using a system design that actually solves the C10K problem instead of one that struggles to handle even 10 req/s.<br>This is the vicious cycle of technical debt. </p><p>This week's problem: cluster-autoscaler has a bug that causes newly started machines to get stuck in a zombie state without ever registering with the control plane. This causes all kinds of cluster scale-up issues, especially with multi-AZ workloads.</p><p>Every week brings a new bug, a new edge case, a new issue with dependencies (K8s, Helm, Rancher, Istio, etcd, ...), a new issue with AWS. It just goes on and on.</p><p>I yearn for the simple days of just running servers in racks, when you'd be like "oh, had another hard drive failure in rack 04, have to go swap out an HDD cartridge and rebuild the RAID".</p>