I don't get the hate for Kubernetes in this thread. TFA is from Figma. You can talk all day long about how early startups just don't need the kind of management benefits that Kubernetes offers, but the article isn't written by someone working for a startup, it's written by a company that nearly got sold to Adobe for $20 billion.
Y'all really don't think a company like Figma stands to benefit from the flexibility that Kubernetes offers?
k8s is complex; if you don't need the following, you probably shouldn't use it:
* Service discovery
* Auto bin packing
* Load Balancing
* Automated rollouts and rollbacks
* Horizontal scaling
* Probably more I forgot about
You also get secret and config management built in. Using k8s also makes it easier to move your workloads between clouds and bare metal: as long as you have a k8s cluster, you can mostly move your app there.
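To make that concrete, here's a rough sketch of what a minimal setup covering several of those bullets tends to look like; the name, image, and ports are hypothetical, not anything from the article:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                    # horizontal scaling: bump this or attach an HPA
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: registry.example.com/web:1.2.3   # hypothetical image
          ports:
            - containerPort: 8080
          resources:
            requests:            # requests are what the scheduler uses for bin packing
              cpu: 250m
              memory: 256Mi
---
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8080           # other pods reach this as http://web (discovery + load balancing)
```

Rolling out a new version is just changing the image tag and re-applying; the Deployment handles the rollout, and kubectl rollout undo handles the rollback.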
Problem is, most companies I've worked at in the past 10 years needed several of the features above, and they decided to roll their own solution with Ansible/Chef, Terraform, ASGs, Packer, custom scripts, custom apps, etc. Those solutions have always been worse than what k8s provides, and the result is a bespoke tool that you can't hire for.
For what k8s provides, it isn't complex, and it's all documented very well, AND it's extensible so you can build your own apps on top of it.
I think there are more SWEs on HN than Infra/Platform/DevOps/buzzword engineers. As a result there are a lot of people who don't have much experience managing infra and think that spinning up their Docker container on a VM is the same as putting an app in k8s. That's my opinion on why k8s gets so much hate on HN.
There are other out of the box features that are useful:
* cert-manager.
* external-dns.
* Monitoring stack (e.g. Grafana/Prometheus).
* Overlay network.
* Integration with deployment tooling like ArgoCD or Spinnaker.
* Relatively easy to deploy anything that comes with a helm chart (your database or search engine or whatnot).
* Persistent volume/storage management.
* High availability.
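As a sketch of how a few of those pieces compose, this is roughly what an Ingress looks like once cert-manager and external-dns are installed; the ClusterIssuer name, hostname, and backend Service are assumptions for illustration:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod           # assumes a ClusterIssuer with this name exists
    external-dns.alpha.kubernetes.io/hostname: app.example.com # external-dns creates/updates the DNS record
spec:
  tls:
    - hosts:
        - app.example.com
      secretName: web-tls        # cert-manager writes the issued certificate here
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web        # hypothetical backend Service
                port:
                  number: 80
```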
It's also about using containers, which means there's a lot less to manage on the hosts.
I'm a fan of k8s. There's a learning curve but there's a huge ecosystem and I also find the docs to be good.
But if you don't need any of it - don't use it! It is targeting a certain scale and beyond.
I started with Kubernetes and have never looked back. Being able to bring up a copy of my network, deploy a clustered database, and deploy a distributed fs, all in 10 minutes (including the install of k3s or k8s), has been a game-changer for me.
You can run monolithic apps with zero-downtime restarts quite easily in k8s using a rolling update strategy and kubectl rollout restart, which is very useful when applications take minutes to start.
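For what it's worth, the zero-downtime part mostly comes from pairing a RollingUpdate strategy with a readiness probe; something along these lines, where the image, port, and health endpoint are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: slow-monolith
spec:
  replicas: 2
  selector:
    matchLabels:
      app: slow-monolith
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0          # never remove an old pod before a replacement is Ready
      maxSurge: 1
  template:
    metadata:
      labels:
        app: slow-monolith
    spec:
      containers:
        - name: app
          image: registry.example.com/monolith:2024.1   # placeholder image
          ports:
            - containerPort: 3000
          readinessProbe:        # traffic only shifts once the slow-starting app is actually up
            httpGet:
              path: /healthz     # placeholder health endpoint
              port: 3000
            initialDelaySeconds: 60
            periodSeconds: 10
```

With that in place, kubectl rollout restart deployment/slow-monolith cycles the pods one at a time without dropping traffic.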
In the same vein here.
Every time I see one of these posts and the ensuing comments I get a little bit of inverse imposter syndrome. All these people saying "Unless you're at 10k+ user scale you don't need k8s." If you're running a personal project with a single-digit user count, then sure, k8s is unreasonable, but purely on a cost-to-performance basis. At any larger scale, however, I struggle to reconcile this position with the reality that anything with a consistent user base should have zero-downtime deployments, load balancing, etc. Maybe I'm just incredibly OOTL, but when did these features, which are simple to implement and essentially free from a cost standpoint, become optional? Perhaps I'm just misunderstanding the argument, and the argument is really that you should use a Fly- or Vercel-esque platform that provides some of these benefits without needing to configure k8s. Still, the problem with that mindset is that vendor lock-in is a lot harder to correct once a platform is in production and needs to stay up without prolonged downtime.
Personally, I would do early builds with Fly and once I saw a consistent userbase I'd switch to k8s for scale, but this is purely due to the cost of a minimal k8s instance (especially on GKE or EKS). This, in essence, allows scaling from ~0 to ~1M+ with the only bottleneck being DB scaling (if you're using a single DB like CloudSQL).
Still, I wish I could reconcile my personal disconnect with the majority of people here who regard k8s as overly complicated and unnecessary. Are there really that many shops out there that consider the advantages of k8s beyond their needs, or are they just achieving the same result in a different manner?
One could certainly learn enough k8s in a weekend to deploy a simple cluster. Now I'm not recommending this for someone's company's production instance, due to the footguns if it's improperly configured, but the argument that k8s is too complicated to learn seems unfounded.
/rant
With the simplicity and cost of k3s and alternatives it can also make sense for personal projects from day one.
I've been in your shoes for quite a long time. By now I've accepted that a lot of folks on HN and other similar forums simply don't know or care about the issues that Kubernetes resolves, or someone else in their company takes care of those for them.
It’s actually much simpler than that
k8s makes it easier to build over engineered architectures for applications that don’t need that level of complexity
So while you are correct that it is not actually that difficult to learn and implement k8s, it's also almost always completely unnecessary, even at the largest scale.
Given that you can do the largest-scale stuff without it and you should do most small-scale stuff without it, the number of people for whom all of the risks and costs balance out is much smaller than the amount of promotion and pushing it has received would suggest.
And given that the orchestration layer is a critical part of your infrastructure, handing it over, or changing the data and environment relationships across a multilayer computing environment to that extent, is a non-trivial one-way door.
100%
I can bring up a service, connect it to a postgres/redis/minio instance, and do almost anything locally that I can do in the cloud. It's a massive help for iterating.
There is a learning curve, but you learn it and you can do so damn much so damn easily.
+1 on the learning curve. It took me 3 attempts (gave up twice) before I spent a day learning the docs, then a week moving some of my personal things over.
Now I have a small personal cluster of machines and VPSes (in some regions I don't have enough deployments to justify an entire machine) with a distributed multi-site fs that's mostly as workload-ready as any other cloud. CDN, GeoDNS, and nameservers are all handled within the cluster. Any machine can go offline and connectivity remains the same, minus the ~5-minute timeout before pods on a downed node get rescheduled, which matters for monolithic services.
Kubernetes also provides an amazing way to learn things like BGP and IPAM via Calico, MetalLB, and whatever else you want to run.
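If it helps anyone get started, this is roughly the shape of a MetalLB L2 setup on a home cluster; it assumes MetalLB is already installed in metallb-system, and the address range is just an example of whatever is free on your LAN:

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: lab-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.240-192.168.1.250   # example range; pick addresses your router won't hand out
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: lab-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - lab-pool
```

Swap the L2Advertisement for a BGPPeer/BGPAdvertisement pair and you're suddenly learning BGP for real.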
To this I would also add the ability to manage all of your infrastructure with k8s manifests (e.g. Crossplane).
For anyone who thinks this is a laundry list - running two instances of your app with a database means you need almost all of the above.
The _minute_ you start running containers in the cloud you need to think of "what happens if it goes down/how do I update it/how does it find the database", and you need an orchestrator of some sort, IMO. A managed service (I prefer ECS personally as it's just stupidly simple) is the way to go here.
Eh, you can easily deploy containers to EC2/GCE and have an autoscaling group/MIG with healthchecks. That's what I'd be doing for a first pass or if I had a monolith (a lot of business is still deploying a big ball of PHP). K8s really comes into its own once you're running lots of heterogeneous stuff all built by different teams. Software reflects organizational structure so if you don't have a centralized infra team you likely don't want container orchestration since it's basically your own cloud.
Sure, you can use an AWS ASG, but I assume you also tie that into an AWS ALB/NLB. Then you use ACM for certs, and now you are locked into AWS three times over.
Instead you can do those three things and more in k8s, and it would be the same manifests regardless of which k8s cluster you deploy to: EKS, AKS, GKE, on prem, etc.
Plus you don't get service discovery across VMs, you don't get a CSI so good luck if your app is stateful. How do you handle secrets, configs? How do you deploy everything, Ansible, Chef? The list goes on and on.
If your app is simple, sure, but I haven't seen a simple app in years.
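On the secrets/configs point above: in k8s that's a ConfigMap and a Secret referenced from the pod spec; the names and values here are made up, and in practice you'd layer something like sealed-secrets or external-secrets on top rather than committing plaintext values:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: web-config
data:
  LOG_LEVEL: info
  DATABASE_HOST: postgres        # resolves via in-cluster service discovery
---
apiVersion: v1
kind: Secret
metadata:
  name: web-secrets
type: Opaque
stringData:
  DATABASE_PASSWORD: change-me   # placeholder value only
```

The container then pulls both in with envFrom (configMapRef / secretRef), and the same manifests work on EKS, AKS, GKE, or bare metal.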
I've never worked anywhere that has benefitted from avoiding lock-in. We would have saved thousands in dev-hours if we just used an ALB instead of tweaking nginx and/or caddy.
Also, if you can't convert an ALB into an Azure Load balancer, then you probably have no business doing any sort of software development.
I don't disagree about avoiding lock-in, and I'm sure it was hyperbole, but if you really spent thousands of dev-hours (approx 1 year) on tweaking nginx, you needed different devs ;)
ALB costs get very steep very quickly too, but you're right - start with ALB and then migrate to nginx when costs get too high
By containers on EC2 you mean installing Docker on AMIs? How do you deploy them?
I really do think Google Cloud Run/Azure Container Apps (and then in AWS-land ECS-on-fargate) is the right solution _especially_ in that case - you just shove a container on and tell it the resources you need and you're done.
From https://stackoverflow.com/questions/24418815/how-do-i-instal... , here's an example that you can just paste into your load balancing LaunchConfig and never have to log into an instance at all (just add your own runcmd: section -- and, hey, it's even YAML like everyone loves)
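Something in the spirit of that answer (not the linked snippet verbatim; this assumes an Ubuntu/Debian AMI and uses a made-up image name):

```yaml
#cloud-config
package_update: true
packages:
  - docker.io                    # Ubuntu/Debian package; Amazon Linux would use yum/dnf instead
runcmd:
  - systemctl enable --now docker
  - docker run -d --restart=always -p 80:8080 registry.example.com/web:1.2.3   # made-up image
```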
I did this. It’s not easier than k8s, GKE, EKS, etc…. It’s harder cause you have to roll it yourself.
If you do this just use GKE autopilot. It’s cheaper and done for you.
Those all seem important to even moderately sized products.
As long as your requirements are simple the config doesn't need to be complex either. Not much more than docker-compose.
But once you start using k8s you probably tend to scope creep and find a lot of shiny things to add to your set up.
Some ways to tell if someone is a great developer are easy. JetBrains IDE? Ample storage space? Solving problems with the CLI? Consistently formatted code using the language's packaging ecosystem? No comments that look like this:
Some ways to tell if someone is a great developer are hard. You can't tell if someone is a brilliant shipper of features, choosing exactly the right concerns to worry about at the moment, like doing more website authoring and less devops, with a grand plan for how to make everything cohere later; or, if the guy just doesn't know what the fuck he is doing.
Kubernetes adoption is one of those hard ones. It isn't a strong, bright signal like using PEP 8 and having a `pyproject.toml` with dependencies declared. So it may be obvious to you, "People adopt Kubernetes over ad-hoc decoupled solutions like Terraform because it has, in a Darwinian way, found the smallest set of easily surmountable concerns that should apply to most good applications." But most people just see, "Ahh! Why can't I just write the method bodies for Python function signatures someone else wrote for me, just like they did in CS50!!!"
If you don't need any of those things then your use of k8s just becomes simpler.
I find k8s an extremely nice platform for deploying simple things that don't need any of the advanced features. All you do is package your programs as containers and write a minimal manifest, and there you go. You need to learn a few new things, but the number of things you no longer have to worry about is a really great return.
Nomad is a good contender in that space, but I think HashiCorp is letting it slowly become EOL, and there are basically zero Nomad-as-a-service providers.
If you don't need any of those things, going for a "serverless" option like fargate or whatever other cloud equivalents exist is a far better value prop. Then you never have to worry about k8s support or upgrades (of course, ECS/fargate is shit in its own ways, in particular the deployments being tied to new task definitions...).
I use it (specifically, the canned k3s distro) for running a handful of single-instance things, like Plex on my utility server.
Containers are a very nice UX for isolating apps from the host system, and k8s is a very nice UX for running things made out of containers. Sure it's designed for complex distributed apps with lots of separate pieces, but it still handles the degenerate case (single instance of a single container) just fine.
It's worth bearing in mind that, although any of these can be accomplished with any number of other products as you point out, LB and horizontal scaling in particular have been solved problems for more than 25 years (or longer, depending on how you count).
For example, even servers (aka instances/vms/vps) with load balancers (aka fabric/mesh/istio/traefik/caddy/nginx/ha proxy/ATS/ALB/ELB/oh just shoot me) in front existed for apps that are LARGER than can fit on a single server (virtually the definition of horizontally scalable). These apps are typically monoliths or perhaps app tiers that have fallen out of style (like the traditional n-tier architecture of app server-cache-database, swap out whatever layers you like).
However, K8s is actually more about microservices. Each microservice can act like a tiny app on its own, but they are often inter-dependent and, especially at the beginning, it's often seen as not cost-effective to dedicate their own servers to them (along with the associated load balancing, redundancy, cross-AZ setup, etc.). And you might not even know what the scaling pain points for an app are, so this gives you a way to easily scale up without dedicating slightly expensive instances or support staff to running each cluster; your scale point is the entire k8s cluster itself.
Even though that is ALL true, it's also true that k8s' sweet spot is actually pretty narrow, and many apps and teams probably won't benefit from it that much (or not at all and it actually ends up being a net negative, and that's not even talking about the much lower security isolation between containers compared to instances; yes, of course, k8s can schedule/orchestrate VMs as well, but no one really does that, unfortunately.)
But, it's always good resume fodder, and it's about the closest thing to a standard in the industry right now, since everyone has convinced themselves that the standard multi-AZ configuration of 2014 is just too expensive or complex to run compared to k8s, or something like that.
I had a different experience. Some years ago I wanted to set up a toy K8s cluster over an IPv6-only network. It was a total mess - documentation did not cover this case (at least I have not found it back then) and there was a lot of code to dig through to learn that it was not really supported back then as some code was hardcoded with AF_INET assumptions (I think it's all fixed nowadays). And maybe it's just me, but I really had much easier time navigating Linux kernel source than digging through K8s and CNI codebases.
This, together with a few very trivial crashes of "normal" non-toy clusters that I've seen (like two nodes suddenly failing to talk to each other, typically for simple textbook reasons like conntrack issues), resulted in an opinion "if something about this breaks, I have very limited ideas what to do, and it's a huge behemoth to learn". So I believe that simple things beat complex contraptions (assuming a simple system can do all you want it to do, of course!) in the long run because of the maintenance costs. Yeah, deploying K8s and running payloads is easy. Long-term maintenance - I'm not convinced that it can be easy, for a system of that scale.
I mean, I try to steer away from K8s until I find a use case for it, but I've heard that when K8s fails, a lot of people just tend to deploy a replacement and migrate all payloads there, because it's easier to do so than troubleshoot. (Could be just my bubble, of course.)
Kubernetes isn't even that complicated, and first-party support from cloud providers often means you're doing something in K8s in lieu of doing it in a cloud-specific way (like Ingress vs. cloud-specific load balancer setups).
At a certain scale, K8s is the simple option.
I think much of the hate on HN comes from the "ruby on rails is all you need" crowd.
I guess the ones who quietly ship dozens of rails apps on k8s are too busy getting shit done to stop and share their boring opinions about pragmatically choosing the right tool for the job :)
"But you can run your rails app on a single host with embedded SQLite, K8s is unnecessary."
Always said by people who haven't spent much time in the cloud.
Because single hosts will always go down. Just a question of when.
I love k8s, but bringing back up a single app that crashed is a very different problem from "our k8s is down" - because if you think your k8s won't go down, you're in for a surprise.
You can also view a single k8s cluster as a single host, which will go down at some point (e.g. a botched upgrade, cloud network partition, or something similar). While that's much less frequent, it's also much more difficult to get out of.
Of course, if you have a multi-cloud setup with automatic (and periodically tested!) app migration across clouds, well then... Perhaps that's the answer nowadays.. :)
Kubernetes is a remarkably reliable piece of software. I've administered a (large) number of clusters, each often with several years of cluster lifetime, with everything upgraded through the relatively frequent Kubernetes release lifecycle. We definitely needed some maintenance windows sometimes, but no, Kubernetes didn't unexpectedly crash on us. Maybe I just got lucky, who knows. The closest we ever got was the underlying etcd cluster having heartbeat timeouts due to insufficient hardware, and etcd healed itself when the nodes were reprovisioned.
There's definitely a whole lotta stuff in the Kubernetes ecosystem that isn't nearly as reliable, but that has to be differentiated from Kubernetes itself (and the internal etcd dependency).
The managed Kubernetes services solve the whole "botched upgrade" concern. etcd is designed to tolerate cloud network partitions and recover.
Comparing this to sudden hardware loss on a single-VM app is, quite frankly, insane.
Even if your entire control plane disappears your nodes will keep running and likely for enough time to build an entirely new cluster to flip over to.
I don’t get it either. It’s not hard at all.
Your nodes & containers keep running, but is your networking up when your control plane is down?
If you start using more esoteric features, the reliability of k8s goes down. Guess what happens when you enable the in-place vertical pod scaling feature gate?
It restarts every single container in the cluster at the same time: https://github.com/kubernetes/kubernetes/issues/122028
We have also found data races in the StatefulSet controller which only occur when you have thousands of StatefulSets.
Overall, if you stay on the beaten path k8s reliability is good.
I've been working with Rails since 1.2 and I've never seen anyone actually do this. Every meaningful deployment I've seen uses Postgres or MySQL. (Or, god forbid, MongoDB.) It takes very little time before you're SOL.
You can run rails on a single host using a database on the same server. I've done it and it works just fine as long as you tune things correctly.
Can you elaborate?
I don't remember the exact details because it was a long time ago, but what I do remember is
- Limiting memory usage and number of connections for mysql
- Tracking the maximum memory size of the Rails application servers so you didn't run out of memory by running too many of them
- Avoid writing unnecessarily memory intensive code (This is pretty easy in ruby if you know what you're doing)
- Avoiding using gems unless they were worth the memory use
- Configuring the frontend webserver to start dropping connections before it ran out of memory (I'm pretty sure that was just a guess)
- Using the frontend webserver to handle traffic whenever possible (mostly redirects)
- Using iptables to block traffic before it hit the webserver
- Periodically checking memory use and turning off unnecessary services and cronjobs
I had the entire application running on a 512MB VPS with roughly 70MB to spare. That was a little less headroom than I wanted, but it worked.
Most of this was just rate limiting with extra steps. At the time rails couldn't use threads, so there was a hard limit on the number of concurrent tasks.
When the site went down it was due to rate limiting and not the server locking up. It was possible to ssh in and make firewall adjustments instead of a forced restart.
Thank you.
And there is truth to that. Most deployments are at that level, and it absolutely is way more performant than the alternative. It just comes with several tradeoffs... but those tradeoffs are usually worth it for deployments with <10k concurrent users. Which Figma certainly isn't.
You probably could still do it, but that's likely more trouble than it's worth.
(The 10k is just an arbitrary number I made up, there is no magic number which makes this approach unviable, it all depends on how the users interact with the platform/how often and where the data is inserted)
Maybe - people seem really gung-ho about serverless solutions here too.
The hype for serverless cooled after that article about Prime Video dropping lambda. No one wants a product that a company won’t dogfood. I realize Amazon probably uses lambda elsewhere, but it was still a bad look.
Yes, you could say that. :)
I think it was much more about one specific use case of lambda that was a bad fit for the prime video team’s need and not a rejection of lambda/serverless. TBH, it kind of reflected more poorly on the team than lambda as a product
Not probably: their Lambda service powers much of their control plane.
I’ve been struggling to square this sentiment as well. I spend all day in AWS and k8s and k8s is at least an order of magnitude simpler than AWS.
What are all the people who think operating k8s is too complicated operating on? Surely not AWS…
The thing you already know tends to be less complicated than the thing you don't know.
I think "k8s is complicated" and "AWS is even more complicated" can both be true.
Doing anything in AWS is like pulling teeth.
The sum is complex, especially with the custom operators.
ruby on rails is all you need
There are also a lot of cog-in-the-machine engineers here that totally do not get the bigger picture or the vantage point from another department.
Agreed, we're a small team and we benefit greatly from managed k8s (EKS). I have to say the whole ecosystem just continues to improve as far as I can tell and the developer satisfaction is really high with it.
Personally I think k8s is where it's at now. The innovation and open source contributions are immense.
I'm glad we made the switch. I understand the frustrations of the past, but I think it was much harder to use 4+ years ago. Now, I don't see how anyone could mess it up so hard.
(Apologies if this is a dumb question) but isn't Figma big enough to want to do any of their stuff on their own hardware yet? Why would they still be paying AWS rates?
Or is it the case that a high-profile blog post about K8S and being provider-agnostic gets you sufficient discount on your AWS bill to still be value-for-money?
It's a fair question.
Data centers are wildly expensive to operate if you want proper security, redundancy, reliability, recoverability, bandwidth, scale elasticity, etc.
And when I say security, I'm not just talking about software level security, but literal armed guards are needed at the scale of a company like Figma.
Bandwidth at that scale means literally negotiating to buy up enough direct fiber and verifying the routes that fiber takes between data centers.
At one of the companies I worked at, it was not uncommon to lose data center connectivity because a farmer's tractor cut a major fiber line we relied on.
Scalability might include tracking square footage available for new racks in physical buildings.
As long as your company is profitable, and at anything but Facebook-like scale, it may not be worth the trouble to try to run your own data center.
Even if the cloud doesn't save money, it saves mental energy and focus.
This is a 20-years-ago take. If your datacenter provider doesn't have multiple fiber entry into the building with multiple carriers, you chose the wrong provider at this point.
There’s a ton of middle ground between a fully managed cloud like AWS and building your own hyperscaler datacenter like Facebook.
Renting a few hundred cabinets from Equinix or Digital Realty is going to potentially be hugely cheaper than AWS, but you probably need a team of people to run it. That can be worthwhile if your growth is predictable and especially if your AWS bandwidth bill is expensive.
But then you’re building on bare metal. Gotta deploy your own databases, maybe kubernetes for running workloads, or something like VMware for VMs. And you don’t get any managed cloud services, so that’s another dozen employees you might need.
I work for a company making ~$9B in annual revenue and we use AWS for everything. I think a big aspect of that is just developer buy-in, as well as reliability guarantees, and being able to blame Amazon when things do go down
Also, you don't have to worry about half of your stack? The shared responsibility model really works.
No, you still do. You just replace those sysadmins with AWS DevOps people. Ultimately your concerns haven't gone down, they've changed. It's true you don't have to worry about hardware. But, then again, you could use colo datacenters or even a VPS for that.
There are a lot of ex-Dropbox people at Figma who might have learned firsthand that bringing your stuff on-prem under a theory of saving money is an intensely stupid idea.
Well, that's one hypothesis.
Another is that "Every maturing company with predictable products must be exploring ways to move workloads out of the cloud. AWS took your margin and isn't giving it back." ( https://news.ycombinator.com/item?id=35235775 )
There must be a prohibitively expensive upfront cost to buy enough servers to do this, plus bringing in all the skills they don't currently have to stand up and run something like that.
I wonder if, as time goes on, the skill to run your own hardware is disappearing. New engineers don't learn it, and the ones who have it slowly forget. I'm not that sharp on anything I haven't done in years, even if it's in a related domain.
They are almost certainly not paying sticker prices. Above a certain size, companies tend to have bespoke prices and SLAs that are negotiated in confidence.
They are preparing for next blog post in a year - „how we cut costs by xx% by moving to our own servers”.
A valuation is just a headline number which has no operational bearing.
Their ARR in 2022 was around $400M-450M. Say the infra budget, at a typical 10%, would be $40-45M. While that is a lot of money, it is not build-your-own-hardware money, and not all of it would be compute budget anyway. They would also be spending on other SaaS apps (say Snowflake) and on specialized workloads like GPUs, so not all workloads would be ready to bring in-house. I would be surprised if their commodity compute/k8s is more than half their overall budget.
It is a lot more likely that focusing on this now would slow product growth, especially since they were/are still growing rapidly.
SaaS companies with larger ARR than theirs still find using the cloud exclusively more productive/efficient.
Companies like Netflix with bigger market caps are still on AWS.
I can imagine the productivity of spinning up elastic cloud resources vs fixed data center resourcing being more important, especially considering how frequently a company like Figma ships new features.
Much bigger companies use AWS for very practical well thought out reasons.
Not having to manage hardware procurement, upgrades, etc., plus a defined standard operating model with accessible documentation, the ability to hire people with experience, and needing to hire fewer people because you are doing less, is enough to build a viable and demonstrable business case.
Scale beyond a certain point is hard without support and delegated responsibility.
Kubernetes is the most amazing piece of software engineering that I have ever seen. Most of the hate is merely being directed at the learning curve.
No, k8s is shit.
It's only useful for the degenerate "run lots of instances of webapp servers running slow interpreted languages" use case.
Trying to do anything else in it is madness.
And for the "webapp servers" use case they could have built something a thousand times simpler and more robust. Serving templated html ain't rocket science. (At least compared to e.g. running an OLAP database cluster.)
Could you please bless us with another way to easily orchestrate thousands of containers in a cloud vendor agnostic fashion? Thanks!
Oh, and just in case your first rebuttal is "having thousands of containers means you've already failed" - not everyone works in a mom n pop shop
Read my post again.
Just because k8s is the only game in town doesn't mean it is technically any good.
As a technology it is a total shitshow.
Luckily, the problem it solves ("orchestrating" slow webapp containers) is not a problem most professionals care about.
Feature creep of k8s into domains it is utterly unsuitable for because devops wants a pay raise is a different issue.
What aspects are you referring to?
professional as in True Scotsman?
No, I mean that Kubernetes solves a super narrow and specific problem that most developers do not need to solve.
I truly wish you were right, but maybe it's good job security for us professionals!
The majority of folks, whether or not they admit it, probably do...
Does this meet your definition of madness?
https://openai.com/index/scaling-kubernetes-to-7500-nodes/
Yeah, they basically spent a shitload of effort developing their own cluster management platform that turns off all the Kubernetes functionality in Kubernetes.
Must be some artifact of hosting on Azure, because I can't imagine any other reason to do something this contorted.
How much hands on time do you personally have with Kubernetes?
I agree with respect to admiring it from afar. I've gone through large chunks of the source many times and always have an appreciation for what it does and how it accomplishes it. It has a great, supportive community around it as well (if not a tiny bit proselytizing at times, which doesn't bother me really).
With all that said, while I have no "hate" for the stack, I still have no plans to migrate our container infrastructure to it now or in the foreseeable future. I say that precisely because I've seen the source, not in spite of it. The net ROI on subsuming that level of complexity for most application ecosystems just doesn't strike me as obvious.
Not to be rude, but K8s has had some very glaring issues, especially early on when the hype was at max.
* Its secrets management was terrible, and for a while it stored them in plaintext in etcd.
* The learning curve was real, and that's dangerous when there were no "best practice" guides or lessons learned. There are lots of horror stories of upgrades gone wrong, bugs, etc. Complexity leaves a greater chance of misconfiguration, which can cause security or stability problems.
* It was often redundant. If you were in the cloud, you already had load balancers, service discovery, etc.
* Upgrades were dangerous and painful in its early days.
* It initially had glaring third-party tooling integration issues, which made monitoring and package management harder (and led to third-party apps like Helm, etc.).
A lot of these have been rectified, but a lot of us have been burned by the promise of a tool that Google said was used internally, which was a bit of a lie, as Kubernetes was a rewrite of Borg.
Kubernetes is powerful, but you can do powerful in simple(r) ways, too. If it were truly "the most amazing" it would have been designed to be simple by default, with complexity added only as a deployment needs it. It wasn't.
I can only speak for myself, and some of the reasons why K8s has left a bad taste in my mouth:
- It can be complex depending on the third-party controllers and operators in use. If you're not anticipating how they're going to make your resources behave differently than the documentation examples suggest they will, it can be exhausting to trace down what's making them act that way.
- The cluster owners encounter forced software updates that seem to come at the most inopportune times. Yes, staying fresh and new is important, but we have other actual business goals we have to achieve at the same time and -- especially with the current cost-cutting climate -- care and feeding of K8s is never an organizational priority.
- A bunch of the controllers we relied on felt like alpha grade toy software. We went into each control plane update (see previous point) expecting some damn thing to break and require more time investment to get the cluster simply working like it was before.
- While we (cluster owners) begrudgingly updated, software teams that used the cluster absolutely did not. Countless support requests for broken deployments, which were all resolved by hand-holding the team through a Helm chart update that we advised them they'd need to do months earlier.
- It's really not cheaper than e.g. ECS, again, in my experience.
- Maybe this has/will change with time, but I really didn't see the "onboarding talent is easier because they already know it." They absolutely did not. If you're coming from a shop that used Istio/Argo and move to a Linkerd/Flux shop, congratulations, now there's a bunch to unlearn and relearn.
- K8s is the first environment where I palpably felt like we as an industry reached a point where there were so many layers and layers on top of abstractions of abstractions that it became un-debuggable in practice. This is points #1-3 coming together to manifest as weird latency spikes, scaling runaways, and oncall runbooks that were tantamount to "turn it off and back on."
Were some of these problems organizational? Almost certainly. But K8s had always been sold as this miracle technology that would relieve so many pain points that we would be better off than we had been. In my experience, it did not do that.
What would be the alternative?
Truthfully, I don't know. But I suspect I'm not the only one who feels a kind of debilitating ennui about the way things have gone and how they continue to go.
Unrelated: What does _TFA_ mean here? Google and GPT didn't help (even with context)
The Featured Article.
(or, if you read it in a frustrated voice, The F**ing Article.)
Related acronyms: RTFA (Read The F**ing Article) and RTFM (Read The F**ing Manual). The latter was a very common answer when struggling with Linux in the early 2000s...
I don't get the hate even if you are a small company. K8s has massively simplified our deployments. It used to be that each app had its own completely different deployment process. Could have been a shell script that SSHed to some VM. Who managed said VM? Did it do its own TLS termination? Fuck knows. Maybe they used Ansible. Great, but that's another tool to learn, and do I really need to set up bare metal hosts from scratch for every service? No, so there's probably some other Ansible config somewhere that sets them up. And the secrets are stored where? Etc etc.
People who say "you don't need k8s" never say what you do need. K8s gives us a uniform interface that works for everything. We just have a few YAML files for each app and it just works. We can just chuck new things on there and don't even have to think about networking. Just add a Service and it's magically available with a name to everything in the cluster. I know how to do this stuff from scratch and I do not want to be doing it every single time.
If you don't need high availability you can even deploy to a single-node k3s cluster. It's still miles better than having to set up systemd services, an Apache/NGINX proxy, etc., etc.
Yep, and you can get far with k3s's "fake" load balancer (ServiceLB). Then when you need a more "real" cluster, basically all the concepts are the same; you just move to a new cluster.
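Concretely, on stock k3s a plain LoadBalancer Service is enough for ServiceLB to bind the node's own ports, no cloud involved; the name, labels, and ports here are assumed for illustration:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: LoadBalancer             # on stock k3s, ServiceLB (klipper-lb) satisfies this with host ports
  selector:
    app: web
  ports:
    - name: http
      port: 80
      targetPort: 8080
```

Move the same manifest to EKS/GKE later and the cloud's load balancer controller picks it up instead; the YAML doesn't change.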
Even on a small project it's actually better, IMO, than tying everything to a platform like Netlify or Vercel. I have this little notepad app that I deploy to a two-node cluster from a GitHub Action and it's an excellent workflow. The k8s config to get everything deployed and TLS provisioned on commit is like 150 lines of mostly boilerplate YAML, and I could pretty easily make it support branch previews or whatever too. https://github.com/SteveCastle/modelpad
One thing I learned when I started learning Kubernetes is that it is two disciplines that overlap but are nonetheless distinct:
- Platform build and management
- App build and management
Getting a stable K8s cluster up and running is quite different from building and running apps on it. Obviously there is overlap in the knowledge required, but there is a world of difference between using a cloud-based cluster and rolling your own home-made one.
We are a very small team and opted for cloud managed clusters, which really freed me up to concentrate on how to build and manage applications running on it.