I like a lot of the answers, but something else I'd add: lots of "popular" architectures from the late 00s and early 2010s have fallen by the wayside because people realized "You're not Google. Your company will never be Google."
That is, there was a big desire around that time period to "build it how the big successful companies built it." But since then, a lot of us have realized that complexity isn't necessary for 99% of companies. When you couple that with hardware and standard databases getting much better, there are just fewer and fewer companies who need all of these "scalability tricks".
My bar for "Is there a reason we can't just do this all in Postgres?" is much, much higher than it was a decade ago.
Is that also why almost no one is using microservices and Kubernetes?
Kubernetes brings more than just being Google. In a way it’s also an ecosystem.
That's nice, but the question is whether you (i.e. your company) actually need this ecosystem.
Need it? No. But it's nice, especially since I have the know-how.
Creating a new deployment is just super fast. Developers can deploy their own apps etc.
And then there is Helm.
If we only decide by “need” then most of the time we also wouldn’t need object oriented programming.
Right, who doesn't want to template hundreds of lines of code in a language that uses whitespace for logic and was designed for neither templating nor long, complex documents (YAML)? What could possibly go wrong ("error: missing xxx at line 728, but it might be a problem elsewhere").
But my infrastructure is code! Can't you see how it's all in git?
What's wrong with having your infrastructure as code and storing it in Git?
YAML isn't code. Same as your reply, YAML has very little awareness of context.
In computer science, the word "code" has a very specific meaning, and markup languages definitely fit this definition.
Nothing. Even if it's objectively terrible (thousands of lines of templated YAML, or thousands of lines of spaghetti bash), being in Git as code is still better. At least you know what it is, how it evolved, and can start adding linting/tests.
I manage large IaC repos and it's mostly HCL, well-structured and easy to work with. Where we have Kubernetes manifests, they're usually split into smaller files and don't cause any trouble, as we usually don't deploy manifests directly.
"in a language that uses whitespace for logic"
This argument kind of died when python became one of the most popular programming languages used.
Yeah, but Python has actual for loops.
It's not exactly that whitespace is bad; it's that whitespace is very difficult to template relative to something character-delimited like JSON or s-expressions.
In JSON or s-expressions, the various levels of objects can be built independently because they don't rely on each other. In YAML, each block depends on its parent to know how far it should be indented, which is a huge pain in the ass when templating.
Python uses whitespace for scoping - not for logic. That said, the same is true for YAML.
Honestly, Helm kinda sucks (for all the reasons you mentioned).
But Kustomize is very nice. Although, their docs could be a bit better.
I don’t like helm itself. But I was referring to the deployment part. I like Kustomize more.
I wonder why people don't use fromYaml + toJson to avoid stupid errors with indent.
YAML is, for all intents and purposes, a superset of JSON, so if you render your subtree as JSON you can stick it in a YAML file and not have to care about indentation.
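The trick can be sketched in a few lines of Python, using only the stdlib `json` module (the manifest keys here are just illustrative):

```python
import json

# Build each subtree independently as plain data -- no indentation bookkeeping.
container = {"name": "web", "image": "nginx:1.25", "ports": [{"containerPort": 80}]}

# Since JSON is (practically speaking) valid YAML, a JSON-rendered subtree
# can be spliced into a YAML document at any indentation level.
manifest = "spec:\n  containers:\n  - " + json.dumps(container) + "\n"

print(manifest)
```

This is exactly what Helm's `toJson` does: the subtree is rendered in flow style, so the surrounding template's indentation no longer matters.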
It's not that bad if you need to deploy at least 3 things, and for most cases it beats the alternatives. You can get away with a bootstrapped deployment YAML and a couple of services for most scenarios. What should you use instead? Vendor-locked app platforms? Roll your own deploy bash scripts?
Sure, the full extent of Kubernetes is complicated and managing it might be a pain, but if you don't go bonkers it's not that hard to use as a developer.
I’ve only ever seen a single dev team managing their own K8s cluster. If by deploy you mean “they merge a branch which triggers a lot of automation that causes their code to deploy,” you don’t need K8s for that.
Don’t get me wrong, I like K8s and run it at home, and I’d take it any day over ECS or the like at work, but it’s not like you can’t achieve a very similar outcome without it.
Of course. And I can achieve a Webserver in C. Doesn’t mean it’s the best way given the circumstances.
There are many ways in tech to achieve the same result. I don’t understand why people constantly need to point that out.
I also don’t understand why k8s ruffles so many feathers.
Reminds me a bit of Linux vs windows vs Mac debates.
K8s ruffles my feathers because it’s entirely too easy to build on it (without proper IaC, natch) without having any clue how it works, let alone the underlying OS that’s mostly abstracted away. /r/kubernetes is more or less “how do I do <basic thing I could have read docs for>.”
I’m a fan of graduated difficulty. Having complex, powerful systems should require that you understand them, else when they break – and they will, because they’re computers – you’ll be utterly lost.
Lots of things are nice to have but are expensive.
I'd love to have a private pool in my backyard. I don't, even though it's nice to have, because it is too expensive.
We intuitively make cost-benefit choices in our private lives. When it comes to deciding the same things at work, our reasoning often goes haywire.
Sometimes we need an expensive thing to solve a real problem.
Your point about object-oriented programming makes sense. Sometimes, a bash script suffices, and the person who decides to implement that same functionality in Java is just wasting resources.
All of these solutions have a place where they make sense. When they are blindly applied because they're a fad they generate a lot of costs.
That’s true. Using k8s to host a static website would be silly.
Generally I only use it when I see a compelling case for it and the introduced complexity takes away complexity from somewhere else.
Genuine question: say you have 3-4 services and a bunch of databases that make up your product, what's the alternative to plemping them all into K8s according to you?
3-4 services and a bunch of databases?
Assuming there aren’t any particular scaling or performance requirements, if I were managing something like that, I would almost certainly not use k8s. Maybe systemd on a big box for the services?
I agree with you and I'm always confused when people talk about process isolation as a primary requirement for a collection of small internal services with negligible load.
In addition, the overhead and reporting drawbacks of running multiple isolated databases vastly outweigh any advantage gained from small isolated deployments.
For personal stuff I simply run systemd services, and I believe that scales quite a lot (as in, you can rely on it for more production services).
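For illustration, a minimal unit file for a setup like that might look like this (the service name, user, and binary path are hypothetical):

```ini
# /etc/systemd/system/myapp.service
[Unit]
Description=My app
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/local/bin/myapp
Restart=on-failure
User=myapp
# Some cheap hardening
NoNewPrivileges=true
ProtectSystem=full

[Install]
WantedBy=multi-user.target
```

Then `systemctl daemon-reload && systemctl enable --now myapp`, and you get restarts on crash plus logs in journald essentially for free.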
My hero.
If you use AWS, it's probably easier to use ECS that takes away some of the complexity from you.
Maybe at first, but once you start building all of the IaC and tooling to make it useful and safe at scale, you might as well have just run EKS. Plus, then you can get Argo, which is IMO the single best piece of software from an SRE perspective.
If I had 3-4 services and a bunch of databases, I would look at them and ask "why do we need to introduce network calls into our architecture?" and "how come we're using MySQL and Postgres and MongoDB and a Cassandra cluster for a system that gets 200 requests a minute and is maintained by 9 engineers?"
Don't get me wrong, maybe they're good choices, but absent any other facts I'd start asking questions about what makes each service necessary.
In my home environment I run a VM with docker for that.
In a commercial environment I’d still use kubernetes. But maybe something like k3s or if we are in a cloud environment something like EKS.
Usually with time other services get added to the stack (elastic, grafana, argocd, …)
Docker compose is another option.
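For a handful of services on one box, a minimal Compose file could be a sketch like this (image names, ports, and the placeholder password are illustrative, not recommendations):

```yaml
# docker-compose.yml -- hypothetical app + database on a single host
services:
  app:
    image: myorg/myapp:latest
    ports:
      - "8080:8080"
    depends_on:
      - db
    restart: unless-stopped
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: change-me   # use secrets management in anything real
    volumes:
      - pgdata:/var/lib/postgresql/data
    restart: unless-stopped

volumes:
  pgdata:
```

`docker compose up -d` and you have restart policies, a private network, and persistent volumes without a control plane to operate.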
Using cloud platform as a service options. For example, on Azure you can deploy such system with Azure App Service (with container deployment), or Azure Container Apps (very suitable for microservices). For database, you can use Azure Database for PostgreSQL (flexible), or Azure Cosmos DB for PostgreSQL.
This way, Azure does most of the heavy lifting you would otherwise have to do yourself, even with managed kubernetes.
AWS Fargate is popular among large companies in my experience.
Some of them try to migrate from it to a unified k8s "platform" (i.e. frequently not pure k8s/EKS/helm but some kind of in-house layer built on top of it). It takes so long that your tenure with the company could end before you see it through.
Define "no one". If you mean small shops, maybe. If you mean large organizations, I haven't seen even one in the last 5 years that wouldn't use them in one way or another.
It was meant to be ironic
Ah sorry, it is more and more difficult for me to detect irony these days.
Yeah sorry as well, I actually wanted to add a “… oh wait” to my original comment but forgot to do it… (too busy fixing a podman issue… )
Maybe add /s ;). It may reduce the number of hot-headed responses.
lol they gon get flamed
Yeah but you can’t deny the rofls without the /s. At least I certainly can’t deny the humor from reading DevJab’s comment.
I realize you're being sarcastic (I think), but I actually would put microservices in the same boat. There was a huge push to microservices in the mid teens, and a lot of companies came to hugely regret it. There is a reason this video, https://youtu.be/y8OnoxKotPQ, is so popular.
And it's not that "no one is using microservices", it's just that tons of companies realized they added almost as many complications as they alleviated, and that for many teams they were just way too premature. And a lot of the companies that I've seen have the most success with microservices are also the most pragmatic about them: they use them in some specific, targeted areas (e.g. authn and authz), but otherwise they're content using a well-componentized monolith where they can break off independent services later if there is an explicit reason to do so.
I don't know of a single 100+ person organisation in my area which doesn't use microservices in some form. A lot of places also use Kubernetes indirectly through major cloud provider layers like Azure Container Apps.
Our frontend (and indeed quite a bit of our backend) lives in a NX mono-repo. As for how it actually works, however, it’s basically a lot of micro-services which are very independently maintainable. Meaning you can easily have different teams work on different parts of your ecosystem and not break things. It doesn’t necessarily deploy as what some people might consider micro services of course. But then micro services were always this abstract thing that is honestly more of a framework for management and change management than anything tech.
We also have much, much bigger single machines available for reasonable money, so a lot of reasonable workloads can now fit on one machine that used to require a small cluster.
What happens when the single machine fails?
Worst case scenario, your service is not available for a couple of hours. In 99% of businesses, customers are totally okay with that (as long as it's not every week). IRL shops are also occasionally closed due to incidents; heck, even ATMs and banks don't work 100% of the time. And that's the worst case: because your setup is so simple, restoring a backup or even doing a full setup of a new machine is quite easy. Just make sure you test your backup restore system regularly.

Simple systems also tend to fail much less. I've run a service (with customers paying top euro) that was offline for ~two hours due to an error maybe once or twice in 5 years. Both occurrences were due to a non-technical cause (one was a bill that wasn't paid - yes, this happened; the other I don't recall).

We were offline for a couple of minutes daily for updates or the occasional server crash (a Go monolith, crashing mostly due to an unrecovered panic), but the reverse proxy was configured to show a nice static image with text along the lines of "The system is being upgraded, great new features are on the way - this will take a couple of minutes". I installed this in the first week after we started the company, with the idea that we would build a live-upgrade system once customers started complaining. Nobody ever complained - in fact, customers loved to see we did an upgrade once in a while (although most customers never mentioned having seen the image).
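That reverse-proxy fallback is a few lines of config; an nginx sketch might look like this (the upstream address, paths, and page name are hypothetical):

```nginx
server {
    listen 80;
    server_name example.com;

    location / {
        proxy_pass http://127.0.0.1:8080;
        # When the backend is down or restarting, serve the static
        # "upgrade in progress" page instead of a bare 502.
        error_page 502 503 504 = /maintenance.html;
    }

    location = /maintenance.html {
        root /var/www/static;
        internal;   # only reachable via error_page, not directly
    }
}
```

Every deploy or crash then shows customers a friendly page instead of an error, with no orchestration involved.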
Depending on your product, this could mean tens of thousands to millions of dollars worth of revenue loss. I don't really see how we've gone backwards here.
You could just distribute your workloads using...a queue, and not have this problem, or have to pay for and pay to maintain backup equipment etc.
From the original post: “Your business is not Google and will never be Google”
From the post directly above: “Most businesses…”
The thread above is specifically discussing business which won’t lose a significant amount of money if they go down for a few minutes. They also postulate that most businesses fall into this category, which I’m inclined to agree with.
I understand it in practice, but I also think it's weird to be working on something that isn't aiming to grow. Maybe not to Google scale, but building systems which are "distributable" from an early stage seems wise to me.
Hours, not minutes. That is relevant for most businesses.
If your product going down for an hour will lead to the loss of millions of dollars, then you should absolutely be investing a lot of money in expensive distributed and redundant solutions. That's appropriate in that case.
The point here is that 99% of companies are not in that scenario, so they should not emulate the very expensive distributed architectures used by Google and a few other companies that ARE in that scenario.
For almost all companies on the smaller side, the correct move is to take the occasional downtime, because the tiny revenue loss will be much smaller than the large and ongoing costs of building and maintaining a complex distributed system.
I'd argue that's wrong for any decently sized ecommerce platform or production facility. Maybe not millions per hour, but enough to warrant redundancy. There are many revenue and redundancy levels between Google and your mom-and-pop restaurant menu.
It could. In those cases, you set up the guardrails to minimize the loss.
In your typical seed, series A, or series B SaaS startup, this is most often not the case. At the same time, these are the companies that fueled the proliferation of microservice-based architectures, often with a single point of failure in the message queue or in the cluster orchestration. They shifted easy-to-fix problems into hard-to-fix problems.
Machine failures are few and far between these days. Over the last four years I've had a cluster of perhaps 10 machines. Not a single hardware failure.
Loads of software issues, of course.
I know this is just an anecdote, but I'm pretty certain reliability has increased by one or two orders of magnitude since the 90s.
You didn't answer the question though. Your answer is "it won't", and that isn't a good strategy.
It is, in the sense that if something happens less often and the severity stays the same, you don't need to prepare for it as much (cue Nassim Taleb entering the conversation).
I'm not sure what types of products you work on, but it's kind of rare at most companies I've worked at where having a backup like that is a workable solution.
Also anecdotally, I’ve been running 12th gen Dells (over a decade old at this point) for several years. I’ve had some RAM sticks report ECC failures (reseat them), an HBA lose its mind and cause the ZFS pool to offline (reseat the HBA and its cables), and precisely one actual failure – a PSU. They’re redundant and hot-swappable, so I bought a new one and fixed it.
Depending on your requirements for uptime, you could have a stand-by machine ready or you spin up a new one from backups.
Your monitoring system alerts you on your phone, and you fix the issue.
When I worked with small firms who used Kubernetes, we had more Kubernetes code issues than machine failures. The solution to the theoretical problem was the cause of real issues. It was expensive to keep fixing this.
It's kind of mind boggling just how powerful mundane desktop computers have gotten, let alone server hardware.
Think about it: the 20-core CPU (e.g. an i7-14700K) you can buy for just a couple hundred dollars today would have been supercomputer hardware costing tens or hundreds of thousands of dollars just a decade ago.
According to Geekbench, an i7-4790 processor released a decade ago is ~5 times slower than an i7-14700. 4790s go for $30 on eBay, vs $300 for a 14700, so price/performance seems to be in favor of older hardware :)
What about power consumption? When running a server 24/7, power is likely to be a bigger cost concern than the one-off cost of purchasing the processor.
Under full load, roughly 100W for the 4790, and 350W for the 14700. Note that both links are for the K variant, and also, both were achieved running Prime95. More normal workloads are probably around 2/3 those peak values.
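Plugging those numbers in as a back-of-envelope calculation (the $0.15/kWh electricity price is an illustrative assumption, not from the thread):

```python
# Annual 24/7 power cost for each CPU, at the ~2/3-of-peak "normal" load
# figures above. The electricity price is a hypothetical round number.
HOURS_PER_YEAR = 24 * 365          # 8760
PRICE_PER_KWH = 0.15               # assumed; varies a lot by region

def annual_cost(watts):
    """Dollars per year to run a constant load of `watts`."""
    return watts * HOURS_PER_YEAR / 1000 * PRICE_PER_KWH

old = annual_cost(100 * 2 / 3)     # i7-4790 at ~67 W
new = annual_cost(350 * 2 / 3)     # i7-14700 at ~233 W

print(round(old, 2), round(new, 2), round(new - old, 2))
```

So roughly $88/year vs $307/year: the newer chip costs more to run, but at ~5x the performance its perf-per-watt is still better, and for a single always-on box the difference is small next to most other business costs.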
For a desktop, yeah, you’re generally better off buying newer from a performance/$ standpoint. For servers, the calculus can shift a bit depending on your company’s size and workloads. Most smaller companies (small is relative, but let’s go with “monthly cloud bill is < $1MM”) could run on surprisingly old hardware and not care.
I have three Dell R620s, which are over a decade old. They have distributed storage via Ceph on NVMe over Mellanox ConnectX3-PRO. I’ve run DB benchmarks (with realistic schema and queries, not synthetic), and they nearly always outclass similarly-sized RDS and Aurora instances, despite the latter having multiple generations of hardware advancements. Local NVMe over Infiniband means near-zero latency.
Similarly, between the three of them, I have 384 GiB of RAM, and 36C/72T. Both of those could go significantly higher.
Those three, plus various networking gear, plus two Supermicro servers stuffed with spinning disks pulls down around 700W on average under mild load. Even if I loaded the compute up, I sincerely doubt I’d hit 1 kW. Even then, it doesn’t really matter for a business, because you’re going to colo them, and you’re generally granted a flat power budget per U.
The downside of course is that you need someone[s] on staff that knows how to provision and maintain servers, but it’s honestly not that hard to learn.
[0]: https://www.guru3d.com/review/core-i7-4790k-processor-review...
[1]: https://www.tomshardware.com/news/intel-core-i9-14900k-cpu-r...
I think for server-type workloads, a reasonable performance-improvement estimate would be to compare single-core performance and multiply by the ratio of core counts.
On the other hand, the E7-8890 v3 (the closest equivalent to a 14700K in core count at the time from a quick glance) had an MSRP of $7174.00[1].
So maybe I was a bit too high on the pricing earlier, but my point still stands that the computing horsepower we have such easy access to today was literal big time magic just a decade ago.
[1]: https://ark.intel.com/content/www/us/en/ark/products/84685/i...
RAM has also gotten much larger and cheaper, and it is now possible to have several terabytes (TB) of RAM (not storage) in a single PC or workstation. The i7 14700K can support 192 GB of RAM, but a lower-end Xeon W workstation CPU, for example the w3-2423 costing around USD 350, can support 2 TB of RAM, albeit with only 6 cores [1]. And with not much more budget you can scale the machine to your heart's content [2].
[1] Intel Xeon w3-2423 Processor 15M Cache, 2.10 GHz:
https://www.intel.com/content/www/us/en/products/sku/233484/...
[2] Intel Launches Xeon W-3400 and W-2400 Processors For Workstations: Up to 56 Cores and 112 PCIe 5.0 Lanes:
https://www.anandtech.com/show/18741/intel-launches-xeon-w-3...
This! You NEEDED to scale horizontally because machines were just doing too much. I remember when our Apache boxes couldn’t even cope doing SSL so we had a hardware box doing it on ingress!
I used to administrate a small fleet of sun sparc hosts with SSL accelerators. They were so much money $$$$.
I proposed dumping them all for a smaller set of x86 hosts running Linux; it took 2-3 years before the old admins believed in the performance and cost savings. They refused to believe it would even work.
I have the same memories, trying to convince people to dump slow as hell sparc processors for database workloads in favor of X86 machines costing a 10th of the price.
To this day I still argue with ex Solaris sysadmins.
I lived through that era too, it was wild to see how quickly x86 dethroned sparc (even Intel's big misses like Itanium were only minor bumps in the road).
Those days, you had to carefully architect your infrastructure and design your workload to deal with it, and every hardware improvement required you to reevaluate what you were doing. Hence novel architectural choices.
Everything is way easier for normal sized organizations now, and that level of optimization is just no longer required outside of companies doing huge scale.
15 years ago I ran a website (Django+Postgres+memcached) serving 50k unique daily visitors on a dirt cheap vps. Even back then the scalability issues were overstated.
As the stock market prices in expected future growth, architecture had to justify rising stock prices by promising future scalability.
It was never about the actual workloads, much more about growth projections. And a whole lot of cargo cult behavior.
This was true in the 2000's and 2010's as well. A lot of the work could be handled by a single monolithic app running on one or a small handful of servers. However, because of the microservices fad, people often created complicated microservices distributed across auto-scaling kubernetes clusters, just for the art of it. It was unneeded complexity then, as it is now, in the majority of cases.
I'm not sure people realize this now more than then. I was there back then and we surely knew we would never be Google hence we didn't need to "scale" the same way they did.
Nowadays every project I start begins with a meeting where a document is presented describing the architecture we are going to implement, using AWS of course, because "auto-scale", right? And 9 times out of 10 it includes CloudFront, which is a CDN. I don't really understand why this app I am developing, which is basically an API gateway with some customization that made Nginx slightly less than ideal (but still perfect for the job), and which averages 5 rps, needs a CDN in front of it (or AWS, or auto-scaling, or AWS Lambda, for that matter).
In defense of CDNs they're also pretty neat for cutting down latency, which benefits even the first customer.
Of course that only helps if you don't end up shoving dozens of MBs in Javascript/JSON over the wire.
Putting your app behind a CDN also gives you some cheap defense against (most, casual) DDoS.
Usually, but this app is not even exposed publicly to the internet.
5 rps aside, the more data you can push to the edge (your customer), the cheaper it will be and the better the performance for your customer.
The autoscaling is nice because a lot of performance issues just get resolved without much meddling by the ops team, buying time for proper optimizations should it get out of hand.
The disadvantage is that people don't think hard about performance requirements anymore. Premature optimization is bad, but it's also a warning sign if a project has no clue whatsoever how intensely the system is going to be used.
AWS overcharges 100x for bandwidth, and CloudFront's free tier has 10x more bandwidth in it.
To be fair, I worked on multiple projects removing queues at Google, so it's more than just that.
and mandates that virtually all new projects not directly use borg/kubernetes.
That's interesting. What's the rationale behind that?
I guess such a service gets coupled too strongly to that platform, and major engineering effort is required to deploy it the old-school way.
Can you extend it? How do they deploy, and where do they deploy their projects?
True, but the CTO comes from twitter/meta/google/some open-source big-data project, the director loves databases, etc.
So we have 40-100 people managing queues with events driven from database journals.
Everyone sees how and why it evolved that way. No one has the skill or political capital to change it. And we spend most of our time on maintenance tasked as "upgrades", in a culture with "a strong role for devops".
Meta, where they run a PHP monolith with mysql? ;)