An alternative to sophisticated cloud cost minimization systems is… don't use the cloud. Host it yourself. Or use Cloudflare, which charges 0 cents per gigabyte in egress fees. Or just rent servers from one of the many much cheaper VPS hosting services and skip all the expensive and complex cloud services designed to lock you in and drain cash from you at 9 or 12 or 17 cents per gigabyte.
Seriously, if you’re at the point that you’re doing sophisticated analysis of cloud costs, consider dropping the cloud.
If you're at the point where you're doing sophisticated cloud cost analysis, you are doing the cloud right, because that kind of analysis is completely impossible anywhere else.
I swear the people who say go on premise have no idea how much the salary of someone who won't treat their datacenter like a home lab costs. Even Apple iCloud is in AWS and GCP because of how economical it is. If you think you have to go back on prem, either you suck at the cloud or you just don't give a shit about reliability (start pricing up DDoS protection at anything higher than 10G and tell me the cloud is more expensive).
We spend 100k+ on AWS bandwidth and it's still cheaper than our DIA circuits because we don't have to pay network engineers to manage 3 different AZs.
Apparently we're doing the impossible for over 12 years now. Who knew?
Some people act like it's some kind of black magic. It's not. We've some customers in our DC and some on AWS for various reasons. AWS isn't less problematic. AWS is about 10x more expensive. Both on prem and cloud require people familiar with them and cloud-engineers are in no way cheaper.
Only meaningful problem is that on-prem requires some up-front cost and time. That can be mitigated by leasing and other means, but indeed can be an issue for small businesses.
Cloud clearly makes sense for:
- small business with at least some reliability expectation, and little to no IT expertise
- huge workload requirement volatility
- having someone else to blame
- solution is already working in cloud, with teams being very comfortable there, and perceiving on-prem as “enemy” (analogy: forcing devs to rewrite stuff from haskell to java)
- that extra cost is small budget line for you
On the other hand, it does not make sense to go cloud if you are sufficiently big and already have an on-prem solution and expertise in house. (Extreme case: Google does not use AWS for its main load; the actual upper threshold, I wager, is a couple of orders of magnitude smaller.)
Amazon retail runs on AWS, and I think we can agree that Amazon retail is reasonably described as “large scale”
This is a quirk of the business that is Amazon and AWS: they started by selling excess compute and expertise, and because Amazon was built API-first internally it was almost natural.
It’s in no way the norm for a smaller business.
This matches the public (i.e., non-Amazon) speculation I was hearing around the launch of S3 and later EC2. But not what I was hearing internally when I worked at Amazon. I was there when S3, the first AWS service, and EC2 were launched. I was working on what I believe to be the first Amazon (non-AWS) application that used S3 for storage. Getting that approved was not easy - all the same skepticism existed internally as externally (cost, availability, durability, security, etc.).
The story I was hearing internally was that it was too costly to scale infrastructure the way Amazon had been doing it, it was fragile, and the expertise wasn't keeping up with growth. So, set the bar a lot higher, and build infra that is big enough and flexible enough to be everybody's infra, and then Amazon's applications (e.g., retail) could run on AWS' excess capacity. Literally the opposite of what external folks were guessing. I believe they were completely physically separate data centers - even the physical location of AWS data centers was on a need-to-know basis internally (the internal lore was "under a mountain in Virginia" - this was years before Regions and Availability Zones). And any bugs in AWS could be worked out with outside usage before moving Amazon's applications onto it.
Also, Amazon needed the elasticity of AWS because of the nature of their retail business. At the time that the initial AWS services were being developed, a massive chunk of Amazon's traffic came during the holiday season. IIRC, something like half of the year's traffic and revenue, possibly more, came in November/December each year. That meant a lot of capacity was sitting idle most of the year. Selling that excess capacity would mean shutting AWS down every holiday season.
For a time, there was an internal mailing list that wasn't yet locked down that contained reports on S3 bandwidth usage. The growth rate was shockingly high. I would guess that within a year or two of release, S3 was using a few (at least) orders of magnitude more bandwidth than everything else at Amazon combined.
In broad strokes, the main point I was making still stands though: AWS was deliberately built to back the demanding scale of Amazon. It was a bet on the future and on the Amazon model as much as it was a product, and that did mean they built up expertise and hardware and sold it as a product nonetheless.
This still isn't the norm for most businesses, even big ones.
Is that really a fair comparison though? AWS is a very weird argument to make because you could say that AWS is kind of "on premise" for Amazon's purposes. Internally, Amazon.com does not pay retail pricing or have the same level of support as third-party end users. A better example would be looking at Jet.com/Walmart and asking if it runs on AWS.
Nobody who is big is paying retail prices, that's why saying "on premise is cheaper" is total copium.
As soon as you start factoring in discounts (i.e. bandwidth is nearly free at some point), the math of being on premise completely falls apart, to the point you are paying more for licensing and support for the hardware than you are for the entire lifecycle of your infrastructure in the cloud. It's just that bad to do it yourself.
Sorry, but who pays for licensing and support of the hardware? I've never done that. You buy a Dell server or whatever, you pay for the 5 or 7 years warranty up front, you put it in a rack and you literally never touch it again until it's EOL'ed. If something breaks Dell touches it, not you. That typically costs around $500/year, albeit paid upfront.
But you usually don't do that. Instead you rent dedicated metal from someone. The cost is on the order of $1000/year including some storage, no more to pay unless you exceed 100TB/year. A similarly configured AWS EC2 instance is $13,000/year, plus bandwidth, plus whatever other services you get sucked into. And you will get sucked into it because if you ask AWS about any problem (like say, monitoring why your bills are so high), the answer is invariably "use this paid service of ours".
You're kidding yourself if you think using AWS is cheaper than the alternatives. Those discounts you speak of are from an absurdly high starting point. I'm sure there are lots of reasons to use AWS or a similar cloud service, but unless you only need a lot of grunt for at most a few months price isn't one of them.
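To put rough numbers on that comparison, here's a back-of-the-envelope 5-year sketch. Every figure is an assumption for illustration, not a real quote - plug in your own prices.

    # Hypothetical 5-year cost comparison: buy-and-colo vs rented dedicated vs on-demand EC2.
    # All prices below are illustrative assumptions.
    YEARS = 5
    buy_server = 6000            # hardware + multi-year warranty, paid up front (assumed)
    colo_per_year = 1500         # rack space, power, remote hands (assumed)
    dedicated_per_year = 1000    # rented dedicated metal (the ~$1000/yr figure above)
    ec2_per_year = 13000         # comparable on-demand EC2 instance (the ~$13,000/yr figure above)

    print("buy + colo:", buy_server + colo_per_year * YEARS)   # 13500
    print("dedicated :", dedicated_per_year * YEARS)           # 5000
    print("EC2       :", ec2_per_year * YEARS)                 # 65000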
Cloud doesn't make sense for small business. A VPS would. If you are spending less than 100,000 you probably don't need it for your 10,000 million or less daily visitors.
If you're spending less than 100,000, you almost certainly aren't spending enough to pay salary + benefits for a sysadmin.
Why would you need additional people to manage a VPS? The person managing Amazon can very easily use cPanel because it is much simpler.
Why does everyone always jump to this fiction?
You probably aren't paying a "cloud engineer" just to fiddle with cloud config full time with <$100k spend. Why would you suddenly need a full time sysop to do the same capability level of on-prem? If you do 10 hours of "cloud engineering" a month to support an application, the same capability level of on-prem work is probably in the same ballpark. 5-30 hours. Yes, it can be lower than cloud. No, it's NOT suddenly 160 hours every month. Yes, it does mean you need someone who can wear the extra hat, cloud engineering skills are literally no different in this regard.
Internet sites ran on basement and closet servers for years, actual server rooms for larger stuff. The jokes were that tripping over power cords and backhoes digging up internet lines were the causes of most outages. It's never been easier to run on-prem than today. It costs a fraction of even budget VPS providers like Hetzner.
For certain classes of applications, it's awesome. It's not everyone's cup of tea, I'll grant. But if you are inclined to play with hardware or have someone in the org who is, and an extra 2-4 hours of downtime a year isn't that big of a deal (depending on general utility and network availability at your chosen site), you can save tons of money.
Depending how small is small business, but I would probably go with VPS.
Not a good example in my opinion, as Google is also a major provider of cloud services. AWS isn't the only game in town.
I think Disney+ is a more interesting case. A quick google search turns up some articles saying their video streaming is powered by external CDNs. As I understand it, Netflix take the opposite approach and deliver all video data from their own Open Connect CDN, although they use AWS for other workloads (presumably including things like authentication, their recommendation engine, etc).
Both on prem and cloud require people familiar with them and cloud-engineers are in no way cheaper.
I think the real story is a bit sordid: office politics. On-prem and cloud are different skillsets. Companies that have been around for a while can end up with both on-prem and cloud experts who end up competing with each other, often on separate teams. Throw in some slick consultants from Amazon who are able to bend the ear of the VP and you've got a real problem. From what I've seen it doesn't end well for the on-prem team!
Cloud engineers can do the job of 4-5 on-prem people. Our AWS devs don't need to be BGP or ZFS experts, they just need to be AWS experts.
Hilariously ironic, with a sufficiently large cloud footprint, things like BGP (and more/other internetworking protocols) and OpenZFS become required skillsets. I have firsthand experience of this. :)
Yeah, there was an amusement there. I've definitely had to understand BGP to configure cloud VPC setups.
"They just have to be an AWS expert".
Right. They just have to be experts in: EC2, S3, Aurora, DynamoDB, RDS, Lambda, VPC, LightSail, Athena, EMR, RedShift, MQ, SQS, SNS, ECR, ECS, EKS, ElastiCache, CloudWatch, CloudTrail, IAM, Cognito, and a few more. No big deal.
Well our on-prem team doesn't need an AWS pricing calculation and optimization expert, so there's that :-)
AWS pricing and optimization is just capacity planning, which doesn’t go away if you run on prem - it just looks different, with longer time horizons & financial implications.
“Will my data center run out of floor space & I need to expand?” (years+)
“Will I have enough cooling & power to support the new racks we need?” (6 months+)
“When do I need to get the server order out to ensure we meet our capacity needs?” (6+ weeks)
Every one of those is a capital expenditure, so line them up with the annual budget cycle - and be sure to keep enough spare capacity to be responsive to last-minute asks.
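As a sketch of what that planning arithmetic can look like (rack counts, growth rate, and lead time below are made-up numbers):

    # Toy capacity-planning calculation: when must the next hardware order go out?
    # All inputs are made up for illustration.
    current_racks_used = 30
    total_racks = 40
    safety_margin_racks = 2          # headroom for last-minute asks
    growth_racks_per_month = 0.8     # observed average growth
    order_lead_time_months = 2       # purchase order -> racked and cabled

    months_until_full = (total_racks - safety_margin_racks - current_racks_used) / growth_racks_per_month
    order_by = max(months_until_full - order_lead_time_months, 0)
    print(f"usable capacity exhausted in ~{months_until_full:.1f} months")
    print(f"order must go out within ~{order_by:.1f} months")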
Don’t think my intent is to romanticize the cloud, either. It’s not better, nor worse, just a different way to manage things.
Of course if your company is sufficiently small, do whatever you know and can do quickly - customer acquisition will be more important than debating the cost of either infra in aws or a colo’d server or two in some racks somewhere. But the complexity doesn’t go away if you go to the cloud, OR if you are all on prem. TINSTAAFL.
I concur. Many VPs bend ears so easily you've got to wonder: do they also get invited to some private dinner by those so-called slick consultants, who pay the bill in a rush and leave after forgetting some thick envelope on the table?
It’s a story as old as time itself. IBM has been doing it at least as far back as the 60’s. Fancy consultants who know the tech and also know how to sell and make themselves seem way smarter than the VP’s reports. Do one slick presentation and the VP is asking his team “why didn’t you guys come up with this stuff?”
Next thing you know these multi-million-dollar contracts are signed and the existing teams are just shaking their heads. The smart ones have already put out their resumes and started interviewing elsewhere.
I'm not one of those smart ones. So many things I was seeing for years make much more sense now.
It's an old story but I guarantee you many don't know about it.
These types of debates always seem to go this way - one person saying one option works way better for them so the other option must be crazy, followed by another person saying the opposite. I'm not an infra guy, but my guess is the reality is both are choosing the right option for themselves, because there is no one objectively "best" option. Just tradeoffs.
I also think if a company is unhappy with their current set up and thinks that switching from cloud to on-prem (or vice versa) is the answer, they're probably delusional because "the fault, dear Brutus, is not in our stars, but in ourselves."
Completely not my point. I don't say that using cloud is crazy (we use it sometimes). I'm saying that the opinion that on-prem is much harder than AWS is IMO wrong.
Requires some planning ahead of time - yes. Harder - not really.
In many organizations the biggest appeal of services like AWS or GCP isn't simply cost; it's that the platform is approved, and therefore all of its services are approved, and I no longer have to justify spinning up more compute or leveraging one of their more bespoke services like SQS (or whatever the equivalent is). It's all just there, ready to be used.
It may not be true for you and where you work, but this is a very real thing in a lot of organizations, where development teams want a quicker turnaround on booting up the services they need as the product evolves and want some control over how those services interact with each other. It (sometimes to negative results) opens up more architectural possibilities to solve problems.
I'm not saying using AWS is universally bad. Depends on your needs.
What I'm saying is some people are trying to portray rolling your own k8s or on-prem as equivalent to rolling your own crypto - better left to the chosen ones with years of training in a secret monastery. This is BS :-)
My small business once spent 50k per month on AWS. We brought that back to 800 dollars for a similar setup at Hetzner. I find this a significant number.
"Yea this small VPS provider with 1% of the features is just as good as AWS to us" yea that's because you aren't using features as basic as AWS Nitro Enclaves and you are years behind even basic cloud security.
Hetzner is for running homelabs and basic compute, not businesses. That's why EU companies constantly ignore EU rulings on US-EU privacy shield because there's just no alternative to American cloud providers yet.
People have been running real businesses before the cloud providers existed. Of course you can run a business with rented dedicated servers.
Of course you can run your business on it, you will just suffer because there is practically zero automation to it. I guess it's fine if you are a small business, but we are very obviously not talking about mom and pop shops who need a web and email server.
Incorrect, there's automation for on-prem or smaller providers as well; just because you're unable to do it doesn't mean it can't be done.
How do you think AWS or Cloudflare is built?
You're arguing semantics; you are on massive copium if you think that kind of automation is useful to anyone. Go ahead and set up cryptographic attestation (or just try and interface with a TPM ffs) for your apps on Hetzner to decrypt customer data and see how impossible it is.
Sorry but what kind of wubble wobble talk is this. Hardly anyone needs the stuff you refer to.
It's useful to a lot of people, not everyone has the same requirements. I'm running SecureBoot and LUKS encryption on my Laptop. If there's a TPM2 interface in whatever you can use the same thing. Add an immutable Linux distro and Kubernetes and you'll cover pretty many use cases. Not everyone has the same requirements, $BIGCLOUD makes things easier, but you also pay top dollar for it. AWS funds Amazon
My business is running on Hetzner. And many others as well.
Reliability differs, right?
I have Hetzner VMs going on 500+ DAYS without restart. I've been a Hetzner customer for about 8 years now. AWS has been down 10x more than Hetzner has. And even when Hetzner has issues they are localized to a single DC at worst; most often a single host is down. When AWS has issues, the whole internet has issues. Or their central region goes out, which can also affect their other regions.
For the small/medium business infra I manage here in Bulgaria, the same thing would cost 5-10 times as much on AWS just for the compute; throw in 1 TB of bandwidth and this makes no financial sense.
On Hetzner I pay 40 euros for 2 VMs, dedicated IPs, daily backups, 100 GB of SSD external storage, and a firewall.
I have more than 10 servers on Hetzner (some dedicated, some VPS), for 5+ years and the same experience, once one of the dedicated servers had some hardware issues, and an hour later the drives were moved to another box and it was running again. Other than that time, I had downtime only because of my own fault.
Pretty sure over these years AWS has experienced a lot more issues overall.
What's your Hetzner setup look like? 98% cost savings makes me curious...
What does this mean? They steal company resources for themselves, or just configure things incompetently?
Incompetence. Take my friend’s company for instance. They were frustrated paying $60K/mo to Amazon so their brilliant sysadmin bought $600K of servers and moved them into a cheap colo.
Over Christmas, everything died, and the brilliant sysadmin was on holiday. Nobody could get things going again for many days and so their entire SaaS business was failing. They lost a lot of business and trust as a result.
The sysadmin is now gone and they are back on AWS.
No key person risk management -> no risk register -> no management. Your friend's company will fail regardless of poor sysadmin decision making or not. They need to hire competent management ASAP.
This is basically the logic of people who say the cloud is too expensive, you have to ignore so many things to make being on premise logical. Basically you are lying to yourself if you think you can run a datacenter cheaper and better than Amazon or Microsoft can, because if you can you are just making huge sacrifices somewhere (usually time, which is why reddit sysadmins complain about how much work they have while defending being on-premise because they couldn't possibly be wrong).
you must be management, cuz
1. you think it's the sysadmin's fault
2. you think there are no competent sysadmins out there
What magical things do they have that every single reasonably sized enterprise doesn't have? It should be extremely easy for a small enterprise to beat any of the main clouds* - they make a crap ton of profit from you
*making an assumption that your needs are reasonably static and not MASSIVELY busting up and down your infrastructure
Faint ISO 27001 sounds in the background
With cloud and SaaS services you are paying to reduce your key-person risk profile.
You're forming a larger dependency on a team lead and a custom system that is now a liability, as new people coming to the organization don't want to adopt an abandoned, poorly understood project.
This story has nothing to do with AWS or on-prem.
It's a story about incompetent management allowing a single human point of failure. If they don't change that, they'll have the same problem wherever they go.
If you are fully accounting for vacation, training, sleep etc then you need a minimum of 5 admins for mission critical services. Now, you can engineer around this to reduce your staffing requirement but I wouldn't recommend going under 2 ever because accidents happen.
This business seemed to be one below even that, without the engineering, and I would point to the mgmt, not the brilliant admin, as the problem.
Non-scalable incompetence or basically pretending that the datacenter will never go down. Any high schooler with an iPhone can set up and maintain a datacenter full of servers.
But if you want something reliable that I can spend 30 seconds writing some terraform for, it will take an entire infra team to set up and maintain it, not to mention an entire procurement process and now having to integrate a new supply chain just for a basic multi-az setup (probably without things like backups and still without basic features the cloud gives you automatically).
You act like these problems are especially hard.
Active-active, five nines, fault tolerance. Hard stuff. But managing on-prem is no harder.
This is what we're paid for.
"this is what we're paid for" Nope, it's what YOU'RE paid for.
I am paid to relax on my holidays because I know my team and I don't have to drive to a colo to swap out a failing line card since I realized time is worth money and people quit jobs that take up too much of their time. I can A/B test (something on-prem guys NEVER get the luxury to do) so outages just don't happen at all (fingers crossed).
I have rarely met someone happy with their on-prem DC deployments, but after I moved to the AWS world it's just crazy how backwards it is to be anywhere but the cloud.
It's almost as if you feel the cloud is something magically different; it's actually just servers on racks owned by someone else.
You can own the same thing if you want and do everything exactly the same.
(See e.g. Oxide)
Anyone serious wouldn't "drive to the colo to swap out a failing line card" - they keep excess capacity and spares in the colo, and have the on-site personnel from the facility replace it.
Honestly just sounds like the environment you describe has greater organizational issues not related to on prem vs cloud.
That feels really grumpy.
Compare something like rocketry or chemical engineering with running an on-prem DC. I don't see what the complaining is about. It's still a luxury compared to what other professions have to deal with.
there's so much to unpack here!!
Managing on-prem hardware may not necessarily be hard, but it can be extremely time consuming. To me, the nice thing about dropping a bunch of hardware in a colo is you get to take a lot of shortcuts and take risks that you cannot buy from the public cloud providers.
I worked for a company that went the colo route (and would do it again), and it gave immense cost savings compared to public cloud, taking on risks that you can't take elsewhere. Before they started investing in having folks take care of the infra, as a raw startup it was just some servers and some desktop unmanaged switches. But that gave the company breathing room to survive, as the business model probably didn't work without it. But it also had a reputation for unreliable service.
I've also built the five nines infra at telcos, and yes you can do it with average engineers, but it's going to be time consuming, slow, and expensive in costs and labor. To allow 26 seconds of unplanned outage a month, you're going to be testing every firmware update for every piece of equipment on an ongoing basis, and practicing every operation and change as best as possible. And you need the scale that you get that 26s by having most outages only impact a subset of your customer base, otherwise you're going to blow that outage budget fast.
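(For reference, the 26-second figure is just the five-nines availability arithmetic; a quick check, assuming a 30-day month:)

    # Five-nines downtime budget per 30-day month.
    seconds_per_month = 30 * 24 * 3600            # 2,592,000 s
    availability = 0.99999                        # "five nines"
    print(seconds_per_month * (1 - availability)) # ~25.9 seconds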
Managing on prem is definitely harder, because in the cloud you benefit from economies of scale on all the management problems that you would otherwise have to handle yourself; if you don't have scale, then you will significantly overpay to get the same type of quality, reliability, or responsiveness.
Most people are not paid to manage infra, they are paid to talk to customers, ship features, fix bugs, and other "core business" items; just like most businesses don't build roads, they pay taxes and utilize them because the cost of doing it themselves for their preferred traffic patterns would be much more than they could justify (for now.)
On-prem is massively harder if you can’t cut corners on security or reliability. Just things like testing & upgrading firmware, doing real DR testing (I know multiple places which spent lots of time and money doing annual failover tests, but went down hard every time they had a true failure due to something they’d missed), handling things like boot signing or secure logging, etc. all take up multiple FTEs worth of time, or are a checkbox from a platform which handles that for you.
I haven't yet met anyone at a company that heavily uses cloud and still doesn't have the same number of salaried infra people as on-prem.
AWS and GCP are giving companies like Apple huge discounts so someone could say something like, "Even Apple iCloud is in AWS and GCP because of how economical it is"
There is too much nuance to say one is better than the other. In some cases using a IaaS is more economical, in other cases it's not.
For Apple, it is also true[0] to say "Even Apple is running their own datacenters because of how economical it is"
0 - https://dgtlinfra.com/apple-data-center-locations/#:~:text=a....
Which would mean that you lose part of the reason to use the cloud in the first place... A lot of orgs move to cloud-based hosting because it enables them to go way further in FinOps / cost control (amongst many other things).
This can make a lot of sense depending on your infra; if you have some fluctuation in your needs (storage, compute, etc...), a cloud-based solution can be a great fit.
At the end of the day, it is just a tool. I worked in places where I SSH'd into the prod bare-metal server to update our software, manage the firewall, check the storage, ... and all that manually. And I worked in places where we were using a cloud provider for most of our hosting needs. I also handled the transition from one to the other. All I can say is: It's a tool. "The cloud" is no better or worse than a bare-metal server or a VPS. It really depends on your use-case. You should just do your due diligence and evaluate the reason why one would fit you more than the other, and reevaluate it from time to time depending on the changes in your environment.
This whole "cloud bad" is just childish.
I think a lot of orgs move to cloud simply because it's popular and Gartner told them so.
But taking a step away from that, it's really about self-service. When the alternative is logging a ticket for someone to manually misconfigure a VM and then fail to send you the login credentials, your delivery is slow.
When you're chasing revenue, going slow means you're leaving money on the table. When you're a big bureaucratic org, it means your middle managers can't claim to have delivered a whole bunch of shit. Nobody likes being held up, but that's what infrastructure teams historically do.
Nah, I think it's mostly about the second part of your comment. Everyone hates waiting for months to get a VM or a database or a firewall rule because the infrastructure/DBA teams are stuck ten years in the past and take pride in their artisanal infrastructure building.
So moving to the cloud eliminates a useless layer of time wasting.
If your on-prem team can't spin up a VM same day, then firing them is probably higher ROI than "going to cloud". Further, a lot of the shops "going to cloud" because their infra team is slow then hide the cloud behind that same infra team.
A prior 200+ dev shop went from automated on-prem VM builds happening within hours of raising a ticket, to cloud, where there was a Slack channel to nag and beg for an EC2 instance, which could take a day to a week. This was not a temporary state of affairs either; it was allowed to run like this for 2+ years.
Oh and, worth mentioning, CTO there LOVED him some Gartner.
I've never seen an IT team that couldn't spin up a VM in minutes. I have seen a bunch of teams that weren't allowed to because of ludicrous "change control" practices. Fire the managers that create this state of affairs, not the devops folks, regardless of whether you "go cloud" or not.
I've met multiple customers where time to get a VM was in the weeks to months. (To be fair, I'm at a vendor that proposed IaC tooling and general workflows and practices to move away from old school ClickOps ticket-based provisioning, so of course we'd get those types of orgs).
And more often than not, it had nothing to do with managers, but with individual contributors resisting change because they were set in their ways and were potentially afraid for their jobs. Same applies for firewall changes btw.
I think a lot of HN crowd hangs out at FAANG/FAANG adjacent or at least young/lean shops, and has no idea how insane it is out there.
I was at a shop that provisions AWS resources via written email requests & clickops, treated fairly similar to a datacenter procurement. Teams don't have access to the AWS console, cannot spin up/down, stop, delete, etc resources.
A year later I found out that all the stuff they provisioned wasn't set up as reserved instances. We weren't even asked. So we paid hourly rates for stuff running 24/7/365.
This was apparently the norm in the org. You have to know reserved instances exist, and ask for them... you may eventually be granted the discount later. I only realized what they had done when they quoted me rates and I was cross-checking ec2instances.info. I can guarantee you less than 20% of my org (it's not a tech shop) is aware this difference exists, let alone that ec2instances.info exists for cross-reference.
No big deal, just paying 2x for no reason on already overpriced resources!
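To illustrate how that roughly 2x gap arises for something running 24/7 - the hourly rates below are placeholders, not current AWS prices; check the pricing pages or ec2instances.info for real numbers:

    # Hypothetical on-demand vs reserved comparison for an always-on instance.
    hours_per_year = 8760
    on_demand_rate = 0.20        # $/hr, placeholder
    reserved_rate = 0.10         # effective $/hr for a 1-year commitment, placeholder

    on_demand_cost = on_demand_rate * hours_per_year
    reserved_cost = reserved_rate * hours_per_year
    print(f"on-demand: ${on_demand_cost:,.0f}/yr")
    print(f"reserved : ${reserved_cost:,.0f}/yr ({1 - reserved_cost / on_demand_cost:.0%} cheaper)")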
I went from that type of world (cell carrier) to a FAANG type company and it was shocking. The baseline trust that engineers were given by default was refreshing and actually a bit scary.
I’m not sure my former coworkers would have done well in an environment with so few constraints. Many of them had grown accustomed to (and been rewarded and praised for) only taking actions that would survive bureaucratic process, or fly underneath it.
Teams are what they DO, not what they CAN DO.
Ok, but I’m not sure what that has to do with what I posted.
The problem is the strong players are less likely to stick around, so you often do end up with folks who can't do the work in minutes - though, the work is usually slightly more than clicking the "give me the vm" button.
Despite years of friendly-sounding devops philosophy, there are times when devs and ops are fundamentally going to be in conflict. It's sort of a proxy war between devs, who understandably dislike red tape, and management, who loves it, with devops caught in the middle and on the hook for both rapid delivery of infrastructure and some semblance of governance.
An org with actual governance in place really can't deliver infra rapidly, regardless of whether the underlying stuff is cloud or on prem, because whatever form governance takes in practice it tends to be distributed, i.e. everyone wants to be consulted on everything but they also want their own responsibility/accountability to be diluted. Bureaucracy 101..
Devs only see ops taking too long to deliver, but ops is generally frozen waiting on infosec, management approving new costs, data stewards approving new copies across ends, architects who haven't yet considered/approved whatever outlandish new toys the junior devs have requested, etc etc.
Depends on exactly what you're building but with a competent ops team cloud vs on prem shouldn't change that much. Setting aside the org level externalities mentioned above, developer preference for stuff like certain AWS apis or complex services is the next major issue for declouding. From the ops perspective cloud vs on prem is largely gonna be the same toolkit anyway (helm, terraform, ansible, whatever)
Yes, of course management is often the problem.
I think it helps when people actually take a step back and understand where the money that pays their salary comes from. Often times people are so ensconced in their tech bureaucracy they think they are the tail that wags the dog. Sometimes the people that are the most hops from the money are the least aware of this dynamic. Bureaucracies create an internal logic of their own.
If I am writing some internal software for a firm that makes money selling widgets, and I decide that what we really need is a 3-year rewrite of my app for reasons, I am probably not helping in the sale or the production of widgets. If another team is provisioning hardware for me to write the software on, and it now takes 2 weeks to provision virtual hardware that could take seconds, then they are also not helping in the sale or the production of widgets.
These are the kind of orgs that someone may one day walk into, blast 30% of the staff, and find no impact on widget production, and obvious 30% savings on widget costs...
Well in this example, the ops team slowing down pointless dev work by not delivering the platform that work is going to happen on quickly are effectively engaged in costs savings for the org. The org is not paying for the platform, which helps them because the project might be canceled anyway, and plus the slow movement of the org may give them time to organize and declare their real priorities. Also due to the slow down, the dev and the ops team are potentially more available to fix bugs or whatnot in actual widget-production. It's easy to think that "big ships take a while to turn" is some kind of major bug or at least an inefficiency, but there are also reasons orgs evolve in that direction and times when it's adaptive.
Part of my point is that, in general, departments develop internal momentum and resist all interface/integration with other departments until or unless that situation is forced. Structurally, at a lot of orgs of a certain size, that integration point is ops/devops/cloud/platform teams (whatever you call them). Most people probably can't imagine being held responsible for lateness on work that they are also powerless to approve, but for these kind of teams the situation is almost routine. In that sense, simply because they are an integration point, it's almost their job to absorb blame for/from all other departments. If you're lucky management that has a clue can see this happening, introduce better processes and clarify responsibilities.
Summarizing all that complexity and trying to reduce it to some specific technical decision like cloud vs on-prem is usually missing the point. Slow infra delivery could be technical incompetence or technology choices, but in my experience it's much more likely a problem with governance / general org maturity, so the right fix needs to come from leadership with some strong stable vision of how interdepartmental cooperation & collaboration is supposed to happen.
Whilst often true in practice, this doesn't have to be true.
The reality is, a lot of these orgs have likely already discovered devops, pipelines, deployment strategies, observability, and compliance as code.
There's basically little in compliance that can't be automated with patterns and platforms, but in most of these organizations a delivery team's interface with the org is their non-technical delivery manager, who folds like a beach chair when they're told no by the random infosec bod who's afraid of automation.
I've cracked this nut a few times though. It requires you be stubborn, talk back, and have the gravitas and understanding to be taken seriously. i.e. yelling that's dumb doesn't work, but asking them for a list of what they'd check, and presenting an automated solution to their group, where they can't just yell no, might.
They probably should be fired, but it's actually complicated because the orgs tend to be staffed with departments that believe this is the way things should be done, and best case the replacement needs to compromise with them, worst case they are like-minded and you just get more of the same.
I haven’t seen this be due to one set of incompetents since the turn of the century. What I have seen is this caused by politics, change management politics, and shortsighted budgetary practices (better to spend thousands of dollars per day on developers going idle or building bizarre things than spend tens on infrastructure!).
In such cases, the only times where firing someone would help would be if they were the C-level people who created and sustained that inefficient system.
It also allows management to hide bad decisions and poor planning.
Project is a dud? just nuke the cloud project and no more charges for it.
Project is poorly architected and running like a dog? throw more resources at it.
Both of the above are harder to hide when you have to order equipment for on prem.
How is that a negative? Not every project is going to be successful. That's just a basic fact of life. That you don't have to deal with the sunk cost fallacy and just pull the plug is a good thing.
Another positive...?! You can continue to serve your clients and maintain a revenue stream while you work on a better architecture, instead of failing completely. And once you need fewer resources, you easily scale down.
If you're running an internal cloud, you can likely absorb that.
I think it comes down to a couple of things:
- Small orgs don't have the resources to run internal clouds, nor should they be doing so. This limits the pipeline of available candidates.
- Large orgs promote the wrong people to management, and they make decisions based on their mental model of the world that was developed 20 years ago. They're filled with people who don't understand the difference between cloud and virtualization.
- Large consultancies make more money by throwing raw numbers at the problem rather than smart automation. i.e. it's easier for IBM to bill T&M and a whole project wrapper to patch the server than automate it.
- Finance & HR teams want you to bend to their ways of working rather than the opposite.
Of the rest, you get into many of them are simply in ops because they're less skilled software developers, or they're now being asked to assure security, and that scares them so they try to lock everything down.
If only that were the case.
Even when everything is in IaC and 100% cloud-native, I’ve still seen dev teams bypass the approved methods because ClickOps is easier.
That time wasting comes back in one form or another. For instance, where I work they enable *all* the Azure Application Gateway rules, even those that Microsoft says not to enable - causing even simple OpenID redirects from Microsoft Azure AD (Microsoft login) to the application to get captured in AAG and fail.
You still have to go through your devops (or equivalent) team to make any network configuration/permission changes. Whether that change is implemented by a local firewall rule or some AWS configuration change is not very important.
It's not like you're going to have developers changing AWS access permissions directly. Maybe in a few employee startup, but in any regulated & audited company, you must have separation of duties and audited change control process.
The layer will still be there, because those teams are now managing some cloud infrastructure central to the organization.
This can be rational and not just following the leader. In particular, many devs might think that working with an org that does on-prem is bad for their career, and they might be right. So from an org POV you can't hire good engineers if you're perceived as a dinosaur. This alone might be enough to send you towards the cloud even if the price by itself makes no sense.
I've experienced the opposite too: orgs looking down on cloud-only devs.
The idea being that devs who lean on cloud excessively do so to mask their lack of fundamentals, which will cause costly fuck ups no matter what technology they use, cloud or on-prem.
Maybe directionally similar idea to hiring ex-Googlers? Some orgs also don't like those. Specific mindset, specific toolbox.
It is absolutely true that some devs have the AWS product set as the tech toolkit they know best.
Whatever their fundamental skills are, the most important way they add value is by optimizing things like lambda startup time or EC2 CPU utilization. Does this allow them to mask deep problems with fundamentals? I guess it could, but that sounds a bit gatekeep-y to me.
Sort of, but IDK. If you have specific needs this might be a somewhat reasonable heuristic for hiring.
Devs who came up building software more or less from scratch really do have a different skillset than ones who stick to working in service-rich environments because there's a significant difference between glueing services together vs building out those same services. For example something like using a paginated API is quite a bit easier than designing/implementing one. A developer who is skilled and methodical about reading and understanding service-level documentation may not actually be able to step through debugging in a REPL, and vice versa. (Not to say that either kind of person cannot learn the other persons tricks, but as far as the differences in what they already know, those can be pretty significant.)
Assuming someone only has one of these skillsets, the most valuable one totally depends on the situation. On the one hand it's pretty cool that service-familiarity tends to be language-agnostic, but it's less cool when your S3-API expert barely understands the basics of tooling in the new language.
Paginated API is a great example. For me, I learned C from K&R, producing a.out files that would segfault and leave a core file in $HOME. If I wanted a list structure, I had to build it out of resizable arrays of pointers, etc.
I ended up years later at AWS, and while I was there I built internet-facing paginated APIs over resources which had a variety of backing stores, each of which had some behavior I had to reason about.
So I don’t doubt the difference between API builder and API user, I’ve been both. I think it’s less about what you are doing and more about how you do it (with curiosity about how things work, vs. as an incurious gluer).
That said, looking at the code inside MySQL is highly instructive for the curious; AWS doesn’t provide that warts-and-all visibility into their implementations, which cuts off the learning journey through the stack.
There's also now regulatory capture. Your bare-metal or VPS solution won't be "FEDRAMP" approved, even though there are fewer moving parts to secure.
I have worked in places where adding a new server is a bureaucratic nightmare at best.
Granted, I don't think that's the norm, but hosting your web server yourself is also not as user-friendly as AWS.
People always forget that.
Not childish… it's a growing line of thought in an IT community that has bought the cloud sales pitch unquestioningly for 20 years.
But it's the same mindset as "cloud good", which was also a growing line of thought once. Mantras aren't useful; tradeoff analysis is useful.
Mantras are good for orgs that are not mature enough to do actual analysis. A lead developer recently left where I work, and while higher pay was likely the biggest factor in the move, I suspect the real reason he left is that the higher-ups simply don't listen when he says things like "you can't just lift full virtual machines onto Azure", while refusing any rewrite/redesign and complaining about the high Azure spend.
AWS is infamous in financial services for this though.
First they give you a ton of credits, assign you internal resources to help.
Then they encourage you to simply "lift and shift" your workloads onto EC2/EBS/EFS/etc. It's 100% compatible with your current system, you can roll back, etc. This takes two years, then you notice your AWS bill is 10x your old infra.
Then they say - of course, that's because you need to rewrite it all to serverless/microservices/etc., the AWS-bespoke branded alphabet soup of services. Now you are fully entrapped, and cannot roll back to your own infra, let alone another cloud provider, without another rewrite.
A lot of big financial firms are 5+ years into this. Several have rolled back for certain use cases due to cost, especially anything with a lot of data transfer because yeah.. performant storage in the cloud & egress are expensive, duh.
You can still use standard stuff like Kubernetes, even if you go microservices. I don't think it's that bad.
I'd say Cloud lets you do a few things, but the way I think of it ultimately is it lets you spend opex instead of capex. If that means though that your opex will end up higher than your capex, then it would be silly to go with it.
The other thing is in theory your reliability should be higher, but, again, that will depend on your individual situation, and how much reliability matters to you.
You CAN, but of course that's not what AWS steers you to.
Once your org has gotten to that step, it's been so steered by AWS staff it's hard to imagine suddenly finding sense and building with open standard stuff. Very few AWS shops I have encountered avoid the siren call of various AWS-only or AWS-specific services, which they then become heavily ingrained in..
Generally I do think it's mostly about transforming CapEx to OpEx, with the rest of the stuff being noise.
I was one of those AWS people that worked with Financial Services customers.
We (at least my team) were always pushing for a minimum of modernization when architecting migration projects - even a simple move to containers, managed by whatever orchestrator they want to run. It helps enormously on costs at least by just taking off so many overhead and overprovisioned VMs from the roster of migration candidates.
More often than not, the customer will refuse that and opt for lift + shift. It's either too hard, they don't have the resources or time, etc.
Yep, I did a (tiny by your scale) lift and shift+ - basically take a thing that ran as a regular process with a database attached, containerise it (and pen test it, but that's another story), and plug it into a cloud-managed database. It worked great.
Well, then they can flip a coin for which mantra to follow. If you pick "cloud bad" you'll get stories also about companies that refused to go to cloud when it makes sense to.
The GP said "[if X happens] consider dropping the cloud". Which is totally different from a mantra.
There is virtually nobody saying "cloud bad" without nuance.
Oh how I wish it had been so. The cloud has been a hard sell all along. Also, 20 years ago S3 and EC2 didn’t exist, so maybe it’s been a little less time than that.
The cloud is better and worse than bare metal. It depends on the use case.
AWS is Kafkaesque though
I have had the same experience. And all this, even though Amazon has granted us really generous and free annual plans and professional advice (all inclusive for a non-profit GmbH).
There is a massive difference between "[if X happens] consider dropping the cloud" and "cloud bad".
Which VPS services don't charge you at all for bandwidth?
you'd be hard-pressed to find ones that do (below a reasonable limit).
tilaa.com
vultr.com
hetzner.com
linode.com
While they might not charge you directly as a line item, you still get charged: Linode in the above list is what I use. I get a fixed cap of bandwidth each month. Anything beyond that is charged. So, you don't get charged IF you stay below the initial cap.
Exactly this. They're all charging for bandwidth.
If anything you could say that AWS is the closest to actually having unlimited bandwidth (or at least, half-parity there) since they don't charge you for incoming data, where other VPS charge you for data both ways.
Really which has more or less expensive bandwidth comes down to the shape of your data usage.
Which is another way of saying you buy a lump sum of bandwidth included with your VM.
The issue is not paying for bandwidth, the issue is the insane pricing.
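For a sense of the scale of the gap being complained about here, a rough monthly egress comparison - both rates below are assumptions for illustration, not current price-list figures:

    # Hypothetical monthly egress bill: flat per-GB cloud rate vs VPS-style overage.
    tb_out = 50
    cloud_rate_per_gb = 0.09          # assumed flat $/GB, ignoring tiering / free allowances
    vps_included_tb = 20              # assumed traffic bundled with the server
    vps_overage_per_tb = 1.0          # assumed $/TB beyond the bundle

    cloud_cost = tb_out * 1000 * cloud_rate_per_gb
    vps_cost = max(tb_out - vps_included_tb, 0) * vps_overage_per_tb
    print(f"cloud egress: ${cloud_cost:,.0f}/month")   # ~$4,500
    print(f"VPS overage : ${vps_cost:,.0f}/month")     # ~$30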
https://www.scaleway.com/en/
No, they charge for bandwidth. https://www.scaleway.com/en/dedibox/network/bandwidth/
https://www.netcup.eu/vserver/#root-server-details (with fair use policy)
OVH (except for APAC data centers), I think Scaleway also has bandwidth included.
If you're actually using the features of the cloud - i.e. managed services - then replicating them yourself involves building out an IT department with skills that many companies just don't have. And the cost of that will easily negate or exceed any savings, in many cases.
That's a big reason the choice of cloud became such a no-brainer for companies.
Next you’ll tell me that full-stack is a lie, and devs don’t actually know how to run a DB.
Full stack is a lie because devs don't build their own hardware.
I don’t think it’s that ridiculous (the next obvious goalpost being producing your own silicon), but since microservices means your DB is purely for your service, it stands to reason that you should also then know how to configure, backup, maintain, and tune the DB.
This is of course a wildly unrealistic ask, which is why I think the idea of merging jobs to save money is stupid. Let people who like frontend do frontend. Let people who like DBs do DBs. Let people who tolerate YAML do DevOps.
Maybe I don't want to manage a db, I just want to run one.
I don't think "host yourself" in this instance would've helped. I think AWS in this instance is operating at a loss. Author found a loophole in AWS pricing and that's why it's so cheap. Doing it on their own would've been more expensive.
Now as to why the AWS pricing the way it is... we may only guess, but likely to promote one service over the other.
AWS is surely operating at a loss for this particular case (zero S3 charge).
But there is no way that cross-AZ traffic costs AWS anywhere near what they charge for it.
also, if you really need to transfer hundreds of TBs of data from South Africa to the US, put it on a pallet of disks and send it via bulk shipping.
then load the data into the datacentre and then just pay for the last sync / delta.
https://aws.amazon.com/snowcone/
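The back-of-the-envelope math behind "ship the disks" - link speed, utilization, and shipping time below are assumptions:

    # Moving 500 TB: sustained network transfer vs shipping drives.
    data_tb = 500
    link_gbps = 1.0
    utilization = 0.8                                   # assumed sustained throughput
    transfer_days = data_tb * 1e12 * 8 / (link_gbps * 1e9 * utilization) / 86400
    shipping_days = 7                                   # assumed door-to-door freight
    print(f"network : ~{transfer_days:.0f} days")       # ~58 days
    print(f"shipping: ~{shipping_days} days + a final delta sync")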
This makes no sense.
So what do I do if I'm at the point where I'm doing sophisticated analysis of on-prem costs? Do I move to the cloud?
Those VPS hosting services are a solution for a startup, but not to run your internet banking, airline company or public SaaS product. Plus their infinite egress bandwidth and data transfer claims are only true as long as you don't... use infinite egress bandwidth and data transfer.
There are also many bare metal and VPS providers that charge radically less for bandwidth or even charge by size of pipe rather than transfer.
Cloud bandwidth pricing is… well the best analogies are very inappropriate for this site.
The tipping point for self-hosting would be the point at which paying for full-time, 24/7 on-call sysadmins (salary + benefits) is less than your cloud hosting bill.
That's just not gonna happen for a lot of services.
The blog post's solution is relatively simple to put in place; if you're already locked in to AWS, dropping it will cost quite a lot, and this might be a great middle ground in some cases.