There is a very obvious fix for surprise billing. Enforce a billing cap and terminate service if it's met. Even better if you send alerts when the cap is approaching.
If I pay $39/month, a default cap should be $39 per month. Otherwise, let me set a cap I am comfortable with.
Surprise billing is never good for customers, only the business.
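The proposal above is simple enough to sketch. This is a minimal, hypothetical version: the alert fractions and the names `default_cap` and `check_cap` are made up for illustration, and it assumes spend figures arrive promptly, which the replies below dispute.

```python
# Hypothetical sketch: a monthly cap that defaults to the plan price,
# with one-time alerts as spend approaches it.
ALERT_FRACTIONS = (0.5, 0.8, 0.9)  # hypothetical alert points: 50%, 80%, 90% of the cap


def default_cap(plan_price_per_month: float) -> float:
    """The comment's proposed default: cap monthly spend at the plan price."""
    return plan_price_per_month


def check_cap(spent: float, cap: float, already_alerted: set) -> tuple[bool, list]:
    """Return (terminate, new_alerts) for the current month's spend.

    new_alerts lists alert fractions crossed since the last check, so each alert
    is sent only once; terminate becomes True once spend reaches the cap.
    """
    new_alerts = [
        f for f in ALERT_FRACTIONS if spent >= f * cap and f not in already_alerted
    ]
    return spent >= cap, new_alerts


cap = default_cap(39)
alerted = set()
for spent in (10, 20, 36, 39):
    terminate, alerts = check_cap(spent, cap, alerted)
    alerted.update(alerts)
    print(spent, terminate, alerts)
```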
I promise, you are not the first person to have thought of this, and, believe it or not, there are reasons other than malice and avarice that cloud providers don't terminate service based on billing caps. Terminating service is a big deal.
We agree completely about surprise billing.
The only thing in cloud that's worse than a surprise bill is a surprise outage.
Depends. For my hobby project that hasn't been monetized, being charged $100k is way worse than it going down.
If it were critical infrastructure, or monetized in a way that brought in revenue to cover the charge, then maybe I don't want it to shut down despite skyrocketing costs, but that's hardly the only situation you could be in.
Nobody is going to charge you $100k.
(ignores the multiple hobbyists who accidentally got charged $100k)
My belief, and you can correct me on this if you have evidence to the contrary, is that nobody actually pays those bills.
Someone paid those bills to somebody. AWS incurred costs from it (support, opportunity cost, etc.) and they paid them.
From that point of view, the cap seems more like a common-sense self defense feature that the cloud provider would want to implement. But we have cloud providers in this thread saying they don’t want to implement caps, so, I dunno.
From what I understand, going by a few friends who may or may not know what they are talking about, the caps issue comes down to the processing delay between a billing event being emitted and the billing system actually understanding it. By the time the billing events have been processed and action taken, the damage would already be done in all but the most extreme cases.
Secondly, the best solution is to simply stop everything. But now the customer has to cold-start their entire infrastructure, which may actually cost more than paying the bill.
Thirdly, customers are likely to set a billing limit and then forget about it. Years later, they've got a complicated infrastructure setup spanning the globe, and they finally reach the scale where they hit a billing limit they had completely forgotten about, one that Bill configured in their early days (and Bill doesn't even work there anymore). Suddenly, the entire global infrastructure is shut down in the middle of the night.
That's the gist of what my friends said.
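The first point is mostly arithmetic. A minimal sketch, assuming a constant spend rate and a fixed delay between usage happening and the billing system acting on it (real pipelines are neither); the function names and the numbers in the example are hypothetical, not any provider's actual latency.

```python
# Hypothetical model: constant burn rate, fixed metering/processing lag.


def overshoot(rate_per_hour: float, lag_hours: float) -> float:
    """Spend accrued after the cap is crossed but before enforcement kicks in."""
    return rate_per_hour * lag_hours


def bill_with_cap(cap: float, rate_per_hour: float, lag_hours: float) -> float:
    """Approximate final bill when a hard cap is enforced through a lagging pipeline."""
    return cap + overshoot(rate_per_hour, lag_hours)


# Hypothetical: a $39 cap, a runaway workload burning $2,000/hour,
# and 6 hours between usage happening and the billing system acting on it.
print(bill_with_cap(39, 2_000, 6))  # 12039
```

Even with the cap honored perfectly, the bill lands near $12k rather than $39. The overshoot does not depend on the cap at all, only on the burn rate and the lag.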
I'm just going to refer people to this comment.
Getting charged is not the same as paying those charges. I'm willing to bet real money that all who, in good faith, went and asked not to pay those bills ended up not paying them (if they didn't ask, that's a separate issue).
Perhaps that is true, but the stress and anxiety from seeing a $100K bill is real and has an impact.
https://www.forbes.com/sites/sergeiklebnikov/2020/06/17/20-y...
If you're stressed over lighting $100k on fire, then you're not the target market. /s
Just in case you don't know, the person you're replying to is the author of the blog post, and they mention in it that being stressed out by (potential) unexpected large bills is something they are aware of.
Although even then, "we will only threaten to charge you $100k without meaning it" isn't much of a reassurance.
But more importantly: just let the customer decide! Let them decide whether there's a threshold where an outage is less costly than the hosting, and what that threshold is.
This made me think. There is usually some "hard" cost limit X that you cannot / don't want to afford, so terminating the service is preferable.
There is also usually some "soft" limit Y < X that you don't want to exceed, and don't plan to exceed, but you'd rather pay >Y than face an outage.
But a hard limit would have to be set to X to avoid that outage, and if it gets exceeded, you'll face a bill of X and an outage.
So what a customer would actually need is to specify both X and Y, with the rule: if the cost would exceed X, then terminate the service early so the cost doesn't actually exceed Y.
Sounds complicated to implement, but then, the current practice of waiving the bill is complicated too if you tried to formalize it.
(For the sake of this discussion, I'm ignoring all the technical difficulties of terminating a high-availability service at all.)
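The X/Y rule above can be written as a small decision function. This is a minimal sketch assuming a naive linear projection from the current hourly spend rate; `Limits`, `projected_total` and `decide` are hypothetical names, and, like the comment, it ignores billing lag and the difficulty of actually terminating things.

```python
# Hypothetical sketch of the two-threshold (Y soft, X hard) rule.
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    KEEP_RUNNING = "keep running"
    TERMINATE = "terminate"


@dataclass
class Limits:
    soft_y: float  # Y: rather not exceed, but will pay past it to avoid an outage
    hard_x: float  # X: cannot or will not pay this much under any circumstances


def projected_total(spent: float, rate_per_hour: float, hours_left: float) -> float:
    """Naive linear projection of end-of-period spend from the current hourly rate."""
    return spent + rate_per_hour * hours_left


def decide(limits: Limits, spent: float, rate_per_hour: float, hours_left: float) -> Action:
    # Backstop: never keep running past X, whatever the projection says.
    if spent >= limits.hard_x:
        return Action.TERMINATE
    # On track to blow through X: stop at Y instead of riding all the way up to X.
    headed_past_x = projected_total(spent, rate_per_hour, hours_left) > limits.hard_x
    if headed_past_x and spent >= limits.soft_y:
        return Action.TERMINATE
    # Otherwise keep paying, even above Y, rather than take an outage.
    return Action.KEEP_RUNNING


limits = Limits(soft_y=500, hard_x=2000)
print(decide(limits, spent=600, rate_per_hour=1, hours_left=100))
print(decide(limits, spent=600, rate_per_hour=50, hours_left=100))
print(decide(limits, spent=300, rate_per_hour=50, hours_left=100))
```

The third case is the interesting one: the projection says X will be blown, but spend is still under Y, so the service keeps running until it reaches Y. A linear projection will misfire on spiky workloads, which is part of why this is harder than it looks.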
It would, of course, be just mean and unscrupulous for a cloud vendor to look at the number you have set as being ‘the absolute most I am willing to pay for this service’ and then optimize their pricing offer to you specifically to make sure they go right up to that line and no further.
I didn't mean to imply that. A hard limit at X, which stays inactive below X and, once X is reached, leaves you with a bill of X and an outage, is the easiest approach from a technical side: terminate the service when X is reached, and bill exactly what was provided. It is what you would come up with instantly when asked to implement a cost limit if you didn't for a second put yourself in the customer's position.
Of course cloud vendors do put themselves in the customer's position, and that's why they say that customers would not be happy with a limit, even though they are asking for it.
Is this a soft limit or a trajectory prediction? I think there isn’t such a thing as a soft limit. Nobody wants to spend any money really right? But you need to spend some to avoid losing service. That’s just a cost you don’t like but need to pay.
I definitely get the idea of: I don’t want to spend X so if it looks like I will, terminate service at Y. But I think that’s a special case of the general situation, I want to know how much I’m on track to spend, right?
But I don’t know much about this at all. My whole experience was accidentally getting my own personal self a $500 AWS charge and then deciding that cloud services were dumb.
I don't know. I just tried to frame the problem from a customer's point of view, because cloud vendors' statement that customers would not like a limit is (IMHO) limited by their POV. Customers do want a limit, but not the way that cloud vendors would implement it. I think a huge part of the problem is understanding what exactly it is that you need when you use a cloud service. (This varies from customer to customer, and from service to service, of course. You usually have important services that must be running, and others where an outage would be unpleasant but not critical.)
That is not the issue. From a customer's POV, I would be ready to spend extra to keep the service running, but there is a limit where I'd prefer an outage because I can't bear that much. There are two problems with that: First, the limit is blurry. Second, a simple hard limit would leave me with a huge bill AND an outage. I would want to be able to choose one of those evils, not be left with both. And these two problems compound.
I don't think they are dumber than the alternative. If you run your own hardware, you have a hard limit in both cost and computing power. You could technically get that with the cloud too, but it is not usually offered because it doesn't really solve the problem, and it doesn't really solve it for your own hardware either.
That said, it would be nice if the major clouds would offer a "hard limit" option, but it really only works for "unimportant" applications that are cost-sensitive and can take an outage.
My preferred option would be to have an optional billing cap that I can enable knowing full well that if it is exceeded the service would be terminated (obviously with notifications as that cap is approached). I could then apply this to simple hobby projects and such, while not having the risk of termination apply to more serious applications (though a 'soft cap' would be nice here so that I could still receive notifications as it approaches).
As a certain kind of user, you probably do think that. But I also think I should be able to have a root level admin account without MFA. The consensus is that no, that should not be up to the customer.
It's different here, sure, but the providers optimize for not letting customers shoot themselves in the foot, and remediation via bill forgiveness is a fine solution -- from the provider POV.
I don't have MFA on my root level account; is that because my account is 16 years old or so at this point? Like, my personal AWS account is tied directly to my "order more dish soap" Amazon account, because that's how it worked back then, I guess.
You should be able to have a root level admin account with no 2FA! I would print mine and keep it in a tamper-evident envelope in a safe at my lawyer's office with instructions for when and who can get it.
A company isn't liable if their customer gets themselves hacked because they decided not to use any of the many MFA options available to them, and neither is a company liable if the customer set a billing limit rule that the company then executed correctly.
Companies can simply not be trusted to tell the difference between a foot-gun and a... whatever a good kind of gun would be...
AWS has billing alerts that trigger Lambdas.
That can stop all EC2 instances.
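The billing-alert-to-Lambda route mentioned above can be sketched with boto3. This is a hypothetical sketch, not an official AWS recipe: it assumes an AWS Budgets notification or CloudWatch billing alarm publishes to an SNS topic that invokes the function, it only covers the function's own region, and it ignores every billable service other than EC2. The helper names are invented; `get_paginator("describe_instances")` and `stop_instances` are standard boto3 EC2 client calls.

```python
"""Hypothetical Lambda: stop every running EC2 instance in the function's region.

Assumed wiring (not shown): an AWS Budgets notification or a CloudWatch billing
alarm publishes to an SNS topic that invokes this function. The function's role
needs ec2:DescribeInstances and ec2:StopInstances.
"""


def instance_ids(pages) -> list[str]:
    """Flatten describe_instances response pages into a list of instance IDs."""
    return [
        instance["InstanceId"]
        for page in pages
        for reservation in page["Reservations"]
        for instance in reservation["Instances"]
    ]


def stop_all_running_instances(ec2=None) -> list[str]:
    if ec2 is None:
        import boto3  # imported lazily so instance_ids() can be tested without boto3

        ec2 = boto3.client("ec2")  # defaults to the Lambda's own region
    paginator = ec2.get_paginator("describe_instances")
    pages = paginator.paginate(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )
    ids = instance_ids(pages)
    if ids:
        # Stop rather than terminate: EBS volumes survive (and keep billing for storage).
        ec2.stop_instances(InstanceIds=ids)
    return ids


def lambda_handler(event, context):
    return {"stopped": stop_all_running_instances()}
```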
The real fix is scoping credentials on AWS: if you don’t use an account or role with limited permissions, then even if they had this toggle, the first step in an attack would be to disable it.
Having flashbacks to the time where we had paid for a server and were paying for rack space for a customer and they were refusing to pay their bill. Our lawyers told us in no uncertain terms that turning off the server would be a terrible idea. “Obstruction of service” is the term that comes to mind.
Cloud squatting?
Node eviction
Surely it depends on the type of contract. Prepaid SIM cards stop working the second you run out of credit.
While the parent makes a fair point about cloud providers having arrived at their policies thoughtfully, this particular issue is likely not part of the equation. There are plenty of services that run on a quota system (ChatGPT, Sentry, etc.). There is a difference between shutting off a service the customer reasonably expected to be always on and shutting off a service when it reaches a threshold set by the customer as part of their purchase. The former is more like repossessing a physical good willy-nilly when the customer misses a payment or a check bounces… you can’t do that.
Okay, what are the reasons?
Big spike of real traffic and your site / app / database / system goes down.
Only if you chose to configure an unreasonably low cap.
Terminating service is a big deal for commercial customers' production environments.
Reversibly (i.e. shut down compute, don't delete anything, allow the customer to review, fix and reinstate quickly) terminating service is a minor annoyance for hobby/experimental setups, and in those, it's far preferable to having to open a support ticket to deal with a massive bill.
Having quotas that the customer can increase themselves (but has to manually choose to increase) on storage prevents storage related surprise bills, and the rest you can shut down (optionally, make the user choose up front what they would prefer).
What am I missing? Too many commercial customers picking "experimental" initially and forgetting to change it?
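The reversible, mode-selected approach described above could look roughly like this. It is a hypothetical sketch: `Mode`, `Kind`, `Resource`, `on_cap_reached` and `write_allowed` are invented names, and a real provider has far more resource kinds than compute and storage.

```python
# Hypothetical sketch: an up-front mode choice decides what happens at the cap;
# storage is never deleted, only bounded by a customer-controlled quota.
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    EXPERIMENTAL = "experimental"  # picked up front: an outage beats a big bill
    PRODUCTION = "production"      # picked up front: a big bill beats an outage


class Kind(Enum):
    COMPUTE = "compute"
    STORAGE = "storage"


@dataclass
class Resource:
    name: str
    kind: Kind


def on_cap_reached(mode: Mode, resources: list[Resource]) -> dict[str, str]:
    """Map each resource to what happens when the spending cap is hit."""
    actions = {}
    for r in resources:
        if mode is Mode.PRODUCTION:
            actions[r.name] = "alert only"
        elif r.kind is Kind.COMPUTE:
            actions[r.name] = "stop (disks and config kept, restartable)"
        else:
            # Storage is never deleted at the cap; its growth is bounded by the quota below.
            actions[r.name] = "keep data"
    return actions


def write_allowed(used_gb: float, write_gb: float, quota_gb: float) -> bool:
    """Self-service storage quota: refuse writes past a limit only the customer can raise."""
    return used_gb + write_gb <= quota_gb
```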
No one really cares enough about hobby developers as a customer segment to rebuild the billing infrastructure to make this possible. Billing at huge scale is solved at the cost of latency and of being "eventually correct" (unless that has changed). To add a price cap feature, eventually correct isn't enough, and then you ask yourself who would actually use it, and you have to scroll really far down your list of biggest customers until you get to someone who wants it.
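One way to enforce a cap without a strongly consistent, real-time global total is quota leasing: a central budget hands out slices of the remaining allowance, and each region enforces locally against its slice. This is a swapped-in technique offered as a sketch, not a description of how any provider's billing actually works; `CentralBudget` and `RegionalMeter` are hypothetical.

```python
# Hypothetical sketch of budget leasing: regions enforce locally against leased
# slices, so the global total can never exceed the cap without a global lookup.


class CentralBudget:
    """Hands out slices ("leases") of the remaining budget; never over-allocates."""

    def __init__(self, cap: float, lease_size: float):
        self.remaining = cap
        self.lease_size = lease_size

    def grant(self) -> float:
        amount = min(self.lease_size, self.remaining)
        self.remaining -= amount
        return amount


class RegionalMeter:
    """Enforces spend locally against its current lease, with no global lookups."""

    def __init__(self, budget: CentralBudget):
        self.budget = budget
        self.lease = 0.0
        self.spent = 0.0

    def try_charge(self, amount: float) -> bool:
        # Go back to the central budget only when the local lease runs out.
        while self.lease < amount:
            extra = self.budget.grant()
            if extra == 0:
                return False  # budget exhausted: refuse the usage instead of billing it
            self.lease += extra
        self.lease -= amount
        self.spent += amount
        return True


budget = CentralBudget(cap=100, lease_size=30)
us, eu = RegionalMeter(budget), RegionalMeter(budget)
print(us.try_charge(25), eu.try_charge(40), us.try_charge(20), eu.try_charge(20))
```

Total spend can never exceed the cap, but budget leased to one region can sit unused while another region is refused, so the cap can trip early. Lease size trades coordination traffic against that slack.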
The problem with not caring about hobby developers is that it means developers won't be as familiar with your cloud environment when time comes to pick one for the next "real" project.
I would also expect a price cap feature to be useful for experimental/no-approval-required projects at work. In fact, if I ran a cloud project for work as a small team-internal project, a cost explosion would become an even bigger bureaucratic nightmare than if it happened at home.
Let's not sugar coat it.
The problem is 100% technical. Detecting unexpected charges, scaling and restriction in real time is hard.
It's easier to just charge people money than come up with good ways to avoid charging them, and deal with edge cases as a manual process.
Sure. I get it. What company has an internal team that's like "ooo... let's find ways to cap the amount of money people pay us"?
No one.
That's why.
Right.
It's just avarice. There's no other reason.
For many companies' use cases (create EC2 instances a few times a year, otherwise leave them running), I would imagine they'd really feel better if there were an option that said "if charges incurred exceed (a configurable threshold you set to 10k higher than your normal monthly spend), then robocall the customer, and if they do not reply after 3 automated calls, block all additional AWS API calls that would incur more than 10% of my monthly spend if left running for 24h."
This would still allow all production services to run, but would stop someone from spinning up 200 crypto miners. I'm sure AWS is capable of implementing this, and I don't want to say it's "easy" but I would be shocked if they lacked the technical expertise to do this.
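The rule proposed above translates fairly directly into a check on each new cost-incurring API call. A hypothetical sketch: the robocalling itself is not shown, `unanswered_calls` is assumed to be maintained by that process, "monthly spend" is read as the normal monthly spend, and all names are invented.

```python
# Hypothetical sketch of the escalation rule: only new, expensive API calls are
# blocked, and only after the threshold is crossed and the calls go unanswered.
from dataclasses import dataclass

MARGIN = 10_000        # "10k higher than your normal monthly spend"
MAX_CALLS = 3          # automated calls before blocking kicks in
BLOCK_FRACTION = 0.10  # "more than 10% of my monthly spend if left running for 24h"


@dataclass
class Account:
    normal_monthly_spend: float
    charges_this_month: float
    unanswered_calls: int = 0


def over_threshold(acct: Account) -> bool:
    return acct.charges_this_month > acct.normal_monthly_spend + MARGIN


def allow_api_call(acct: Account, hourly_cost: float) -> bool:
    """Decide whether a new cost-incurring call (e.g. launching an instance) may proceed.

    Existing resources are never touched by this check.
    """
    if not over_threshold(acct) or acct.unanswered_calls < MAX_CALLS:
        return True
    cost_if_left_running_24h = hourly_cost * 24
    return cost_if_left_running_24h <= BLOCK_FRACTION * acct.normal_monthly_spend
```

With a normal spend of 20k, a call costing $50/hour (1,200/day) would still go through, while one costing $100/hour (2,400/day) would be refused.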
We’ve now gone from hard billing caps to “soft billing with alerting”.
If you look at lots of these threads you’ll see that many people don’t want to provide phone numbers, and lots of people ignore emails from billing, even repeated ones sent directly to them.
This isn’t a technical problem, it’s a service problem. I can see the HN posts already: “my site went viral and <HOST> shut me down”
Here's the thing. We have apis for everything and their grandmother. You create an instance and there are apis for adding tags, labels, nicknames...but not for spending caps? I understand I don't know all of the complexities involved, but if you can bill by the second or by the hour, you can certainly alert by the same metrics.
We have been measuring CPU and MEM with extreme granularity; how about considering price as a resource and measuring it the same way, so that a service with a price cap can self-manage and self-terminate according to some priority field?
This might not be the actual solution, but we have been at this for a very long time, seems like there is not even a hint of an attempt at solving it by the giants. This is about incentives, sorry.
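Treating price like CPU or memory, as suggested above, could mean shedding services by priority when the hourly burn exceeds a budget, in the spirit of priority-based eviction. A hypothetical sketch: `Service`, `cost_per_hour` and `services_to_stop` are invented, and the "lower number means more important" convention is an assumption.

```python
# Hypothetical sketch: price as a resource, with greedy shedding by priority.
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    cost_per_hour: float
    priority: int  # lower number = more important (hypothetical convention)


def services_to_stop(services: list[Service], hourly_budget: float) -> list[str]:
    """Shed the least important services until the hourly burn fits the budget."""
    running = sorted(services, key=lambda s: s.priority)  # most important first
    burn = sum(s.cost_per_hour for s in running)
    stopped = []
    while running and burn > hourly_budget:
        victim = running.pop()  # least important service still running
        burn -= victim.cost_per_hour
        stopped.append(victim.name)
    return stopped


fleet = [
    Service("api", 5, 0),
    Service("batch-reports", 8, 2),
    Service("search-index", 4, 1),
]
print(services_to_stop(fleet, hourly_budget=10))
```

The greedy order is deliberate: a cheap low-priority service is shed before an expensive important one, which is the whole point of a priority field.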
I was pleasantly surprised when I was messing around with the Google Maps API and found I was able to adjust a quota to put an upper cap on daily spend.
It made me feel much more comfortable hacking around and not needing to worry that I'd accidentally create a render loop or something that could rack up a bill whilst I wasn't looking
When you build an app for resiliency you end up with classes of service where the app fails in stages.
But to extend that to the billing case, you’d have to have a partnership with your customers, not just a dashboard where they push buttons and an API where you add or delete machines.
Maybe the website goes read only except for admin traffic when the budget is exceeded, for instance. Not as a bespoke process each company has to reinvent, but as functionality provided by the vendor.
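The read-only fallback described above boils down to a per-request admission decision. A hypothetical sketch: `BudgetState` and `admit` are invented, and how a vendor would signal the budget state to the app is left open.

```python
# Hypothetical sketch: once the budget is exceeded, only reads and admin traffic pass.
from enum import Enum

READ_METHODS = {"GET", "HEAD", "OPTIONS"}


class BudgetState(Enum):
    OK = "ok"
    EXCEEDED = "exceeded"


def admit(state: BudgetState, method: str, is_admin: bool) -> tuple[int, str]:
    """Return an HTTP-style (status, reason) for a request under the current budget state."""
    if state is BudgetState.OK or is_admin:
        return 200, "ok"
    if method.upper() in READ_METHODS:
        return 200, "ok (read-only mode)"
    # 503 signals a temporary condition; writes resume once the budget resets or is raised.
    return 503, "read-only: budget exceeded"
```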
When you're paying $39/month for something that generates $0/month, that is a very sensible policy.
When you're paying $50,000/month for something that generates $200,000/month in value, or if an outage can generate $100,000/month in costs, or if the people that can fix an outage cost $100,000/year, then it's not.
Then that company's threshold for "sensible monthly cost" would be a lot higher. 500k? 1m? Give them the option to set something.
Why would you prefer this when the alternative is no downtime and the provider forgiving the bill?
That will eventually be factored into the price, like credit card fraud insurance. Better to have it be more transparent.
Because you don't want to be relying on the service provider's whim in "forgiving" the bill?
Yes, and it shouldn't be too hard to implement on the business side. Deno just announced it: https://deno.com/blog/deploy-spend-limits
(on their Pro plan)
I'm not talking it down. Maybe people are right about this. We'll see.
Because data storage costs money, a hard billing cap requires deleting both your data and its backups to stay under it. There are very few use cases where that's actually acceptable, not even development environments, where people will get upset about losing work they haven't pushed somewhere else yet.
It's easy for (say) AWS to terminate your EC2 instances. Do they also delete your DB backups? Delete your S3 buckets?
All of these incur costs. How hard a cap do you want?
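The question can be made concrete: once compute is stopped, storage keeps billing, so a hard cap only holds until stored data eats the remaining headroom. A sketch with a hypothetical per-GB price (not a quote of any provider's pricing), a 30-day month, and an invented function name.

```python
# Hypothetical model: compute is off, only storage is still billing.


def days_until_cap(cap: float, spent: float, stored_gb: float,
                   price_per_gb_month: float, days_in_month: float = 30) -> float:
    """Days until storage charges alone push the month past the cap."""
    if spent >= cap:
        return 0.0
    daily_storage_cost = stored_gb * price_per_gb_month / days_in_month
    if daily_storage_cost == 0:
        return float("inf")
    return (cap - spent) / daily_storage_cost


# Hypothetical: $50 cap, $45 spent when compute was stopped,
# 2,000 GB kept at an assumed $0.02 per GB-month.
print(days_until_cap(50, 45, 2_000, 0.02))
```

With those numbers the cap is breached in about 3.75 days unless data starts being deleted, which is exactly the trade-off raised above.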
Yes, but I imagined that you’d have a ceiling for rolling average / token bucket over a sensible time window, with two limits:
- a soft alert limit, which you set to the threshold of “hmm something is wrong but we’ll bear the cost until we figure it out”
- a hard limit which fails until more tokens trickle back in, without shutting down service
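The two-limit token bucket described above might look like this. A minimal sketch, assuming time is passed in explicitly in hours and that the "hard limit" means refusing new metered usage rather than tearing anything down; `SpendBucket` and its parameters are hypothetical.

```python
# Hypothetical sketch: a token bucket over dollars, with a soft alert level.


class SpendBucket:
    """capacity = burst allowance in dollars; refill_per_hour = sustained spend rate."""

    def __init__(self, capacity: float, refill_per_hour: float,
                 soft_fraction: float = 0.8, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_hour = refill_per_hour
        self.soft_fraction = soft_fraction  # alert once this share of the bucket is drained
        self.tokens = capacity
        self.last = now

    def _refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_hour)
        self.last = now

    def try_spend(self, cost: float, now: float) -> tuple[bool, bool]:
        """Return (allowed, soft_alert). A refused spend is simply not performed."""
        self._refill(now)
        allowed = cost <= self.tokens
        if allowed:
            self.tokens -= cost
        drained = 1 - self.tokens / self.capacity
        return allowed, drained >= self.soft_fraction


bucket = SpendBucket(capacity=100, refill_per_hour=10)
print(bucket.try_spend(50, now=0))
print(bucket.try_spend(35, now=0))
print(bucket.try_spend(20, now=0))
print(bucket.try_spend(20, now=1))
```

The third call is refused because only $15 is left, and an hour later enough budget has trickled back for the same request to succeed; nothing already running is shut down in between.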
I really don’t want to rely on forgiveness, it’s just encouraging reckless behavior and submitting to the incomprehensibility of cloud pricing.
Everyone wants these limits, so why not design products with that in mind from the get-go? It feels like such an afterthought.