Reclaim the Stack

subarctic
48 replies
18h8m

I wish _I_ had a business that was successful enough to justify multiple engineers working 7 months on porting our infrastructure from heroku to kubernetes

bastawhiz
36 replies
17h58m

Knowing the prices and performance of Heroku (as a former customer), the effort probably paid for itself. Heroku is great for getting started but becomes untenably expensive very fast, and it's neither easy nor straightforward to break the vendor lock-in when you decide to leave.

internetter
12 replies
15h50m

From their presentation, they went from $7500/m to $500/m

grogenaut
5 replies
13h43m

Assume a dev is $100k/year... so $200k with taxes, benefits, etc. That's $16,666/month, and at 1.5 months that's $25k. So it'll take about 3.5 months to break even. And they'd save around 0.8 of their pay, or 0.4 of their total cost, a year...

Generally I am hoping my devs are working at a good multiplier to their pay for the revenue they generate. Not sure I'd use them this way if there were other things to do.

That said, it sounds like it was mainly for GDPR, so.
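
In rough Python, that back-of-the-envelope math (all of these numbers are assumptions from this comment, not actuals from the migration):

    # Break-even sketch using the figures assumed above.
    salary = 100_000                  # assumed dev salary, $/year
    loaded_cost = 2 * salary          # taxes, benefits, etc.
    monthly_cost = loaded_cost / 12   # ~$16,666/month

    effort_months = 1.5               # assumed engineering time spent on the migration
    migration_cost = effort_months * monthly_cost   # ~$25,000

    monthly_savings = 7_500 - 500     # old Heroku bill vs. new bill, from the presentation
    break_even_months = migration_cost / monthly_savings   # ~3.6 months

    annual_savings = 12 * monthly_savings   # ~$84,000/year
    print(f"break even after {break_even_months:.1f} months, "
          f"then save ~${annual_savings:,.0f}/year "
          f"(~{annual_savings / loaded_cost:.0%} of one dev's loaded cost)")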

icameron
3 replies
8h25m

Where are you finding capable DevOps engineers for 100k total comp? It’s hard to find someone with the skills to rebuild a SaaS production stack who’s willing to work for that little around here!

subarctic
1 replies
7h39m

Europe, probably

whizzter
0 replies
7h22m

Correct, mynewsdesk (which created this Reclaim the Stack thing) is a Swedish company.

Maxion
0 replies
7h10m

100k EUR is a high salary for a dev. A unicorn who knows they are one won't agree to that salary, but would for 150k or 200k.

p_l
0 replies
9h0m

Now consider that some places are not in Silly Valley, or not even in the USA, and the fully loaded cost of an engineer (who, once done with the on-prem or at least more "owned" stack, can take on other problems) can be way, way lower.

ttul
4 replies
15h44m

I mean, $7,000 a month isn’t nothing. But it’s not a lot. Certainly not enough to justify a seven month engineering effort plus infinite ongoing maintenance.

ramshorst
1 replies
9h35m

Would you say one person not working 100% of the time is also quite minor? ;)

Aeolun
0 replies
8h52m

Sure. We have around 10 of those. It’s a significant boon to the project for them to do nothing.

nine_k
0 replies
13h33m

This is $7k/mo today. If they are actively growing, and their demand for compute is slated to grow 5x or even 10x in a year, they'd want to get off Heroku fast.

crdrost
0 replies
14h32m

The main engineering effort to reduce costs by that much was completed in 6 weeks according to their YouTube video.

7 months is presumably more like “the time it has been stable for” or so, although I am not sure the dates line up for that 100%.

Also cost reduction was apparently not the main impetus, GDPR compliance was.

dbackeus
0 replies
3h9m

We've since migrated more stuff. We're currently saving more than $400k/year.

danenania
10 replies
17h20m

I find AWS ECS with fargate to be a nice middle ground. You still have to deal with IAM, networking, etc. but once you get that sorted it’s quite easy to auto-scale a container and make it highly available.

I’ve used kubernetes as well in the past and it certainly can do the job, but ECS is my go-to currently for a new project. Kubernetes may be better for more complex scenarios, but for a new project or startup I think having a need for kubernetes vs. something simpler like ECS would tend to indicate questionable architecture choices.
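
For a concrete sense of the auto-scaling piece once IAM and networking are sorted: it's a scalable target plus a target-tracking policy. A minimal boto3 sketch; the cluster/service names and the 70% CPU target are placeholders, not anything from this thread:

    import boto3

    # Hypothetical ECS service running on Fargate.
    resource_id = "service/my-cluster/my-web-service"

    autoscaling = boto3.client("application-autoscaling")

    # Register the service's desired task count as a scalable target.
    autoscaling.register_scalable_target(
        ServiceNamespace="ecs",
        ResourceId=resource_id,
        ScalableDimension="ecs:service:DesiredCount",
        MinCapacity=2,
        MaxCapacity=10,
    )

    # Target tracking: keep average CPU around 70%, letting ECS add or
    # remove Fargate tasks as load changes.
    autoscaling.put_scaling_policy(
        PolicyName="cpu-target-tracking",
        ServiceNamespace="ecs",
        ResourceId=resource_id,
        ScalableDimension="ecs:service:DesiredCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": 70.0,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
            },
            "ScaleInCooldown": 300,
            "ScaleOutCooldown": 60,
        },
    )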

wg0
2 replies
12h1m

ECS is far, far, far smoother, simpler and more stable than anything else out there in cluster orchestration. It just works. Even with EC2 instances it just works. And if you opt for Fargate, that's an even more stable option.

I am saying this after bootstrapping k8s and ECS both.

Aeolun
1 replies
8h54m

The only pain point there I think is auto scaling logic. But otherwise it’s painless.

danenania
0 replies
3h40m

I find auto-scaling with fargate to be pretty straightforward. What's the pain point there for you?

subarctic
2 replies
7h23m

How does it compare with fly.io? Last I checked, startup time on ECS is still in minutes instead of less than a second on Fly, but I presume it's more reliable and you get that "nobody ever got fired for using AWS" effect.

dangus
0 replies
3h59m

Fly.io is an unreliable piece of shit.

danenania
0 replies
3h45m

Fly is really cool and it's definitely an extremely quick way to get a container running in the cloud. I haven't used it in production so I can't speak to reliability, but for me the main thing that stops me from seriously considering it is the lack of an RDS equivalent that runs behind the firewall.

chihwei
1 replies
10h48m

GCP Cloud Run is even better: you don't have to configure all that networking stuff, just ship and run in production.

danenania
0 replies
3h36m

Does Cloud Run give you a private network? While configuring it is annoying, I do want one for anything serious.

hannofcart
0 replies
3h52m

+1 on this. ECS Fargate is great.

bastawhiz
0 replies
16h16m

I pretty strongly agree. Fargate is a great product, though it isn't quite perfect.

cpursley
9 replies
17h37m

Moving from Heroku to Render or Fly.io is very straightforward; it's just containers.

ed
3 replies
14h28m

(Except for Postgres, since Fly's solution isn't managed)

Heroku's price is a persistent annoyance for every startup that uses it.

Rebuilding Heroku's stack is an attractive problem (evidenced by the graveyard of Heroku clones on Github). There's a clear KPI ($), Salesforce's pricing feels wrong out of principle, and engineering is all about efficiency!

Unfortunately, it's also an iceberg problem. And while infrastructure is not "hard" in the comp-sci sense, custom infra always creates work when your time would be better spent elsewhere.

mrweasel
1 replies
11h21m

Couldn't you move to AWS? They offer managed Postgresql. Heroku already runs on AWS, so there could be a potential saving in running AWS managed service.

It's still a lot of work obviously.

Maxion
0 replies
7h9m

So does GCP and Azure. At least in GCP land the stuff is really quite reasonably priced, too.

davedx
0 replies
12h41m

> Salesforce's pricing feels wrong out of principle

What do you mean exactly? If it takes multiple engineers multiple months to build an alternative on kubernetes, then it sounds like Heroku is worth it to a lot of companies. These costs are very "known" when you start using Heroku too, it's not like Salesforce hides everything from you then jump scares you 18 months down the line.

SF's CRM is also known to be expensive, and yet it's extremely widely used. Something being expensive definitely doesn't always mean it's bad and you should cheap out to avoid it.

dartos
1 replies
17h29m

Unless you relied on Heroku buildpacks.

debarshri
0 replies
16h40m

Buildpacks are open source too [1]

[1] https://buildpacks.io/

bastawhiz
1 replies
17h24m

If you use containers. If you're big enough for the cost savings to matter, you're probably also not looking for a service like Render or Fly. If your workload is really "just containers" you can save more with even managed container services from AWS or GCP.

OJFord
0 replies
17h17m

We are talking about moving from Heroku, I don't think being too needy for the likes of Fly is at all a given. (And people will way prematurely think they're too big or needy for x.)

goosejuice
0 replies
16h41m

So is kubernetes. GKE isn't that bad.

antimemetics
1 replies
12h57m

I mean this is what they recommend:

- Your current cloud / PaaS costs are north of $5,000/month
- You have at least two developers who are into the idea of running Kubernetes and their own infrastructure and are willing to spend some time learning how to do so

So you will spend 150k+/year (2 senior full stack eng salaries in the EU - can be much higher, esp. for people up to the task) to save 60k+/y in infra costs?

Does not compute for me - is the lock-in that bad?

I understand it for very small/simple use cases - but then do you need k8s at all?

It feels like the ones who will benefit the most are orgs that spend much more on cloud costs - but they need SLAs, compliance and a dozen other enterprisy things.

So I struggle to understand who would benefit from this stack reclaim.

dbackeus
0 replies
8h29m

Creator of Reclaim the Stack here.

The idea that we're implying you need 2 full time engineers is a misunderstanding. We just mean to say that you'll want at least 2 developers to spend enough time digging into Kubernetes etc. to have a good enough idea of what you're doing. I don't think more than 2 months of messing about should be required to reach proficiency.

We currently don't spend more than ~4 days per month total working on platform related stuff (often we spend 0 days, eg. I was on parental leave during 3 months and no one touched the platform during that time).

WRT employee cost, Swedish DevOps engineers cost less than half of what you mentioned on average, but I guess YMMV depending on region.

efilife
10 replies
16h7m

Fyi, we use asterisks (*) for emphasis on HN

Kiro
4 replies
13h30m

Different thing. Using visible _ is a conscious choice.

efilife
3 replies
12h36m

Why?

Kiro
2 replies
11h55m

It looks nice and has been a staple in hacker culture for decades, long before we had rich text and were just chatting on IRC.

sph
0 replies
11h17m

Also it looks like _underlined_ text

efilife
0 replies
4h12m

It doesn't look nice at all to me. Real emphasis looks way nicer, that's its purpose. Now that we have rich text, why not utilize it?

komali2
2 replies
10h7m

Who's "we?"

efilife
0 replies
3h33m

Most of HN users

willvarfar
1 replies
11h42m

underscores around italics and asterisks around strong/bold were an informal convention on bbs, irc and forums way before atx/markdown.

jusomg
44 replies
10h39m

Of course you reduced 90% of the cost. Most of these costs don't come from the software, but from the people and automation maintaining it.

With that cost reduction you also removed monitoring of the platform, people on call to fix issues that appear, upgrades, continuous improvements, etc. Who/What is going to be doing that on this new platform and how much does that cost?

Now you need to maintain k8s, postgresql, elasticsearch, redis, secret management, OSs, storage... These are complex systems that require people understanding how they internally work, how they scale and common pitfalls.

Who is going to upgrade kubernetes when they release a new version that has breaking changes? What happens when Elasticsearch decides to splitbrain and your search stops working? When the DB goes down or you need to set up replication? What is monitoring replication lag? Or even simply things like disks being close to full? What is acting on that?

I don't mean to say Heroku is fairly priced (I honestly have no idea) but this comparison is not apples to apples. You could have your team focused on your product before. Now you need people dedicated to working on this stuff.

dzikimarian
12 replies
9h31m

Sorry, but that's just a ton of FUD. We run both private cloud and (for a few customers) AWS. Of course you have more maintenance on-prem, but a typical k8s update is maybe a few hours of work, when you know what you are doing.

AWS is also complex, also requires configuration and also generates alerts in the middle of the night.

It's still a lot cheaper than a managed service.

jusomg
5 replies
9h10m

> Of course you have more maintenance on-prem, but a typical k8s update is maybe a few hours of work, when you know what you are doing.

You just mentioned one dimension of what I described, and "when you know what you are doing" is doing a lot of the heavy lifting in your argument.

> AWS is also complex, also requires configuration and also generates alerts in the middle of the night.

I'm confused. So we are in agreement there?

I feel you might be confusing my point with an on-prem vs AWS discussion, and that's not it.

This is encouraging teams to run databases / search / cache / secrets and everything on top of k8s and assuming a magic k8s operator is doing the same job as a team of humans and automation managing all those services for you.

Nextgrid
4 replies
9h4m

> assuming a magic k8s operator is doing the same job as a team of humans and automation managing all those services for you.

What do you think AWS is doing behind the scenes when you run Postgres RDS? It's their own equivalent of a "K8S operator" managing it. They make bold claims about how good/reliable/fault-tolerant it is, but the truth is that you can't actually test or predict its failure modes, and it can fail and fails badly (I've had it get into a weird state where it took 24h to recover, presumably once an AWS guy finally SSH'd in and fixed it manually - I could've done the same but without having to wait 24h).

jusomg
3 replies
8h11m

Fair, but my point is that AWS has a full team of people that built and contributed to that magic box that is managing the database. When something goes wrong, they're the first ones to know (ideally) and they have a lot of know-how on what went wrong, what the automation is doing, how to remediate issues, etc.

When you use a k8s operator you're using an off-the-shelf component with very little idea of what it's doing and how. When things go wrong, you don't have a team of experts to look into what failed and why.

The tradeoff here is obviously cost, but my point is those two levels of "automation" are not comparable.

Edit: well, when I write "you" I mean most people (me included)

re-thc
0 replies
3h38m

> Fair, but my point is that AWS has a full team of people that built and contributed to that magic box that is managing the database

You think so. The real answer is maybe, maybe not. They could have all left and the actual maintainers now don't actually know the codebase. There's no way to know.

> When things go wrong, you don't have a team of experts to look into what failed and why.

I've been on both sides of consulting / managed services teams and each time the "expert" was worse than the junior. Sure, there's some luck and randomness but it's not as clear cut as you make it.

> and they have a lot of know-how on what went wrong, what the automation is doing, how to remediate issues, etc.

And to continue on the above, I've also worked at SaaS/IaaS/PaaS companies where the person on call doesn't know much about the product (not always their fault) and so couldn't contribute much during an incident.

There's just too much trust and good faith in this reply. I'm not advocating managing everything yourself, but yes, don't just trust that the experts have everything handled either.

dzikimarian
0 replies
7h23m

If you don't want complexity of operators, you'll be probably OK with DB cluster outside of k8s. They're quite easy to setup, automate and there are straightforward tools to monitor them (eg. from Percona).

If you want to fully replicate AWS it may be more expensive than just paying AWS. But for most use cases it's simply not necessary.

Analemma_
0 replies
2h16m

> Fair, but my point is that AWS has a full team of people that built and contributed to that magic box that is managing the database.

You sure about that? I used to work at AWS, and although I wasn't on K8S in particular, I can tell you from experience that AWS is a revolving door of developers who mostly quit the instant their two-year sign-on bonus is paid out, because working there sucks ass. The ludicrous churn means there actually isn't very much buildup of institutional knowledge.

filleokus
3 replies
9h16m

As with everything it's not black or white, but rather a spectrum. Sure, updating k8s is not that bad, but operating a distributed storage solution is no joke. Or really anything that requires persistence and clustering (like elastic).

You can also trade operational complexity for cash via support contracts and/or enterprise solutions (like just throwing money at Hitachi for storage rather than trying to keep Ceph alive).

p_l
2 replies
9h6m

If you don't need something crazy you can just grab what a lot of enterprises already had done for years, which is drop a few big storage servers and call it a day, connecting over iSCSI/NFS/whatever

filleokus
1 replies
7h20m

If you are in Kubernetes land you probably want object storage and some kind of PVC provider. Not thaaat different from an old fashioned iSCSI/NFS setup to be honest, but in my experience different enough to cause friction in an enterprise setting. You really don't want a ticket-driven, manual, provisioning process of shares

p_l
0 replies
6h30m

A PVC provider is nice, sure, but depending on how much you need/want, the simplest cases can be "mount a subdirectory from a common exported volume", and for many applications ticket-based provisioning will be enough.

That said, on my to-do list is some tooling to make simple cases with Linux NFS or SMI-capable servers work as PVC providers.
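
In the meantime, the fully static route needs no provisioner at all: pre-create a PersistentVolume that points at the NFS export and let claims bind to it by name. A minimal sketch with the official Kubernetes Python client (the server address and export path are made up):

    from kubernetes import client, config

    config.load_kube_config()

    # Static PersistentVolume backed by a plain NFS export (hypothetical server/path).
    pv = client.V1PersistentVolume(
        metadata=client.V1ObjectMeta(name="shared-data"),
        spec=client.V1PersistentVolumeSpec(
            capacity={"storage": "100Gi"},
            access_modes=["ReadWriteMany"],
            persistent_volume_reclaim_policy="Retain",
            nfs=client.V1NFSVolumeSource(server="10.0.0.5", path="/exports/shared-data"),
        ),
    )
    client.CoreV1Api().create_persistent_volume(pv)

    # The claim binds to that volume by name; an empty storage class
    # disables dynamic provisioning.
    pvc = client.V1PersistentVolumeClaim(
        metadata=client.V1ObjectMeta(name="shared-data"),
        spec=client.V1PersistentVolumeClaimSpec(
            access_modes=["ReadWriteMany"],
            resources=client.V1ResourceRequirements(requests={"storage": "100Gi"}),
            volume_name="shared-data",
            storage_class_name="",
        ),
    )
    client.CoreV1Api().create_namespaced_persistent_volume_claim("default", pvc)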

tinco
1 replies
9h11m

Sure, but it requires that your engineers are vertically capable. In my experience, about 1 in 5 developers has the required experience and does not flat out refuse to have vertical responsibility over their software stack.

And that number might be high, in larger more established companies there might be more engineers who want to stick to their comfort bubble. So many developers reject the idea of writing SQL themselves instead of having the ORM do it, let alone know how to configure replication and failover.

I'd maybe hire for the people who could and would, but the people advocating for just having the cloud take care of these things have a point. You might miss out on an excellent application engineer, if you reject them for not having any Linux skills.

dzikimarian
0 replies
7h28m

Our devs are responsible for their docker image and the app. Then another team manages the platform. You need some level of cooperation of course, but none of the devs cares too much about k8s internals or how the storage works.

dbackeus
10 replies
8h47m

Original creator and maintainer of Reclaim the Stack here.

> you also removed monitoring of the platform

No we did not: Monitoring: https://reclaim-the-stack.com/docs/platform-components/monit...

Log aggregation: https://reclaim-the-stack.com/docs/platform-components/log-a...

Observability is on the whole better than what we had at Heroku since we now have direct access to realtime resource consumption of all infrastructure parts. We also have infinite log retention which would have been prohibitively expensive using Heroku logging addons (though we cap retention at 12 months for GDPR reasons).

> Who/What is going to be doing that on this new platform and how much does that cost?

My colleague and I, who created the tool together, manage infrastructure / OS upgrades and look into issues etc. So far we've been in production 1.5 years on this platform. On average we've spent perhaps 3 days per month doing platform related work (mostly software upgrades). The rest we spend on full stack application development.

The hypothesis for migrating to Kubernetes was that the available database operators would be robust enough to automate all common high availability / backup / disaster recovery issues. This has proven to be true, apart from the Redis operator which has been our only pain point from a software point of view so far. We are currently rolling out a replacement approach using our own Kubernetes templates instead of relying on an operator at all for Redis.
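
To give a sense of what "relying on an operator" amounts to in practice: the desired HA topology is a single declarative object, and failover, replica provisioning and backups follow from it. A sketch that assumes CloudNativePG as the Postgres operator (an assumption for illustration only; names, sizes and the backup bucket are made up):

    from kubernetes import client, config

    config.load_kube_config()

    # Declarative 3-instance Postgres cluster as a CloudNativePG custom resource.
    cluster = {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Cluster",
        "metadata": {"name": "app-db", "namespace": "default"},
        "spec": {
            "instances": 3,                   # one primary + two streaming replicas
            "storage": {"size": "50Gi"},      # e.g. local NVMe-backed volumes
            "backup": {
                "barmanObjectStore": {        # off-site backups to S3-compatible storage
                    "destinationPath": "s3://example-backups/app-db",
                    "s3Credentials": {
                        "accessKeyId": {"name": "backup-creds", "key": "ACCESS_KEY_ID"},
                        "secretAccessKey": {"name": "backup-creds", "key": "SECRET_ACCESS_KEY"},
                    },
                }
            },
        },
    }

    client.CustomObjectsApi().create_namespaced_custom_object(
        group="postgresql.cnpg.io",
        version="v1",
        namespace="default",
        plural="clusters",
        body=cluster,
    )

The operator reconciles objects like this and handles failover, replica rebuilds and archiving to the object store; day to day there is nothing to run by hand.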

> Now you need to maintain k8s, postgresql, elasticsearch, redis, secret management, OSs, storage... These are complex systems that require people understanding how they internally work

Thanks to Talos Linux (https://www.talos.dev/), maintaining K8s has been a non-issue.

Running databases via operators has been a non-issue, apart from Redis.

Secret management via sealed secrets + CLI tooling has been a non-issue (https://reclaim-the-stack.com/docs/platform-components/secre...)

OS management with Talos Linux has been a learning curve but not too bad. We built talos-manager to make bootstrapping new nodes into our cluster straightforward (https://reclaim-the-stack.com/docs/talos-manager/introductio...). The only remaining OS related maintenance is OS upgrades, which requires rebooting servers, but that's about it.

For storage we chose to go with simple local storage instead of complicated network based storage (https://reclaim-the-stack.com/docs/platform-components/persi...). Our servers come with datacenter grade NVMe drives. All our databases are replicated across multiple servers so we can gracefully deal with failures, should they occur.

> Who is going to upgrade kubernetes when they release a new version that has breaking changes?

Upgrading Kubernetes can in general be done with 0 downtime and is handled by a single talosctl CLI command. Breaking changes in K8s imply changes to existing resource manifest schemas and are detected by tooling before upgrades occur. Given how stable Kubernetes resource schemas are and how averse the community is to pushing breaking changes, I don't expect this to cause major issues going forward. But of course software upgrades will always require due diligence and can sometimes be time consuming; K8s is no exception.

> What happens when ElasticSearch decides to splitbrain and your search stops working?

ElasticSearch, since major version 7, should not enter split brain if correctly deployed across 3 or more nodes. That said, in case of a complete disaster we could either rebuild our index from source of truth (Postgres) or do disaster recovery from off site backups.

It's not like using ElasticCloud protects against these things in any meaningfully different way. However, the feedback loop of contacting support would be slower.

> When the DB goes down or you need to set up replication?

Operators handle failovers. If we would lose all replicas in a major disaster event we would have to recover from off site backups. Same rules would apply for managed databases.

> What is monitoring replication lag?

For Postgres, which is our only critical data source, replication lag monitoring + alerting is built into the operator.

It should be straightforward to add this for Redis and ElasticSearch as well.
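
As a rough sketch of what that Redis addition could look like (the hostname and threshold below are placeholders, not our actual config): replication lag is exposed directly by INFO replication, so a small check script or exporter rule covers the basics.

    import redis

    PRIMARY = "redis-primary.internal"     # hypothetical primary endpoint
    MAX_LAG_BYTES = 10 * 1024 * 1024       # alert if a replica is >10 MiB behind

    info = redis.Redis(host=PRIMARY, port=6379).info("replication")
    master_offset = info["master_repl_offset"]

    for key, value in info.items():
        # Replicas show up as slave0, slave1, ... with their replication offsets.
        if key.startswith("slave") and isinstance(value, dict):
            lag = master_offset - value["offset"]
            status = "ALERT" if lag > MAX_LAG_BYTES else "ok"
            print(f"{status}: {value['ip']}:{value['port']} lag={lag} bytes")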

> Or even simply things like disks being close to full?

Disk space monitoring and alerting is built into our monitoring stack.

At the end of the day I can only describe to you the facts of our experience. We have reduced costs by enough to cover hiring about 4 full time DevOps people so far. But we have hired 0 new engineers and are managing fine with just a few days of additional platform maintenance per month.

That said, we're not trying to make the point that EVERYONE should Reclaim the Stack. We documented our thoughts about it here: https://reclaim-the-stack.com/docs/kubernetes-platform/intro...

swat535
5 replies
4h29m

Assuming an average salary of 140k/year, you are dedicating 2 resources 3 days a month, and this is already costing you ~38k/year in salaries alone - and that's assuming your engineers have somehow mastered _both_ devops and software (very unlikely) and that they won't screw anything up. I'm not even counting the time it took you to migrate away...

This also assumes your infra doesn't grow and requires more maintenance or you have to deal with other issues.

Focusing on building features and generating revenue is much more valuable than wasting precious engineering time maintaining stacks.

This is hardly a "win" in my book.

rglullis
2 replies
4h22m

Right, because your outsourced cloud provider takes absolutely zero time from any application developers. Any issue with AWS and GCP is just one magic support ticket away, and their costs already include top priority support.

Right? Right?!

dangus
1 replies
4h3m

Heroku isn’t really analogous to AWS and GCP. Heroku actually is zero effort for the developers.

JasonSage
0 replies
3h36m

> Heroku actually is zero effort for the developers.

This is just blatantly untrue.

I was an application developer at a place using Heroku for over four years, and I guarantee you we exceeded the aforementioned 2-devs-3-days-per-month in man hours in my time there due to Heroku:

- Matching up local env to Heroku images, and figuring out what it actually meant when we had to move off deprecated versions

- Peering at Heroku charts because of the lack of real machine observability, and eventually using Node to capture OS metrics and push them into our existing ELK stack because there was just no alternative

- Fighting PR apps to get the right set of env vars to test particular features, and maintaining a set of query-string overrides because there was no way to automate it into the PR deploy

I'm probably forgetting more things, but the idea that Heroku is zero effort for developers is laughable to me. I hate docker personally but it's still way less work than Heroku was to maintain, even if you go all the way down the rabbit hole of optimizing away build times etc.

dbackeus
1 replies
3h22m

> Assuming an average salary of 140k/year

Is that what developers at your company cost?

Just curious. In Sweden the average devops salary is around 60k.

> you are dedicating 2 resources 3 days a month and this is already costing you ~38k/year in salaries

Ok. So we're currently saving more than 400k/year on our migration. That would be worth 38k/year in salaries to us. But note that our actual salary costs are significantly lower.

> that's assuming your engineers have somehow mastered _both_ devops and software (very unlikely)

Both my colleague and I are proficient at operations as well as programming. I personally believe the skillsets are complementary and that web developers need to get into operations / scaling to fully understand their craft. But I've deployed web sites since the 90s. Maybe I'm of a different breed.

We achieved 4 nines of uptime in our first year on this platform, which is more than we ever achieved using Heroku + other managed cloud services. We won't reach 4 nines in our second year due to a network failure at Hetzner, but so far we have not had downtime due to software issues.

> This also assumes your infra doesn't grow and requires more maintenance

In general the more our infra grows the more we save (and we're still in the process of cutting additional costs as we slowly migrate more stuff over). Since our stack is automated we don't see any significant overhead in maintenance time for adding additional servers.

Potentially some crazy new software could come along that would turn out to be hard to deploy. But if it would be cheaper to use a managed option for that crazy software we could still just use a managed service. It's not like we're making it impossible to use external services by self-hosting.

Note that I wouldn't recommend Reclaim the Stack to early stage startups with minor hosting requirements. As mentioned on our site I think it becomes interesting around $5,000/month in spending (but this will of course vary on a number of factors).

> Focusing on building features and generating revenue is much more valuable than wasting precious engineering time maintaining stacks.

That's a fair take. But the trade-offs will look different for every company.

What was amazing for us was that the developer experience of our platform ended up being significantly better than Heroku's. So we are now shipping faster. Reducing costs by an order of magnitude also allowed us to take on data intensive additions to our product which we would have never considered in the previous deployment paradigm since costs would have been prohibitively high.

koffiezet
0 replies
3h8m

> Just curious. In Sweden the average devops salary is around 60k.

Well, there's salary and there's total employee cost. Not sure how it works in Sweden, but here in Belgium it's a good rule of thumb that an employer pays roughly 2.5 times what an employee nets at the end after taxes etc. So a net wage of €3300/month, or about €40k/year, ends up costing the employer about €100k.

I'm a freelance devops/sre/platform engineer, and all I can tell you is that even for long-term projects, my yearly invoice is considerably higher than that.

troupo
1 replies
7h1m

Since you're the original creator, can you open the site of your product, and find the link to your project that you open sourced?

- Front page links to docs and Discord.

- First page of docs only has a link to discord.

- Installation references a "get started" repo that is... somehow also the main repo, not just "get started"?

dbackeus
0 replies
3h56m

The get-started repo is the starting point for installing the platform. Since the platform is gitops based, you'll fork this repo as described in: https://reclaim-the-stack.com/docs/kubernetes-platform/insta...

If this is confusing, maybe it would make sense to rename the repo to "platform" or something.

The other main component is k (https://github.com/reclaim-the-stack/k), the CLI for interacting with the platform.

We have also open sourced a tool for deploying Talos Linux on Hetzner called talos-manager: https://github.com/reclaim-the-stack/talos-manager (but you can use any Kubernetes, managed or self-hosted, so this is use-case specific)

ozgune
1 replies
6h36m

Hey there, this is a comprehensive and informative reply!

I had two questions just to learn more.

* What has been your experience with using local NVMes with K8s? It feels like K8s has some assumptions around volume persistence, so I'm curious if these impacted you at all in production.

* How does 'Reclaim the Stack' compare to Kamal? Was migrating off of Heroku your primary motivation for building 'Reclaim the Stack'?

Again, asking just to understand. For context, I'm one of the founders at Ubicloud. We're looking to build a managed K8s service next and evaluating trade-offs related to storage, networking, and IAM. We're also looking at Kamal as a way to deploy web apps. This post is super interesting, so wanted to learn more.

dbackeus
0 replies
3h13m

K8s works with both local storage and networked storage. But the two are vastly different from an operations point of view.

With networked storage you get fully decoupled compute / storage which allows Kubernetes to reschedule pods arbitrarily across nodes. But the trade off is you have to run additional storage software, end up with more architectural complexity and get performance bottlenecked by your network.

Please check out our storage documentation for more details: https://reclaim-the-stack.com/docs/platform-components/persi...

> How does 'Reclaim the Stack' compare to Kamal?

Kamal doesn't really do much at all compared to RtS. RtS is more or less a feature-complete Heroku alternative. It comes with monitoring / log aggregation / alerting etc., and also automates high-availability deployments of common databases.

Keep in mind 37 signals has a dedicated devops team with 10+ engineers. We have 0 full time devops people. We would not be able to run our product using Kamal.

That said I think Kamal is a fine fit for eg. running a Rails app using SQLite on a single server.

> Was migrating off of Heroku your primary motivation for building 'Reclaim the Stack'?

Yes.

Feel free to join the Discord and start a conversation if you want to bounce ideas for your k8s service :)

Nextgrid
9 replies
9h8m

This is FUD unless you're running a stock exchange or payment processor where every minute of downtime will cost you hundreds of thousands. For most businesses this is fear-mongering to keep the DevOps & cloud industry going and ensure continued careers in this field.

gspencley
2 replies
5h0m

It's not FUD, it's pointing out a very real fact that most problems are not engineering problems that you can fix by choosing the one "magical" engineering solution that will work for all (or even most) situations.

You need to understand your business and your requirements. Us engineers love to think that we can solve everything with the right tools or right engineering solutions. That's not true. There is no "perfect framework." No one-size-fits-all solution that will magically solve everything. What "stack" you choose, what programming language, which frameworks, which hosting providers ... these are all as much business decisions as they are engineering decisions.

Good engineering isn't just about finding the simplest or cheapest solution. It is about understanding the business requirements and finding the right solution for the business.

pphysch
1 replies
3h11m

Having managers (business people) make technical decisions based on marketing copy is how you get 10 technical problems that metastasize into 100 business problems, usually with little awareness of how we got there in the first place.

gspencley
0 replies
1h0m

Nice straw-man. I never once suggested that business people should be making technical decisions. What I said was that engineering solutions need to serve the needs of the business. Those are insanely different statements. They are so different that I think that you actively tried to misinterpret my comment so that you could shoot down something I didn't say.

almost
2 replies
8h16m

I run a business that is a long, long way from a stock exchange or a payment processor. And while a few minutes of downtime is fine, 30 minutes or a few hours at the wrong time will really make my customers quite sad. I've been woken in the small hours with technical problems maybe a couple of times over the last 8 years of running it and am quite willing to pay more for my hosting to avoid that happening again.

Not for Heroku, they're absolute garbage these days, but definitely for a better run PaaS.

Plenty of situations where running it yourself makes sense of course. If you have the people and the skills available (and the cost tradeoffs make sense), or if downtime really doesn't matter much at all to you, then go ahead and consider things like this (or possibly simpler self hosting options, it depends). But no, "you gotta run Kubernetes yourself unless you're a stock exchange" is not a sensible position.

Maxion
1 replies
7h38m

I don't know why people don't value their time at all. PaaS offerings are so cheap these days for the majority of projects that it just is not worth it to spend your own time managing the whole infrastructure stack.

If you're forced by regulation, or if you just want to do it to learn, then yeah. But if your business is not running infra, or if your infra demands aren't crazy, then PaaS and what-have-you-flavored-cloud-container products will cost you ~1-2 work weeks of a single developer annually.

sgarland
0 replies
6h45m

Unless you already know how to run infra quickly and efficiently. Which – spoiler – you can achieve if you want to learn.

The_Colonel
1 replies
8h50m

It's not just about downtime, but also about not getting your systems hacked, not losing your data if sh1t hits the fan, regulation compliance, flexibility (e.g. ability to quickly spin-out new test envs) etc.

My preferred solution to this problem is different, though. For most businesses, apps, a monolith (maybe with a few extra services) + 1 relational DB is all you need. In such a simple setup, many of the problems faced either disappear or get much smaller.

packetlost
0 replies
4h57m

> also about not getting your systems hacked...

The only systems I have ever seen get compromised firsthand were in public clouds and because they were in public clouds. Most of my career has been at shops that, for one reason or another, primarily own their own infrastructure, cloud represents a rather small fraction. It's far easier to secure a few servers behind a firewall than figure out the Rube Goldberg Machine that is cloud configuration.

> not losing your data if sh1t hits the fan

You can use off-site backup without using cloud systems, you know? Backblaze, AWS Glacier, etc. are all pretty reasonable solutions. Most of the time when I've seen the need to exercise the backup strategy it's because of some software fuckup, not something like a disk dying. Using a managed database isn't going to save you when the intern TRUNCATEs the prod database on accident (and if something like that happens, it means you fucked up elsewhere).

> regulation compliance

Most shops would be way better suited to paying a payment processor like Stripe, or other equivalent vendors for similarly protected data. Defense is a whole can of worms; "government clouds" are a scam that makes you more vulnerable to an unauthorized export, not less.

> flexibility (e.g. ability to quickly spin-out new test envs) etc.

You actually lose flexibility by buying into a particular cloud provider, not gain it. Some things become easier, but many things become harder. Also, IME the hard part of creating reasonable test envs is configuring your edge (ingress, logging infra) and data.

HolyLampshade
0 replies
8h46m

Speaking of the exchanges (at least the sanely operated ones), there’s a reason the stack is simplified compared to most of what is being described here.

When some component fails you absolutely do not want to spend time trying to figure out the underlying cause. Almost all the cases you hear in media of exchange outages are due to unnecessary complexity added to what is already a remarkably complex distributed (in most well designed cases) state machine.

You generally want things to be as simple and streamlined as possible so when something does pop (and it will) your mean time to resolution is inside of a minute.

mlinhares
4 replies
4h39m

Anything you don't know about managing these systems can be learned asking chatgpt :P

Whenever I see people doing something like this I remember I did the same when I was in 10 people startups and it required A LOT of work to keep all these things running (mostly because back then we didn't have all these cloud managed systems) and that time would have been better invested in the product instead of wasting time figuring out how these tools work.

I see value in this kind of work if you're at the scale of something like Dropbox and moving from S3 will greatly improve your bottom line and you have a team that knows exactly what they're doing and will be assigned the maintenance of this work. If this is being done merely from a cost cutting perspective and you don't have the people that understand these systems, it's a recipe for disaster, and once shit is on fire the people that would be assigned to "fix" the problem will quickly disappear because the "on call schedule is insane".

re-thc
1 replies
3h36m

> and that time would have been better invested in the product instead of wasting time figuring out how these tools work

It really depends on what you're doing. Back then a lot of non-VC startups worked better and the savings possibly helped. It also helps grow the team and have less reliance on the vendor. It's long term value.

Is it really time wasted? People often go into resume building mode and do all kinds of wacky things regardless. Perhaps this just helps scratch that itch.

mlinhares
0 replies
3h6m

Definitely fine from a personal perspective and resume building, it's just not in the best interest of the business because as soon as the person doing resume building is finished they'll jump ship. I've definitely done this myself.

But i don't see this being good from a pure business perspective.

ljm
0 replies
1h45m

I bailed out of one company because even though the stack seemed conceptually simple in terms of infra (there wasn't a great deal to it), the engineering more than compensated for it. The end result was the same: non-stop crisis management, non-stop firefighting, no capacity to work on anything new, just fixing old.

All by design, really, because at that point you're not part of an engineering team you're a code monkey operating in service of growth metrics.

Diederich
0 replies
2h5m

> ... I remember I did the same when I was in 10 people startups and it required A LOT of work to keep all these things running...

Honest question: how long ago was that? I stepped away from that ecosystem four or so years ago. Perhaps ease of use has substantially improved?

almost
1 replies
8h32m

The fact that HN seems to think this is "FUD" is absolutely wild. You just talked about (some of) the tradeoffs involved in running all this stuff yourself. Obviously for some people it'll be worth it and for others not, but it's absolutely amazing that there are people who don't even seem to accept that those tradeoffs exist!

dzikimarian
0 replies
3h39m

I assume you reference my comment.

The reason I think parent comment is FUD isn't because I don't acknowledge tradeoffs (they are very real).

It's because the parent comment implies that the people behind "reclaim the stack" didn't account for the monitoring, the cost of people, etc.

Obviously any reasonable person making that decision includes it into calculation. Obviously nobody sane throws entire monitoring out of the window for savings.

Accounting for all of these it can be still viable and significantly cheaper to run own infra. Especially if you operate outside of the US and you're able to eat an initial investment.

ugh123
0 replies
2h26m

> you also removed monitoring of the platform

You don't think they have any monitoring within Kubernetes?

I imagine they have more monitoring capabilities now than they did with Heroku.

matus_congrady
0 replies
9h50m

Since DHH has been promoting the 'do-it-yourself' approach, many people have fallen for it.

You're asking the right questions that only a few people know they need answers to.

In my opinion, the closest thing to "reclaiming the stack" while still being a PaaS is to use a "deploy to your cloud account" PaaS provider. These services offer the convenience of a PaaS provider, yet allow you to "eject" to using the cloud provider on your own should your use case evolve.

Example services include https://stacktape.com, https://flightcontrol.dev, and https://www.withcoherence.com.

I'm also working on a PaaS comparison site at https://paascout.io.

Disclosure: I am a founder of Stacktape.

johnnyanmac
0 replies
1h30m

> Who/What is going to be doing that on this new platform and how much does that cost?

If you're already a web platform with hired talent (and someone using Heroku for a SaaS probably already is), I'd be surprised if the marginal cost was 10x. That paid support is of course coming at a premium, and isn't too flexible on what level of support you need.

And yeah, it isn't apples to apples. Maybe you are in a low CoL area and can find a decent DevOps for 80-100k. Maybe you're in SF and any extra dev will be 250k. It'll vary immensely on cost.

rglover
27 replies
16h35m

I made the mistake of falling for the k8s hype a few years back for running all of my indie hacker businesses.

Big mistake. Overnight, the cluster config files I used were no longer supported by the k8s version DigitalOcean auto-upgraded my cluster to, and _boom_: every single business was offline.

Made the switch to some simple bash scripts for bootstrapping/monitoring/scaling and systemd for starting/restarting apps (nodejs). I'll never look back.

eddd-ddde
6 replies
16h33m

So either DigitalOcean auto-updates across breaking versions, or k8s doesn't do versioning correctly. Both very bad.

Which was it?

poincaredisk
4 replies
16h22m

I assume the first one, but it's more complicated. K8s used to have a lot of features (including very important ones) in the "beta" namespace. There are no stability guarantees there, but everyone used them anyway. Over time they graduated to the "stable" namespace, and after some transitory period they were removed from the beta namespace. This broke old deployments when admins ignored warnings for two or three major releases.

dmurray
2 replies
11h8m

It's an odd choice to break backwards compatibility by removing them from the beta namespace. Why not keep them available in both indefinitely?

pcthrowaway
0 replies
10h43m

Probably because the devs understandably can't account for every possible way people might be using it when shipping new features. But in my experience this means k8s is a bag of fiddly bits that requires some serious ops investments to be reliable for anything serious.

p_l
0 replies
9h2m

With one exception that was a rather big change to some low-level stuff, the "remove beta tags" was done with about a year or more of runway for people to upgrade.

And ultimately, it wasn't hard to upgrade, even if you deal with auto-upgrading cluster and forgot about it, because "live" deployments got auto-upgraded - you do need to update your deployment script/whatever though.

psini
0 replies
9h34m

Just want to mention that two or three major releases sounds very bad, but Kubernetes had the insane release cadence of 4(!) major versions every year.

rglover
0 replies
16h12m

Technically both, but more so the former.

I had a heck of a time finding accurate docs on the correct apiVersion to use for things like my ingress and service files (they had a nasty habit of doing beta versions and changing config patterns w/ little backwards compatibility). This was a few years back when your options were a lot of Googling, SO, etc, so the info I found was mixed/spotty.

As a solo founder, I found what worked at the time and assumed (foolishly, in retrospect) that it would just continue to work as my needs were modest.

cedws
3 replies
9h50m

Weird how defensive people get about K8S when you say stuff like this. It’s like they’re desperately trying to convince you that you really do need all that complexity.

rollcat
1 replies
5h25m

I believe there's still a lot of potential for building niche / "human-scale" services/businesses that don't inherently require the scalability of the cloud or the complexity of k8s. Scaling vertically is always easier; modern server hardware has an insane perf ceiling. The overall reduction in complexity is a breath of fresh air.

My occasional moral dilemma is idle power usage of overprovisioned resources, but we've found some interesting things to throw at idle hardware to ease our conscience about it.

sswezey
0 replies
3h34m

I particularly like this moniker for such human-scale, "digital gardening"-type software: https://hobbit.software/

0perator
0 replies
6h16m

Most do not, but they still want all the toys that developers are building for “the cloud”.

llama052
2 replies
15h32m

So you had auto update enabled on your cluster and didn’t keep your apiversions up to date?

Sounds like user error.

rvense
0 replies
12h11m

One of my main criteria for evaluating a platform would be how easy it is to make user errors.

psini
0 replies
9h38m

To be honest the API versions have been a lot more stable recently, but back in ~2019 when I first used Kube in production, basic APIs were getting deprecated left and right, 4 times a year; in the end, yes, the problems are "on you", but it is so easy to miss and the results are so disastrous for a platform whose selling points are toughness, resilience and self-healing.

w0m
1 replies
14h9m

Yeouch, sorry man. I've been running in AKS for 3-4 years now and never had an auto-upgrade come in I wasn't expecting. I have been on top of alerts and security bulletins though, which may have kept me ahead of the curve.

willvarfar
0 replies
11h48m

I was once on a nice family holiday and broke my resolve and did a "quick" check of my email and found a nastygram billing reminder from a provider. On the one hand I was super lucky I checked my mail when I did, and on the other I didn't get the holiday I needed and was lucky it didn't spill over and impact my family's happiness around me.

poincaredisk
1 replies
16h26m

I've used k8s for the last, uhh, 5 years and this never happened to me. In my case, because I self-host my cluster, there are no unexpected upgrades. But I agree that maintaining a k8s cluster takes some work.

theptip
0 replies
4h5m

In the 2015-2019 period there were quite a few API improvements involving deprecating old APIs, it’s much more stable/boring now. (Eg TPR -> CRD was the big one for many cluster plugins)

nine_k
1 replies
13h38m

How does it compare to a simpler but not hand-crafted solution, such as dokku?

rglover
0 replies
1h41m

No Docker for starters. I played with Dokku a long time ago and remember it being decent at that time, but still too confusing for my skillset.

Now, I just build my app to an encrypted tarball, upload it to a secure bucket, and then create a short-lived signed URL for instances to curl the code from. From there, I just install deps on the machine and start up the app with systemd.

IMO, Docker is overkill for 99% of projects, perhaps all. One of those great ideas, poorly executed (and considering the complexity, I understand why).
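
If the bucket is S3 (or S3-compatible), the signed-URL part of that flow is just a short-lived presigned GET. A rough sketch with boto3 (the bucket/key names are made up, and the build/encryption steps are left out):

    import boto3

    s3 = boto3.client("s3")

    # Upload the release artifact (already built and encrypted elsewhere).
    s3.upload_file("release.tar.gz.enc", "my-release-bucket", "releases/v42.tar.gz.enc")

    # Short-lived URL the instances can curl during bootstrap; expires in 5 minutes.
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "my-release-bucket", "Key": "releases/v42.tar.gz.enc"},
        ExpiresIn=300,
    )
    print(url)  # hand this to the instance's bootstrap script, e.g. via user data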

mythz
1 replies
6h55m

We're also ignoring Kubernetes and are just using GitHub Actions, Docker Compose and SSH for our CI Deployments [1]. After a one-time setup on the Deployment Server, we can deploy new Apps with just a few GitHub Action Secrets, which then gets redeployed on every commit, including running any DB Migrations. We're currently using this to deploy and run over 50 .NET Apps across 3 Hetzner VMs.

[1] https://servicestack.net/posts/kubernetes_not_required

oldprogrammer2
0 replies
6h12m

The amount of complexity people are introducing into their infrastructure is insane. At the end of the day, we're still just building the same CRUD web apps we were building 20 years ago. We have 50x the computation power, much faster disk, much more RAM, and much faster internet.

A pair of load-balanced web servers and a managed database, with Cloudflare out front, will get you really, really far.

akvadrako
1 replies
13h14m

EKS has a tab in the dashboard that warns about all the deprecated configs in your cluster, making it pretty foolproof to avoid this by checking every couple years.

hhh
0 replies
9h42m

Yes, and there are many open source tools that you can point at clusters to do the same. We use kubent (Kube No Trouble) for this.

tucnak
0 replies
8h57m

So what is the alternative? Nomad?

port19
0 replies
11h4m

> simple bash scripts for bootstrapping/monitoring/scaling

Damn, that's the dream right there

minkles
0 replies
11h44m

The first live k8s cluster upgrade anyone has to do is usually when they think "what the fuck did I get myself in to?"

It's only good for very large scale stuff. And then a lot of the time that is usually well over provisioned and could be done considerably cheaper using almost any other methodology.

The only good part of Kubernetes I have found in the last 4 years of running it in production is that you can deploy any old limping crap to it and it does its best to keep it alive which means you can spend more time writing YAML and upgrading it every 2 minutes.

thetopher
25 replies
18h59m

“Our basic philosophy when it comes to security is that we can trust our developers and that we can trust the private network within the cluster.”

This is not my area of expertise. Does it add a significant amount of complexity to configure this kind of system in a way that doesn’t require trusting the network? Where are the pain points?

stouset
18 replies
18h33m

> Our basic philosophy when it comes to security is that we can trust our developers and that we can trust the private network within the cluster.

As an infosec guy, I hate to say it but this is IMO very misguided. Insider attacks and external attacks are often indistinguishable because attackers are happy to steal developer credentials or infect their laptops with malware.

Same with trusting the private network. That’s fine and dandy until attackers are in your network, and now they have free rein because you assumed you could keep the bad people outside the walls protecting your soft, squishy insides.

jonstewart
13 replies
18h11m

One of the best things you can do is restrict your VPCs from accessing the internet willy-nilly outbound. When an attacker breaches you, this can keep them from downloading payloads and exfiltrating data.
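
Concretely, on AWS that can be as small as stripping the default allow-all egress rule from the app tier's security group and allowing outbound traffic only to the endpoints the workload actually needs. A rough boto3 sketch (the group ID and CIDR are placeholders):

    import boto3

    ec2 = boto3.client("ec2")
    group_id = "sg-0123456789abcdef0"   # hypothetical app-tier security group

    # Remove the default allow-all egress rule.
    ec2.revoke_security_group_egress(
        GroupId=group_id,
        IpPermissions=[{"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}],
    )

    # Allow outbound HTTPS only to an approved range, e.g. a proxy or VPC endpoint subnet.
    ec2.authorize_security_group_egress(
        GroupId=group_id,
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": 443,
            "ToPort": 443,
            "IpRanges": [{"CidrIp": "10.0.20.0/24", "Description": "egress proxy subnet"}],
        }],
    )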

jiggawatts
9 replies
17h43m

You’ve just broken a hundred things that developers and ops staff need daily to block a theoretical vulnerability that is irrelevant unless you’re already severely breached.

This kind of thinking is why secops often develops an adversarial relationship with other teams — the teams actually making money.

I’ve seen this dynamic play out dozens of times and I’ve never seen it block an attack. I have seen it tank productivity and break production systems many times however.

PS: The biggest impact denying outbound traffic has is to block Windows Update or the equivalent for other operating systems or applications. I’m working with a team right now that has to smuggle NPM modules in from their home PCs because they can’t run “npm audit fix” successfully on their isolated cloud PCs. Yes, for security they’re prevented from updating vulnerable packages unless they bend over backwards.

inhumantsar
2 replies
17h12m

there's no need for this to be an either/or decision.

private artifact repos with the ability to act as a caching proxy are easy to set up. afaik all the major cloud providers offer basic ones with the ability to use block or allow lists.

going up a level in terms of capabilities, JFrog is miserable to deal with as a vendor but Artifactory is hard to beat when it comes to artifact management.

junto
0 replies
13h22m

Their caching proxy sucks though. We had to turn it off because it persistently caused build issues due to its unreliability.

jiggawatts
0 replies
13h32m

Sure… for like one IDE or one language. Now try that for half a dozen languages, tools, environments, and repos. Make sure to make it all work for build pipelines, and not just the default ones either! You need a bunch of on-prem agents to work around the firewall constraints.

This alone can keep multiple FTEs busy permanently.

“Easy” is relative.

Maybe you work in a place with a thousand devs and infinite VC money protecting a trillion dollars of intellectual property then sure, it’s easy.

If you work in a normal enterprise it’s not easy at all.

jonstewart
1 replies
17h0m

> You've just broken a hundred things that developers and ops staff need daily to block a theoretical vulnerability that is irrelevant unless you're already severely breached.

I’m both a developer and a DFIR expert, and I practice what I preach. The apps I ship have a small allowlist for necessary external endpoints and everything else is denied.

Trust me, your vulnerabilities aren’t theoretical, especially if you’re using Windows systems for internet-facing prod.

bigfatkitten
0 replies
14h4m

This should still be fresh in the mind of anyone who was using log4j in 2021.

coryrc
1 replies
16h54m

I can't be certain, but I think the GP means production VMs, not people's workstations. Or maybe I fail to understand the complexities you have seen, but I'm basing my statement especially on the "download from home" thing, which seems only necessary if you blocked full Internet access on your workstation.

jiggawatts
0 replies
13h26m

The entire network has a default deny rule outbound. Web traffic needs to go via authenticating proxies.

Most Linux-pedigree tools don’t support authenticating proxies at all, or do so very poorly. For example, most have just a single proxy setting that’s either “on” or “off”. Compare that to PAC files typically used in corporate environments that implement a fine grained policy selecting different proxies based on location or destination.

It’s very easy to get into a scenario where one tool requires a proxy env var that breaks another tool.

“Stop complaining about the hoops! Just jump through them already! We need you to do that forever and ever because we might get attacked one day by an attacker that’ll work around the outbound block in about five minutes!”

bigfatkitten
1 replies
17h34m

> I've seen this dynamic play out dozens of times and I've never seen it block an attack.

I am a DFIR consultant, and I've been involved in 20 or 30 engagements over the last 15 years where proper egress controls would've stopped the adversary in their tracks.

jiggawatts
0 replies
12h49m

Any statement like that qualified with “proper” is a no true Scotsman fallacy.

What do you consider proper egress blocking? No DNS? No ICMP? No access to any web proxy? No CDP or OCSP access? Strict domain-based filtering of all outbound traffic? What about cloud management endpoints?

This can get to the point that it becomes nigh impossible to troubleshoot anything. Not even “ping” works!

And troubleshoot you will have to, trust me. You’ll discover that root cert updates are out-of-band and not included in some other security patches. And you’ll discover that the 60s delay that’s impossible to pin down is a CRL validating timeout. You’ll discover that ICMP isn’t as optional as you thought.

I’ve been that engineer, I’ve done this work, and I consider it a waste of time unless it is protecting at least a billion dollars worth of secrets.

PS: practically 100% of exfiltrated data goes via established and approved channels such as OneDrive. I just had a customer send a cloud VM disk backup via SharePoint to a third party operating in another country. Oh, not to mention the telco that has outsourced core IT functions to both Chinese and Russian companies. No worries though! They’ve blocked me from using ping to fix their broken network.

sanderjd
2 replies
18h2m

In the scenario presented, can't they just exfiltrate using the developer credentials / machine?

jonstewart
1 replies
16h53m

Let’s say there’s a log4j-type vuln and your app is affected. So an attacker can trigger an RCE in your app, which is running in, say, an EC2 instance in a VPC. A well-configured app server instance will have only necessary packages on it, and hopefully not much for dev tools. The instance will also run with certain privileges through IAM and then there won’t be creds on the instance for the attacker to steal.

Typically an RCE like this runs a small script that will download and run a more useful piece of malware, like a webshell. If the webshell doesn’t download, the attacker probably is moving onto the next victim.

sanderjd
0 replies
15h36m

But the original comment wasn't about this attack vector...

attackers are happy to steal developer credentials or infect their laptops with malware

I don't think any of what you said applies when an attacker has control of a developer machine that is allowed inside the network.

apitman
2 replies
15h44m

What's your opinion on EDR in general? I find it very distasteful from a privacy perspective, but obviously it could be beneficial at scale. I just wish there was a better middle ground.

yodelshady
0 replies
8h42m

Not the OP but I was on that side -

They do work. My best analogy is it's like working at TSA except there are three terrorist attacks per week.

As far as privacy goes, by the same analogy, I can guarantee the operators don't care what porn you watch. Doing the job is more important. But still, treat your work machine as a work machine. It's not yours, it's a tool your company lent to you to work with.

That said, on HN your workers are likely to be developers - that does take some more skill, and I'd advise asking a potential provider frank questions about their experience with the sector, as well as your risk tolerance. Devs do dodgy stuff all the time, and they usually know what they're doing, but when they don't you're going to have real fun proving you've remediated.

tryauuum
0 replies
5h44m

EDR is not related to the topic but now I'm curious as well. Any good EDR for ubuntu server?

bigfatkitten
0 replies
17h20m

It's a mindset that keeps people like you and me employed in well-paying jobs.

zymhan
0 replies
18h55m

It requires encrypting all network traffic, either with something like TLS, or IPSec VPN.

umvi
0 replies
18h51m

Implementing "Zero Trust" architectures are definitely more onerous to deal with for everyone involved (both devs and customers, if on prem). Just Google "zero trust architecture" to find examples. A lot more work (and therefore $) to setup and maintain, but also better security since now breaching network perimeter is no longer enough to pwn everything inside said network.

nilsherzig
0 replies
18h47m

"SSL added and removed here :^)"

jandrewrogers
0 replies
14h45m

This is just bad security practice. You cannot trust the internal network; many companies have been breached by operating on this assumption. You have to allow for the possibility that your neighbors are hostile.

callalex
0 replies
18h45m

The top pain point is that it requires setting up SSL certificate infrastructure and having to store and distribute those certs around in a secure way.

The secondary effects are entirely dependent on how your microservices talk to their dependencies. Are they already talking to some local proxy that handles load balancing and service discovery? If so, then you can bolt on ssl termination at that layer. If not, and your microservice is using dns and making http requests directly to other services, it’s a game of whack-a-mole modifying all of your software to talk to a local “sidecar”; or you have to configure every service to start doing the SSL validation which can explode in complexity when you end up dealing with a bunch of different languages and libraries.

None of it is impossible by any means, and many companies/stacks do all of this successfully, but it’s all work that doesn’t add features, can lead to performance degradation, and is a hard sell to get funding/time for because your boss’s boss almost certainly trusts the cloud provider to handle such things at their network layer unless they have very specific security requirements and knowledge.

agf
0 replies
18h27m

Yes, it adds an additional level of complexity to do role-based access control within k8s.

In my experience, that access control becomes necessary for several reasons (mistakes due to inexperience, cowboys, compliance requirements, client security questions, etc.) somewhere around 50-100 developers.

This isn't just "not zero trust", it's access to everything inside the cluster (and maybe the cluster components themselves) or access to nothing -- there is no way to grant partial access to what's running in the cluster.

ksajadi
21 replies
15h36m

I’ve been building and deploying thousands of stacks on first Docker, then Mesos, then Swarm and now k8s. If I have learned one thing from it, it’s this: it’s all about the second day.

There are so many tools that make it easy to build and deploy apps to your servers (with or without containers), and all of them showcase how easy it is to go from a cloud account to a fully deployed app.

While their claims are true, what they don’t talk about is how to maintain the stack, after “reclaiming” it. Version changes, breaking changes, dependency changes and missing dependencies, disaster recovery plans, backups and restores, major shifts in requirements all add up to a large portion of your time.

If you have that kind of team, budget or problem that deserves those, then more power to you.

wg0
7 replies
12h34m

This is absolutely true. I can count easily some 20+ components already.

So this is not a walk in the park for two willing developers learning k8s.

The underlying apps (Redis, ES) will have version upgrades.

Their respective operators themselves would have version upgrades.

Essential networking fabric (Calico, Flannel and such) would have upgrades.

The underlying kubernetes itself would have version upgrades.

The Talos Linux itself might need upgrades.

Of all the above, any single upgrade might lead to the infamous controller crash loop, where a pod starts and dies with little to no indication as to why. And not just any ordinary pod, but a crucial pod that's part of some operator supposed to do the housekeeping for you.

k8s is invented at Google and is more suitable in ZIRP world where money is cheap and to change the logo, you have seven designers on payroll discussing for eight months how nine different tones of brand coloring might convey ten different subliminal messages.

imiric
4 replies
11h57m

The underlying apps (Redis, ES) will have version upgrades.

You would have to deal with those with or without k8s. I would argue that without it is much more painful.

Their respective operators themselves would have version upgrades.

Essential networking fabric (Calico, Flannel and such) would have upgrades.

The underlying kubernetes itself would have version upgrades.

The Talos Linux itself might need upgrades.

How is this different from regular system upgrades you would have to do without k8s?

K8s does add layers on top that you also have to manage, but it solves a bunch of problems in return that you would have to solve by yourself one way or another.

That essential networking fabric gives you a service mesh for free, that allows you to easily deploy, scale, load balance and manage traffic across your entire infrastructure. Building that yourself would take many person-hours and large teams to maintain, whereas k8s allows you to run this with a fraction of the effort and much smaller teams in comparison.

Oh, you don't need any of that? Great. But I would wager you'll find that the hodge podge solution you build and have to maintain years from now will take much more of your time and effort than if you had chosen an industry standard. By that point just switching would be a monumental effort.

Of all the above, any single upgrade might lead to the infamous controller crash loop, where a pod starts and dies with little to no indication as to why.

Failures and bugs are inevitable. Have you ever had to deal with a Linux kernel bug?

The modern stack is complex enough as it is, and while I'm not vouching for increasing it, if those additional components solve major problems for me, and they become an industry standard, then it would be foolish to go against the grain and reinvent each component once I have a need for it.

mplewis
3 replies
11h37m

You seem to be misunderstanding. The components that add complexity in this case do not come from running a k8s cluster. They come from the Reclaim the Stack software.

imiric
2 replies
10h14m

Alright. So let's discuss how much time and effort it would take to build and maintain a Heroku replacement without k8s then.

Besides, GP's criticisms were squarely directed at k8s. For any non-trivial workloads, you will likely use operators and networking plugins. Any of these can have bugs, and will add complexity to the system. My point is that if you find any of those features valuable, then the overall cost would be much less than the alternatives.

Maxion
1 replies
7h31m

The alternative is not to build a different PaaS alternative, but to simply pay Heroku/AWS/Google/Indie PaaS providers and go back to making your core product.

imiric
0 replies
5h58m

Did you read the reasons they moved away from Heroku to begin with? Clearly what you mention wasn't an option for them, and they consider this project a success.

sgarland
1 replies
6h42m

Talos is an immutable OS; upgrades are painless and roll themselves back upon failure. Same thing for K8s under Talos (the only thing Talos does is run K8s).

specialist
0 replies
4h36m

TIL "immutable OS", thanks.

Ages ago, I had the notion of booting from removable read-only media. At the time CD-ROM. Like gear for casting and tabulating votes. Or controllers for critical infra.

(Of course, a device's bootloader would have to be ROM too. And boot images would be signed, both digitally and manually.)

Maybe "immutable boot" and immutable OS can be complimentary. Surely someone's already explored this (obv idea). Worth pondering.

bsenftner
3 replies
8h17m

The thing that strikes me is: okay, two "willing developers" - but they need to be actually capable, not just "willing" but "experienced and able", and that lands you at a minimum of $100k per year per engineer. That means this system has a maintenance cost of over $16k per month if you have to dedicate two engineers full-time to the maintenance, and of course to following the dynamic nature of K8s and all its tooling just to stay in front of all of that.

oldprogrammer2
0 replies
6h21m

Even worse, this feels like the goal was actually about reclaiming their resumes, not the stack. I expect these two guys to jump ship within a year, leaving the rest of the team trying to take care of an entire ecosystem they didn't build.

Maxion
0 replies
7h12m

And you may still end up with longer downtime if SHTF than if you use a managed provider.

0perator
0 replies
6h18m

Also, for only two k8s devops engineers in a 24h-available world, you’re gonna be running them ragged with 12h solo shifts or taking the risk of not staffing overnight. Considering most update and backup jobs kick off at midnight, that’s a huge risk.

If I were putting together a minimum-viable staffing for a 24x7 available cluster with SLAs on RPO and RTO, I'd be recommending much more than two engineers. I'd probably be recommending closer to five: one senior engineer and one junior for the 8-4 shift, an engineer for the 4-12 shift, another engineer for the 12-8 shift, and another junior who straddles the evening and night shifts. For major outages, this still requires on-call time from all of the engineers, and additional staffing may be necessary to offset overtime hours. Given your metric of roughly $8k an engineer, we'd be looking at a cool $40K/month in labour just to approach four or five 9s of availability.

tomwojcik
2 replies
12h33m

Agreed. Forgive a minor digression, but what OP wrote is my problem now. I'm looking for something like heroku's or fly's release command. I have an idea how to implement it in docker using swarm, but I can't figure out how to do that on k8s. I googled it some time ago, but all the answers were hacks.

Would someone be able to recommend an approach that's not a hack for implementing a custom release command on k8s? Downtime is fine, but this one-off job needs to run before the user-facing pods are available.

psini
1 replies
9h42m

Look at Helm charts; they have become the de facto standard for packaging/distributing/deploying/updating whole apps on Kubernetes.
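
If you go the Helm route, the usual non-hack for a Heroku-style release phase is a hook Job; a minimal sketch, assuming your chart already exposes the app image as a value, with a placeholder migration command:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: release-command       # hypothetical name
      annotations:
        "helm.sh/hook": pre-install,pre-upgrade
        "helm.sh/hook-weight": "0"
        "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
    spec:
      backoffLimit: 0
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: release
              image: "{{ .Values.image }}"                        # same image as the app pods (assumed value)
              command: ["bundle", "exec", "rails", "db:migrate"]  # placeholder release command

Helm waits for pre-upgrade hooks to finish before rolling out the rest of the release, which matches the "run before the user-facing pods are available" requirement.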

trashburger
0 replies
6h39m

https://leebriggs.co.uk/blog/2019/02/07/why-are-we-templatin...

Something like Jsonnet would serve one better, I think. The only part that kinda sucks is the "package management" but that's a small price to pay to avoid the YAML insanity. Helm is fine for consuming third-party packages.

benjaminwootton
1 replies
10h52m

The flip side of this is the cost. Managed cloud services make it faster to get live, but then you are left paying managed service providers for years.

I’ve always been a big cloud/managed service guy, but the costs are getting astronomical and I agree the buy vs build of the stack needs a re-evaluation.

Maxion
0 replies
7h29m

This is the balance, right? For the vast majority of web apps et al., the cloud costs are going to be cheaper than having full-time Ops people managing an OSS stack on VPS / bare metal.

szundi
0 replies
12h45m

And what is your take on all those things that you tried? Some experience/examples would benefit us probably.

sedatk
0 replies
12h12m

it’s all about the second day

Tangentially, I think this applies to LLMs too.

imiric
0 replies
12h46m

Agreed, but to be fair, those are general problems you would face with any architecture. At least with mainstream stacks you get the benefit of community support, and of relying on approaches that someone else has figured out. Container-based stacks also have the benefit of homogenizing your infrastructure, and giving you a common set of APIs and workflows to interact with.

K8s et al are not a silver bullet, but at this point they're highly stable and understood pieces of infrastructure. It's much more painful to deviate from this and build things from scratch, deluding yourself that your approach can be simpler. For trivial and experimental workloads that may be the case, but for anything that requires a bit more sophistication these tools end up saving you resources in the long run.

AnAnonyCowherd
0 replies
4h33m

If you have that kind of team, budget or problem that deserves those, then more power to you.

This is the operative issue, and it drives me crazy. Companies that can afford to deploy thousands of services in the cloud definitely have the resources to develop in-house talent for hosting all of that on-prem, and saving millions per year. However, middle management in the Fortune 500 has been indoctrinated by the religion that you take your advice from consultants and push everything to third parties so that 1) you build your "kingdom" with terribly wasteful budget, and 2) you can never be blamed if something goes wrong.

As a perfect example, in my Fortune 250, we have created a whole new department to figure out what we can do with AI. Rather than spend any effort to develop in-house expertise with a new technology that MANY of us recognize could revolutionize our engineering workflow... we're buying Palantir's GenAI product, and using it to... optimize plant safety. Whatever you know about AI, it's fundamentally based on statistics, and I simply can't imagine a worse application than trying to find patterns in data that BY DEFINITION is all outliers. I literally can't even.

You smack your forehead, and wonder why the people at the top, making millions in TC, can't understand such basic things, but after years of seeing these kinds of short-sighted, wasteful, foolish decisions, you begin to understand that improving the company's abilities, and making it competitive for the future is not the point. What is the point "is an exercise left to the reader."

sph
20 replies
19h20m

"Join the Discord server"? Who's the audience of this project?

mre
13 replies
19h14m

Genuinely curious, what's wrong with that? Did you expect a different platform like Slack?

callalex
10 replies
18h38m

Locking knowledge behind something that isn't publicly searchable or archivable works fine in the short term, but what happens when Discord/Slack/whatever gears up for an IPO and limits all chat history to 1 week unless you pay up? (Oh, and now you have a bunch of valuable knowledge stored up there with no migration tool, so your only options are "pay up" or lose the knowledge.)

halfcat
4 replies
17h1m

What’s recommended here? Self-hosted Discourse?

heavyset_go
2 replies
15h58m

Matrix and a wiki would solve the community and knowledge base issues.

mplewis
1 replies
11h31m

Matrix has severe UX issues which drastically limit the community willing to use it on a regular basis.

Arathorn
0 replies
3h11m

Historically, yes. Matrix 2.0 (due in two weeks) aims to fix this.

tacker2000
0 replies
10h25m

Github issues or discussions. Or some other kind of forum like Discourse as you mentioned

Kiro
4 replies
13h23m

No-one complained when projects had IRC channels/servers, which are even worse since they have no history at all.

okasaki
1 replies
8h40m

All IRC clients have local plain text logging and putting a .txt on a web server is trivial.

Kiro
0 replies
5h48m

Local logging doesn't help much for searchability when you're new and it requires you to be online 24/7. Anyway, that's beside the point. Even if IRC had built-in server history it still has the same problems but I never saw people being outraged about it.

Gormo
1 replies
3h54m

Good projects still do rely on IRC -- Libera.chat is full of proper channels -- and logging bots are ubiquitous.

Kiro
0 replies
1h48m

And you never hear anyone complaining about those. "Locking knowledge" was never an argument before and it's not now.

fragmede
0 replies
19h6m

it would be better at the bottom of the first documentation page, after the reader has a better idea of what this is

Gormo
0 replies
3h55m

There's a whole FOSS ecosystem of chat/collaboration applications, like Mattermost and Zulip; there's Matrix for a federated solution, and tried-and-true options like IRC.

For something called "Reclaim the Stack" to lock discussion into someone else's proprietary walled garden is quite ironic.

tacker2000
2 replies
12h41m

Also noticed this. Every time I see a project using Discord as its main communication tool, it makes me think about the "fitness" of the project in the long run.

Discord is NOT a benefit. It's not publicly searchable, and the chat format is just not suitable for a knowledge base or support forum.

Forums are much better in that regard.

KronisLV
1 replies
5h38m

Discord is NOT a benefit. It's not publicly searchable, and the chat format is just not suitable for a knowledge base or support forum.

I don't think people who choose Discord necessarily care about that. Discord is where the people are, so that's where they go. It also costs close to nothing to set up a server, and since it has a lower barrier of entry than hosting your own forum, it's deemed good enough.

That said, modern forum software like Discourse https://www.discourse.org/ or Flarum https://flarum.org/ can be pretty good, though I still miss phpBB.

Gormo
0 replies
3h51m

Discord is where the people are, so that's where they go.

That doesn't sound right. Each Discord community is its own separate space -- you still need people to join your specific community regardless of whether it is hosted on Discord or something better.

though I still miss phpBB.

It hasn't gone away -- the last release was on August 29th, so this is still very much a viable option.

mrits
2 replies
19h9m

People that don't like wasting money?

hobs
0 replies
17h57m

Not capturing the information and being able to use it in the future is a huge opportunity cost, and idling on discord pays no bills.

Gormo
0 replies
3h50m

Wasting money on... better solutions that are also free?

andrewstuart
15 replies
19h16m

How can a NewsDesk application need kubernetes?

Wouldn't a single machine and a backup machine do the job?

andrewstuart
5 replies
19h8m

I just looked it up - it's because they run Ruby on Rails.

zymhan
4 replies
18h54m

and so what?

andrewstuart
3 replies
18h53m

Ruby on Rails is well known for not being at the fast end of the spectrum, so it needs lots of machines, and lots of machines gives a reason to use Kubernetes.

A NewsDesk application written in something compiled, for example Go, would be much faster and could likely run on a single server.

The benefit of a single server being that you don't need Kubernetes and can spend that development resource on developing application features.

jakelazaroff
2 replies
18h39m

I personally prefer Go to Rails, but let’s be real here: the market cap of Rails is probably, like, a hundred times the market cap of Go.

sien
0 replies
17h49m

No doubt, with GitHub, Airbnb, Shopify and other big sites, RoR is bigger for the front end.

But now, if lots of those sites are running on K8s with Argo CD or something, or on a cloud platform where the infrastructure is provisioned with Terraform, Go is supporting a great deal of things while being far less visible.

maxk42
0 replies
17h38m

I believe Crystal is worth a look here.

bluepizza
4 replies
19h4m

Most simple applications that use k8s are doing it for autoscaling or zero-downtime continuous deployment (or both).
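
For context, the zero-downtime part is mostly a few lines of Deployment spec plus a readiness probe; a minimal sketch, with illustrative numbers and a placeholder image:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web                   # hypothetical
    spec:
      replicas: 3
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 0       # never dip below the desired replica count
          maxSurge: 1             # bring up one new pod at a time
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
            - name: web
              image: registry.example.com/web:1.2.3   # placeholder image
              readinessProbe:
                httpGet:
                  path: /healthz                      # placeholder health endpoint
                  port: 8080
                # traffic only shifts to a new pod once this probe passes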

wordofx
3 replies
18h16m

So basically 2 things you don’t need k8s to solve?

mplewis
1 replies
11h30m

What would you use to solve these problems?

mrweasel
0 replies
11h7m

VMs and load balancers?

From the documentation on the site it says that they're running on dedicated servers from Hetzner... So they aren't auto-scaling anything; they are paying for that hardware 24/7. It makes absolutely no difference how many containers are running, the cost remains constant.

bluepizza
0 replies
10h55m

You don't need anything. You choose the most convenient tool according to your professional judgment. I certainly hope that nobody is using Kubernetes because they are against the wall, and instead decide to use it for its features.

briandear
3 replies
19h0m

Is any business running a complete application stack on a single machine?

notpushkin
0 replies
17h58m

A lot of businesses don’t need more than a couple machines (and can get away with one, but it’s not good for redundancy).

mrweasel
0 replies
11h5m

Frequently yes, normally I'd say that the database server is on a separate machine, but otherwise yes.

I've seen companies run a MiniKube installation on a single server and run their applications that way.

liveoneggs
0 replies
14h56m

my vps reboots every 18 months or so..

fourseventy
10 replies
18h37m

In my experience you can get pretty far with just a handful of vms and some bash scripts. At least double digit million ARR. Less is more when it comes to devops tooling imo.

lolinder
5 replies
17h55m

you can get pretty far with just a handful of vms and some bash scripts. At least double digit million ARR.

Using ARR as the measurement for how far you can scale devops practices is weird to me. Double-digit million ARR might be a few hundred accounts if you're doing B2B, and double-digit million MAUs if you're doing an ad-funded social platform. Depending on how much software is involved your product could be built by a team of anywhere from 1-50 developers.

If you're a one-developer B2B company handling 1-3 requests per second you wouldn't even need more than one VM except maybe as redundancy. But if you're the fifty-developer company that's building something beyond simple CRUD, there are a lot of perks that come with a full-fledged control plane that would almost certainly be worth the added cost and complexity.

marcosdumay
1 replies
17h2m

double-digit million MAUs

I was about to make a similar point, but you did the math, and it's holding up for the GP's side.

You can push VMs and direct SSH-based synchronization up to double-digit million MAU (unless you are using stuff like persistent web-sockets). It won't be pretty, but you can get that far.

lolinder
0 replies
16h35m

I'm not concerned about handling the requests for the main user-facing application (as you say, you can get way further with a single box than many people think), I'm thinking about all of the additional complexity that comes with serving multiple millions of human users that wouldn't exist if you were just serving a few hundred web scrapers that happen to produce as much traffic as multiple millions of humans.

What those sources of complexity are depends a lot on the product, but some examples include admin tooling for your CS department, automated content moderation systems, more thorough logging and monitoring, DDOS mitigation, feature flagging and A/B testing, compliance, etc. Not to mention the overhead of coordinating the work of 50 developers vs 1—deploying over SSH is well and good when you can reasonably expect a small handful of people to need to do it, but automatic deploys from main deployed from a secure build machine is a massive boon to the larger team.

Any one of these things has an obvious answer—just add ${software} to your one VM or create one extra bare-metal build server or put your app behind Cloudflare—but when you have a few dozen of these sources of complexity then AWS's control plane offerings start to look very attractive. And once you have 50 developers on the payroll spending a few hundred a month on cloud to avoid hand-rolling solutions isn't exactly a hard sell.

davedx
1 replies
12h17m

there are a lot of perks that come with a full-fledged control plane that would almost certainly be worth the added cost and complexity.

Such as?

Logging is more complicated with multi container microservice deployments. Deploying is more complicated. Debugging and error tracing is more difficult. What are the perks?

The_Colonel
0 replies
8h45m

You get more tools to mitigate those problems. Those tools add more complexity to the system, but that's of course solvable by higher level tools.

Maxion
0 replies
7h6m

I used to work at a Fintech company where we had around 1-20k concurrent active users, monthly around 2 million active users. I forget the RPS, but it was maybe around 200-1000 normally? We ran on bare metal, bash scripts, not a container in sight. It was spaghetti, granted, but it worked surprisingly well.

cloudking
1 replies
18h28m

+1 or just use App Engine, deploy your app and scale

cglace
0 replies
18h10m

App engine deploys are soooo slow. I liked cloud run a lot more.

sanderjd
0 replies
18h4m

Of course you can get away with that if your metric is revenue. (I think Blippi makes about that much with, I suspect, nary a VM in sight!

The question is what you're doing with your infrastructure, not how much revenue you're making. Some things have higher return to "devops" and others have less.

kevin_nisbet
0 replies
16h52m

I agree, this is an incredibly valid approach for some companies and startups. If you benefit by being frugal and are doing something that doesn't need incredible availability, a rack of servers in a colo doesn't cost much and you can take it pretty far without a huge amount of effort.

appplication
9 replies
19h14m

This sounds great, I’ll be building our prod infra stack and deploying to cloud for the first time here in the next few weeks, so this is timely.

It’s nice seeing some OSS-based tooling around k8s. I know it’s a favorite refrain that “k8s is unnecessary/too complex, you don’t need it” for many folks getting started with their deployments, but I already know and use it in my day job, so it feels like a pretty natural choice.

briandear
7 replies
18h58m

The K8s is unnecessary meme is perpetuated by people that don’t understand it.

actionfromafar
3 replies
18h55m

True, but also, sometimes it’s not needed.

jauntywundrkind
2 replies
14h58m

Sometimes it just feels good wearing a fig leaf around my groin, wielding a mid-sized log as a crude club, and running through the jungle.

"You might not need it" is the kernel of doubt that can undermine any reasonable option. And it suggests nothing. Sure, you can go write your own kernel! You can make your own database! You might not need to use good, well-known, proven technology that people understand and can learn about online! You can do it yourself! Or cobble together some alternate, lesser, special stack that only you have distilled.

We don't need civilization. We can go it alone and do our own thing, leave behind shared frames of reference. But damn, it just seems so absurdly inadvisable, and the fear, uncertainty and doubt telling us Kubernetes is hard and bad and too much feels so overblown. This article does certainly lend credence to the idea that Kubernetes is complex, but there are so many simpler starting places that will take many teams very far.

samatman
1 replies
14h30m

Somehow kubernetes and civilization just aren't in the same category of salience to me. Like I think it's reasonable to say that kubernetes is optional in a way which civilization isn't.

Like maybe one of those things is more important than the other.

jauntywundrkind
0 replies
1h10m

I don't disagree, and there's plenty of room for other competitors to arise. We see some Kamal mentions. Microsoft keeps trying to make Dapr a thing, godspeed.

But very few other options exist with the same scope, scale and extensibility that allow them to become broadly adopted platform infrastructure. The folks saying you might not need Kubernetes, in my view, do a massive disservice by driving people to construct their own unique paths piece by piece, rather than being a part of something broader. In my view there are just too many reasons why you want your platform to be something socially prevalent, to be well travelled by others too, and right now there are few other large, popular, extensible platforms that suit this beyond Kubernetes.

sph
0 replies
11h9m

k8s is relatively straightforward, it's the ecosystem around it that is total bullcrap, because you won't only run k8s, you will also run Helm, a templating language or an ad-hoc mess of scripts, a CNI, a CI/CD system, operators, sidecars, etc. and every one of these is an over-engineered buggy mess with half a dozen hyped alternatives that are in alpha state with their own set of bugs.

How Kubernetes works is pretty simple, but administering it is living a life of constant analysis paralysis and churn and hype cycles. It is a world built by companies that have something to sell you.

okasaki
0 replies
8h36m

Just had an incident call last week with 20+ engineers on zoom debugging a prod k8s cluster for 5 hours.

freeopinion
0 replies
15h1m

If they don't understand it but still get their jobs done...

Tractors are also unnecessary. Plenty of people grow tomatoes off their balcony without tractors.

If somebody insists on growing 40 acres of tomatoes without a tractor because tractors aren't necessary, why argue with them? If they try to force you to not use a tractor, that's different.

notpushkin
0 replies
17h43m

I really hated Kubernetes at first because the tooling is so complicated. However, having worked with raw Docker API and looking into the k8s counterparts, I’m starting to appreciate it a lot more.

(But it still needs more accessible tooling! Kompose is a good start though: https://kompose.io/)

deisteve
7 replies
17h10m

I got excited until I saw this was Kubernetes. You most certainly do not need to add that layer of complexity.

If I can serve 3 million users/month on a $40/month VPS with just Coolify, Postgres, Nginx and Django/Gunicorn, without Redis or RabbitMQ, why should I use Kubernetes?

itsthecourier
3 replies
16h42m

Got our bill down from USD 10k to USD 0.5k a month by moving away from GCP to Kamal on OVH.

And 30% less latency.

deisteve
2 replies
16h36m

That's 95% in savings!!!! Bet you can squeeze out more with Hetzner.

To the people who disagree:

What business justifies 18x'ing your operating costs?

9.5k USD can get you 3 senior engineers in Canada. 9 in India.

deznu
1 replies
15h43m

Senior Engineers cost ~$3k a month in Canada?? Seems far-fetched..

chaboud
0 replies
14h40m

We must have very different definitions of senior engineer from the GP, because I’d put the monthly cost of a senior engineer closer to $30k than $3k, even on a log scale.

Employing people requires insurance, buildings, hardware, support, licenses, etc. There are lower cost locations, but I can’t think of a single market on earth where there is a supply of senior engineers that cost $3k/month. And I say this being familiar with costs in India, China, Poland, Spain, Mexico, Costa Rica, and at least a dozen other regions.

mrweasel
0 replies
11h12m

why should I use Kubernetes

You shouldn't, but people have started to view Kubernetes as a deployment tool. Kubernetes makes sense when you start having bare metal workers, or a high number of services (micro-services). You need to have a pretty dynamic workload for Kubernetes to result in any cost saving on the operations side. There might be a cost saving if it's easier to deploy your services, but I don't see that being greater than the cost of maintaining and debugging a broken Kubernetes cluster in most cases.

The majority of use cases do not require Kubernetes. The majority of users who think they NEED Kubernetes are wrong. That's not to say that you shouldn't use it, if you believe you get some benefit; it's just not your cheapest option.

dbackeus
0 replies
2h56m

Coolify does look nice.

But I don't believe it supports HA deployments of Postgres with automated failover / 0 downtime upgrades etc?

Do they even have built in backup support? (a doc exists but appears empty: https://coolify.io/docs/knowledge-base/database-backups)

What makes you feel that Coolify is significantly less complex than Kubernetes?

Kiro
0 replies
13h22m

Why do you need Coolify?

strzibny
4 replies
17h7m

It's good to see new projects. However most people shouldn't start with Kubernetes at all. If you don't need autoscaling, give Kamal[0] a go. It's the tool 37signals made to leave Kubernetes and cloud. Works super well with simple VMs. I also wrote a handbook[1] to get people started.

[0] https://kamal-deploy.org [1] https://kamalmanual.com/handbook/
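
For anyone who hasn't tried it: a Kamal setup is essentially one config/deploy.yml plus `kamal deploy`. A rough sketch from memory, so treat the exact keys as illustrative and check the docs above; the names and addresses are made up:

    # config/deploy.yml (illustrative)
    service: myapp                  # hypothetical app name
    image: myuser/myapp             # hypothetical image
    servers:
      web:
        - 192.0.2.10                # example VM addresses
        - 192.0.2.11
    registry:
      username: myuser
      password:
        - KAMAL_REGISTRY_PASSWORD   # read from the environment
    env:
      clear:
        RAILS_ENV: production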

mplewis
1 replies
11h35m

I’m not going to trust a project like this – made by and for one company – with production workloads.

rcaught
0 replies
9h54m

hahaha, do you even realize what else this company makes?

leohonexus
0 replies
13h30m

Bought both your books, they are awesome :)

dbackeus
0 replies
3h5m

(Reclaim the Stack creator here)

We don't do autoscaling.

The main reason for Kubernetes for us was automation of monitoring / logs / alerting and highly available database deployments.

37signals has a dedicated operations team with more than 10 people. We have 0 dedicated operations people. We would not have been able to run our product with Kamal given our four nines uptime target.

(that said, I do like Kamal, especially v2 seems to smooth out some edges, and I'm all for simple single server deployments)

Summerbud
4 replies
15h48m

The results were a 90% reduction in costs and a 30% improvement in performance.

I am at a company with a dedicated infra team and my CEO is an infra enthusiast. He used Terraform and k8s to build the company's infra. But the results are:

- Every deployment takes days; in my experience, I've had to work a 24-hour streak to make it work.
- The infra is complicated to a level that's quite hard to adjust.

And benefits-wise, I can't even think of any. We don't have many users, so the claimed scalability isn't even needed.

I will strongly argue a startup should not touch k8s until you have a fair user base and retention.

It's a nightmare to work with.

raziel2p
1 replies
12h39m

sounds like your CEO just isn't very good at setting up infra.

Summerbud
0 replies
12h16m

Maybe, that is one of the possibilities in my mind too.

tryauuum
0 replies
5h41m

...but why? How many services does the deployment require?

cultofmetatron
0 replies
7h39m

DAYS??? Our infra usually takes 10 min, with up to 45 min if we're doing some Postgres maintenance stuff. People in a work context should stick to what they are good at.

b_shulha
3 replies
7h58m

Who is your target audience? There are so many components in this system that it would require a DevOps team member just to keep it healthy.

What are the advantages over the (free) managed k8s provided by DigitalOcean?

---

Gosh, I'm so happy I was able to jump off the k8s hype train. This is not something SMBs should be using. Now I happily manage my fleet of services without large infra overhead via my own PaaS over Docker Swarm. :)

KronisLV
1 replies
7h11m

Gosh, I'm so happy I was able to jump off the k8s hype train. This is not something SMBs should be using. Now I happily manage my fleet of services without large infra overhead via my own PaaS over Docker Swarm. :)

I mean, I also use Docker Swarm and it's pretty good, especially with Portainer.

To me, the logical order of tools goes with scale a bit like this: Docker Compose --> Docker Swarm --> Hashicorp Nomad / Kubernetes

(with maybe Podman variety of tools where needed)

I've yet to see a company that really needs the latter group of options, but maybe that's because I work in a country that's on the smaller side of things.

All that being said, however, both Nomad and some K8s distributions like K3s https://k3s.io/ can be a fairly okay experience nowadays. It's just that it's also easy to end up with more complexity than you need. I wonder if it's going to be the meme about going full circle and me eventually just using shared hosting with PHP or something again, though so far containers feel like the "right" choice for shipping things reasonably quickly, while being in control of how resources are distributed.
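
The nice property of that ladder is that the same file mostly carries along: a minimal sketch of a compose file that works with plain `docker compose up` and, unchanged, with `docker stack deploy` on Swarm; the image and ports are placeholders:

    version: "3.8"                  # mostly ignored by recent Compose, still accepted by stack deploy
    services:
      web:
        image: registry.example.com/web:1.2.3   # placeholder
        ports:
          - "80:8080"
        deploy:                     # consumed by Swarm (docker stack deploy)
          replicas: 3
          update_config:
            parallelism: 1
            order: start-first      # start the new task before stopping the old one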

b_shulha
0 replies
7h5m

While k3s makes k8s easier for sure, it still comes with lots of complexity on board just because it is k8s. :)

Nowadays I prefer simple tooling over "flexible" tooling for my needs.

Enterprises, however, should stick to k8s-like solutions, as there are just too many variables everywhere: starting from security and ending with the software architecture itself.

b_shulha
0 replies
7h20m

Oh, thanks for asking. ;)

It is a Fair Source (future Apache 2.0 license) PaaS. I provide a cloud option if you want to manage less and get extra features (soon: included backup space, uptime monitoring from multiple locations, etc.), and, of course, you are free to self-host it for free and without any limitations by using a single installation script. ;)

https://github.com/ptah-sh/ptah-server

But anyway, I'm really curious to know the answers to the questions I have posted above. Thanks!

PaulHoule
2 replies
18h11m

What about “the rest of us” who don’t have time for Kube?

tacker2000
0 replies
10h17m

Im using docker compose on every project I have, and it works fine.

Of course, I dont have millions of users, but until then this is enough for me.

notpushkin
0 replies
18h7m

If you know how to write a docker-compose.yml – Docker Swarm to the rescue! I’m making a nice PaaS-style thing on top of it: https://lunni.dev/

You can also use Kubernetes with compose files (e.g. with Kompose [1]; I plan to add support to Lunni, too).

[1]: https://kompose.io/

pwmtr
1 replies
8h57m

Definitely interesting material. I've realized, especially in the last few years, that there is increased interest in moving away from proprietary clouds/PaaS to K8s or even to bare metal, primarily driven by high prices and also by an interest in having more control.

At Ubicloud, we are attacking the same problem, though from a different angle. We are building an open-source alternative to AWS. You can host it yourself or use our managed services (which are 3x-10x more affordable than comparable services). We have already built some primitives such as VMs, PostgreSQL, private networking and load balancers, and are also working on K8s.

I have a question for the HN crowd: which primitives are required to run your workloads? It seems the OP's list consists of Postgres, Redis, Elasticsearch, Secret Manager, Logging/Monitoring, Ingress and Service Mesh. I wonder if this is representative of the typical requirements to run the HN crowd's workloads.

evertheylen
0 replies
8h19m

Quite simple, I want to submit a Docker image, and have it accept HTTP requests at a certain domain, with easy horizontal/vertical scaling. I'm sure your Elastic Compute product is nice but I don't want to set it up myself (let alone run k8s on it). Quite like fly.io.

PS: I like what you guys are doing, I'd subscribe to your mailing list if you had one! :)

pton_xd
1 replies
15h59m

"The results were a 90% reduction in costs and a 30% improvement in performance."

What's the scale of this service? How many machines are we talking here?

internetter
0 replies
15h37m

Went from ~$7500/m to $520/m, IIRC, from the presentation.

fragmede
1 replies
19h19m

having your tool be a single letter, k, seems rather presumptuous.

rjbwork
0 replies
19h17m

Especially given K is already the name of an APL derivative.

evantahler
1 replies
19h20m

Porter (https://www.porter.run/) is a great product in the same vein (e.g. turn K8s into a dev-friendly Heroku-like PASS). How does this compare?

mikeortman
0 replies
18h41m

I think the very concept of this is to open source a common stack, instead of relying on a middleman like Porter, which also costs a TON of money at business tier

zug_zug
0 replies
18h17m

Seems like a cool premise. Though I guess people building things always want to convince you they are worth it (sort of a conflict of interest); I would like to read an unbiased 7-day migration to this.

thih9
0 replies
12h27m

fully open source stack*. *) Except for Cloudflare

Are there plans to address that too long term?

thesurlydev
0 replies
4h51m

I was excited about this title until I read it's just another thing on top of Kubernetes. To me, Kubernetes is part of the problem. Can we reduce the complexity that Kubernetes brings and still have nice things?

seungwoolee518
0 replies
14h56m

Most of the software should work out of the box, but the real problem comes from the hardware.

sciurus
0 replies
6h23m

Replicas are used for high availability only, not load balancing

(From https://reclaim-the-stack.com/docs/platform-components/ingre...)

Am I reading this right that they built a k8s-based platform where, by default, they can't horizontally scale applications?

This seems like a lot of complexity to develop and maintain if they're running applications that don't even need that.

notpushkin
0 replies
18h1m

It looks like a nice Kubernetes setup! But I don’t see how this is comparable to something like Heroku – the complexity is way higher from what I see.

If you’re looking for something simpler, try https://dokku.com/ (the OG self-hosted Heroku) or https://lunni.dev/ (which I’ve been working on for a while, with a docker-compose based workflow instead). (I've also heard good things about coolify.io!)

noop_joe
0 replies
4h56m

Heroku and Reclaim are far from the only two options available. The appropriate choice depends entirely on the team's available expertise and the demands of the applications under development.

There are a lot of disagreements pitting one solution against another. Even if one hosting solution were better than another, the problem is there are SO MANY solutions, existing on so many axes of tradeoffs, that it's hard to determine an appropriate solution (Heroku, Reclaim, etc.) without consideration of its application and context of use.

Heroku has all sorts of issues: super expensive, limited functionality. But if it happens to be what a developer team knows and it works for their needs, Heroku could save them lots of money even considering the high cost.

The same is true for Reclaim. _If_ you're familiar with all of the tooling, you could host an application with more functionality for less money than Heroku.

mvkel
0 replies
13h19m

90% reduction in costs

Curious which costs are being counted here.

Many new maintenance-related line items will be added, with only one (the subscription) removed.

mikeortman
0 replies
18h33m

I'm glad we are starting to lean into cloud-agnostic or building back the on-prem/dedicated systems again.

kh_hk
0 replies
6h3m

We spent 7 months building a Kubernetes based platform to replace Heroku for our SaaS product at mynewsdesk.com.

I thought this was either a joke I was missing, or a rant about Kubernetes. It turned out it was neither, and now I am confused.

hintymad
0 replies
12h34m

A trajectory question: is there an acceptable solution for federating k8s clusters, or is there even such a need? One thing that made EC2 really powerful is that a company could practically create as many clusters (ASGs) of as many nodes as needed, while k8s by default has a scale limit of around 5000 nodes. I guess 5000 nodes will be far from enough for a large company that offers a single compute platform to its employees.

est
0 replies
12h26m

We spent 7 months building a Kubernetes based platform to replace Heroku for our SaaS product

And heroku is based on LXC containers. I'd say it's almost the same thing.

chrisweekly
0 replies
19h17m

This looks great! Thank you for sharing, @dustedcodes. I might set up a playground to gain more hands-on experience w/ the relevant significant parts (k8s, argocd, talos) all of which have been on my radar for some time... Also, the docs look great. I love the Architecture Decision Records (bullet-point pros/cons/context)...

aliasxneo
0 replies
13h4m

Since there are so many mixed comments here, I'll share my experience. Our startup started on day one with Kubernetes. It took me about six weeks to write the respective Terraform and manifests and combine them into a homogenous system. It's been smooth sailing for almost two years now.

I'm starting to suspect the wide range of experiences has to do with engineering decisions. Nowadays, it's almost trivial to over-engineer a Kubernetes setup. In fact, with platform engineering becoming all the rage these days, I can't help but notice how over-engineered most reference architectures are for your average mid-sized company. Of course, that's probably by design (Humanitec sure enjoys the money), but it's all completely optional. I intentionally started with a dead-simple EKS setup: flat VPC with no crazy networking, simple EBS volumes for persistence, an ALB on the edge to cover ingress, and External Secrets to sync from AWS Secrets Manager. No service mesh, no fancy BPF shenanigans, just a cluster so simple that replicating to multiple environments was trivial.

The great part is that because we've had such excellent stability, I've been able to slowly build out a custom platform that abstracts what little complexity there was (mostly around writing manifests). I'm not suggesting Kubernetes is for everyone, but the hate it tends to get on HN still continues to make me scratch my head to this day.
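
For anyone curious what the External Secrets piece mentioned above looks like in practice, here is a minimal sketch; the API version is from memory and the store and key names are hypothetical, so double-check against the operator docs:

    apiVersion: external-secrets.io/v1beta1
    kind: ExternalSecret
    metadata:
      name: app-secrets             # hypothetical
      namespace: app                # hypothetical namespace
    spec:
      refreshInterval: 1h
      secretStoreRef:
        name: aws-secrets-manager   # hypothetical (Cluster)SecretStore configured for AWS
        kind: ClusterSecretStore
      target:
        name: app-secrets           # the plain Kubernetes Secret that gets created and kept in sync
      data:
        - secretKey: DATABASE_URL
          remoteRef:
            key: prod/app/database-url   # hypothetical Secrets Manager entry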

airstrike
0 replies
18h30m

> We spent 7 months building a Kubernetes based platform to replace Heroku for our SaaS product at mynewsdesk.com. The results were a 90% reduction in costs and a 30% improvement in performance.

I don't mean to sound dismissive, but maybe the problem is just that Heroku is/was slow and expensive? Meaning this isn't necessarily the right or quote-unquote "best" approach to reclaiming the stack

Retr0id
0 replies
17h52m

Based on the title alone, I thought this was going to be people up in arms about -fomit-frame-pointer being used by distros

Havoc
0 replies
4h54m

Toying with self-hosted k8s at home has taught me that it's the infra equivalent of happy-path coding.

Works grand until it blows up in your face for non-obvious reasons.

That's definitely mostly a skill issue on my end, but it still would make me very wary of betting a startup on it.

GaryNumanVevo
0 replies
6h27m

Potential irony: this site isn't loading for me.

AbuAssar
0 replies
18h36m

How does this compare to dokku (https://dokku.com/)?