I often go down rabbit holes like this, trying to collapse and simplify the application stack.
But inevitably, as an application grows in complexity, you start to realize _why_ there's a stack, rather than just a single technology to rule them all. Trying to cram everything into Postgres (or lambdas, or S3, or firebase, or whatever other tech you're trying to consolidate on) starts to get really uncomfortable.
That said, sometimes stretching your existing tech is better than adding another layer to the stack. E.g. using postgres as a message queue has worked very well for me, and is much easier to maintain than having a totally separate message queue.
I think the main takeaway here is that postgres is wildly extensible as databases go, which makes it a really fun technology to build on.
I have experience with certain technologies, e.g. SQS and Postgres.
Say I'm on your team, and you're an application developer, and you need a queue. If you're taking the "we're small, this queue is small, just do it in PG for now and see if we ever grow out of that" — that's fine. "Let's use SQS, it's a well-established thing for this and we're already in AWS" — that's fine, I know SQS too. I've seen both of these decisions get made. (And both worked: the PG queue was never grown out of, and generally SQS was easy to work with & reliable.)
But what I've also seen is "Let's introduce bespoke tech that nobody on the team, including the person introducing it, has experience in, for a queue that isn't even the main focus of what we're building" — this I'm less fine with. There needs to be a solid reason why we're doing that, and that we're going to get some real benefit, vs. something that the team does have experience in, like SQS or PG. Instead, this … thing … crashes on the regular, uses its own bespoke terminology, and you find out the documentation is … very empty. This does not make for a happy SRE.
Ok. I get that. But to play devil's advocate: with that mentality we'd never learn a new technology and would still be stuck on punch cards. And I don't have the time anymore for hobby projects. I'd say it's ok to introduce something new as long as it's one thing at a time and not an entire new stack in the "a rewrite will solve all problems" projects.
To me this argument sounds like “I don’t have time for hobby projects, so I’m going to treat this professional one as a hobby”.
I always start a professional project with technologies I am intimately familiar with - have used myself, or have theoretical knowledge of and access to someone with real experience.
There has never been a new shiny library/technology that would have saved more than 10% of the project time, in retrospect. But there have been many that would have cost 100% more.
This isn't a dichotomy.
That is the point of DDD, SoA, Clean, and Hexagonal patterns.
Make a point to put structures and processes in place that encourage persistence ignorance in your business logic as the default and only violate that ideal where you have to.
That way if you outgrow SQL as a message bus you can change.
This mindset also works for adding functionality to legacy systems or breaking apart monoliths.
Choosing a default product to optimize for delivery is fine, claiming that one product fits all needs is not.
Postgres does have limits when being used as a message or event bus, but it can be low risk if you prepare the system to change if/when you hit those limits.
Letting ACID concepts leak into the code is what tends to back organisations into a corner that is hard to get out of.
Obviously that isn't the Kool-Aid this site is selling. This advice in particular is destructive unless you are intentionally building a monolith:
"Simplify: move code into database functions"
At least for any system that needs to grow.
I was not saying "Postgres is all you'll ever need". I was just replying to
As a general principle.
I take your point but you don’t explain how you came to be intimately familiar with those technologies in the first place. Applied consistently, this logic would seem to preclude becoming familiar with anything.
For projects where I have a paying customer, this rule is absolute; I do not experiment on my client's time (and dime) unless they specifically request it.
But I do have projects which I finance myself (with myself as customer), and which do not have a real deadline. I can experiment on those. Call them "hobby" projects if you insist.
Well, project requirements always rank higher, and many projects require some piece I am unfamiliar with (a new DB, e.g. MSSQL; a new programming language; etc.). That means one does get familiar on a need basis, even applying this approach robotically.
If a project requires building the whole thing around a new shiny technology with few users and no successful examples I can intimately learn from ... I usually decline taking it.
I'm okay with new technology, actually, but the person introducing it has to be able to champion it & do the work of debugging issues and answering questions about its interactions with the rest of the system. I.e., they have to be responsible for it.
The last part in my parent comment is more of a "it was chucked over the fence, and it is now crashing, and nobody, not even the devs that chose it, know why".
I do have examples of what you describe, too: a dev I worked with introduced a geospatial DB to solve issues with geospatial queries being hard & slow in our then-database (RDS did not, at the time, support such queries) — so we went with the new thing. It used Redis's protocol, and was thus easy to get working with¹. But the dev that introduced it to the system was capable of explaining it, dealing with issues with it — to the extent of "upstream bugs that we encounter and produce workarounds", and otherwise being a lead for it. That new tech, managed in that way by a senior eng., was successful in what it sought to do.
The problematic parts/components/new introductions of new tech … never seem to have that. That's probably partly the problem: it's such an inherently non-technical issue at its heart. The exact thing almost doesn't matter.
IME it's not. When there are problems, it's never just one new thing at a time.
And the particular system I had in my mind while writing the parent post was, in fact, in the category of "a rewrite will solve all problems".
Some parts of the rewrite are doing alright, but especially compared to the prior system, there are just so. many. new. components. 2 new queue systems, new databases, etc. etc. It's hard enough to learn one, particularly without someone championing its success; it's another thing entirely to self-learn and self-bootstrap on 6 or 8 new services.
¹(Tile38)
This desire can sometimes be so strong that people insist on truly wacky decisions. I have before demonstrated that Postgres performs perfectly well (and in fact exceeds) compared with a niche graph database, and heard some very strange reasons for why this approach should be avoided. A lot of the time you hear that it's engineers who chase shiny technology, but I've seen first hand what can happen when it's leadership.
Can you expand on Postgres vs Graph Databases?
IIRC, the biggest graph DB in the world, TAO from Facebook, is based on an underlying MySQL. There must be a good reason why FB prefers a SQL DB over a dedicated graph DB.
It is easy to represent a graph in Postgres using edge and node tables. For the use case we have, it is more performant to query such a setup for many millions of relationships vs using the big names in graph databases.
You just need a little bit of appropriate index selection and ability to read the output of EXPLAIN ANALYZE to do so.
There are probably use cases where this doesn't hold, but I found in general that it is beneficial to stick to Postgres for this, especially if you want some ability to query using relations.
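As a rough sketch of the edge/node approach described above (table and column names here are illustrative, not from any particular production schema), reachability queries fall out of a recursive CTE, and the index choices are what EXPLAIN ANALYZE helps you validate:

```sql
-- Illustrative graph-in-Postgres schema.
CREATE TABLE nodes (
  id    bigint PRIMARY KEY,
  label text
);

CREATE TABLE edges (
  src bigint NOT NULL REFERENCES nodes(id),
  dst bigint NOT NULL REFERENCES nodes(id),
  PRIMARY KEY (src, dst)       -- also serves forward traversal (src -> dst)
);
CREATE INDEX ON edges (dst);   -- supports reverse traversal (dst -> src)

-- All nodes reachable from node 1, up to 5 hops out.
WITH RECURSIVE reachable(id, depth) AS (
  SELECT dst, 1 FROM edges WHERE src = 1
  UNION                        -- UNION (not UNION ALL) drops duplicate rows
  SELECT e.dst, r.depth + 1
  FROM edges e
  JOIN reachable r ON e.src = r.id
  WHERE r.depth < 5            -- depth cap guards against cycles
)
SELECT DISTINCT id FROM reachable;
```

The depth cap and the `UNION` deduplication are what keep cyclic graphs from recursing forever; whether the planner uses the primary key or the `(dst)` index is exactly the kind of thing you confirm with EXPLAIN ANALYZE.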
Often referred to as resume driven development.
RDD leaves serious wreckage in its wake.
I've been on both sides of this..
RabbitMQ and Elasticsearch for a public-facing site: a dedicated queue for workers to denormalize and push updates to Elastic. Why? Because the $10k/month RDBMS servers couldn't handle the search load and were overly normalized. Definitely a hard sell.
I've also seen literally hundreds of lambda functions connecting to dozens of DynamoDB tables.
I'm firmly in the camp of use an RDBMS (PostgreSQL my first choice) for most things in most apps. A lot of times you can simply apply the lessons from other databases at scale in pg rather than something completely different.
I'm also more than okay leveraging a cloud's own MQ option, it's usually easy enough to swap out as/if needed.
I worked in a team that did this. It was mostly staffed by juniors, and the team leader wasn't very interested in the technical aspects, they just went to a page, checked that the new feature worked alright and gave the green light.
So over the years these juniors have repeatedly chosen different tech for their applications. Now the team maintains like 15-20 different apps and among them there's react, Vue, angular, svelte, jQuery, nextjs and more for frontends alone. Most use Episerver/Optimizely for backend but of course some genius wanted to try Sanity so now that's in the mix as well.
And it all reads like juniors built it. One app has an integration with a public api, they built a fairly large integration app with an integration db. This app is like 20k lines of code, much of which is dead code, and it gets data from the public api twice a day whereas the actual app using the data updates once a day and saves the result in its own Episerver db. So the entire thing results in more api traffic rather than less, the app itself could have just queried the api directly.
But they don't want me to do that, they just want me to fix the redundant integration thing when it breaks instead. Glad I'm not on that team any more.
I think SQS is cheap enough to build on as a messaging queue even if you're not hosting within AWS.
Among the widely underrated AWS services are SNS and SES, and they are not a bad choice even if you're not using AWS for compute and storage.
SQS is at-least-once; PG can give you exactly-once.
Not sure why this is making people upset.
Because it's incorrect. If you have any non-postgres side-effect, you can't have exactly-once (unless you do 2PC or something like that). There isn't any technology that gives you "exactly once" in the general case.
That's not how exactly-once is defined for a queue. We are talking about the semantics the queue system provides.
Nobody will understand it like that.
Anyone who has ever selected a queue service/product will understand it like that, because that's one of the most prominent features those products highlight:
SQS Standard queues support at-least-once message delivery.
NATS offers "at-most-once" delivery
etc.
SQS FIFO has exactly-once processing
Well, that's a stretch: it has a "5 minute window". You can hold a lock on a row in a PG queue for as long as you need.
pgmq (which is linked in this gist) provides an API for this functionality. It can be 0 seconds, or 10 years if you want. It's not a row lock, which can be expensive; in pgmq it's built into the design of the visibility timeout. FOR UPDATE SKIP LOCKED is there to ensure that only a single consumer gets any message, and then the visibility timeout lets the consumer determine how long the message should remain unavailable to other consumers.
You get exactly once when you consume with pgmq and run your queue operations inside transactions in your postgres database. I can't think of an easy way to get some equivalent on SQS without building something like an outbox.
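The core of the pattern being described boils down to something like this sketch (simplified and illustrative; pgmq's actual schema and API differ, and it uses a visibility timeout rather than plain delete-on-read):

```sql
-- Hypothetical minimal jobs table.
CREATE TABLE jobs (
  id         bigserial PRIMARY KEY,
  payload    jsonb NOT NULL,
  visible_at timestamptz NOT NULL DEFAULT now()
);

-- Consume one message: claim it, do the work, remove it, all in one transaction.
BEGIN;

WITH next_job AS (
  SELECT id
  FROM jobs
  WHERE visible_at <= now()
  ORDER BY id
  LIMIT 1
  FOR UPDATE SKIP LOCKED       -- concurrent consumers skip rows we hold
)
DELETE FROM jobs
WHERE id IN (SELECT id FROM next_job)
RETURNING payload;

-- ...any other database writes for this message happen here,
-- inside the same transaction as the dequeue...

COMMIT;  -- dequeue and side effects commit (or roll back) together
```

If the consumer crashes before COMMIT, the lock is released, the row is still there, and another consumer picks it up: that atomicity between "message consumed" and "work recorded" is the exactly-once property the parent comment is talking about, and it only holds while the side effects live in the same database.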
SQS FIFO has exactly-once processing
That's not what the parent post was referring to. If SQS (or your ability to talk to it) is down and your database isn't, what do you do?
The problem is rarely cost, it's operational overhead.
Using SQS for a queue rather than my already-existing Postgres means that I have to:
- Write a whole bunch of IaC, figuring out the correct access policies
- Set up monitoring: figure out how to monitor, write some more IaC
- Worry about access control: I just increased the attack surface of my application
- Wire it up in my application so that I can connect to SQS
- Understand how SQS works, how to use its API
It's often worth it, but adding an additional moving piece into your infra is always a lot of added cognitive load.
+1. And then you have to figure everything out one more time when you decide to move to (or to add support for) Azure/GCP.
My saying has always been: be nice to the DB
Don't use it any more than you have to for your application. Other than network IO, it's the slowest part of your stack.
Would you say it's slower than file IO too?
It’s not slow by itself. It’s a single point of bottleneck that will inevitably become slow as you cram everything into it.
...but by trying to avoid the bottleneck and moving things to the backend, you make things 10x worse resource-wise for the DB. So it is not an easy tradeoff.
Take any computation you can do in SQL like "select sum(..) ...". Should you do that in the database, or move each item over the network and sum them in the backend?
Summing in the database uses a lot less resources FOR THE DB than the additional load the DB would get from "offloading" this to backend.
More complex operations would typically also use 10x-100x less resources if you operate on sets and amortize the B-tree lookups over 1000 items.
The answer is "it depends" and "understand what you are doing"; nothing about it is "inevitable".
Trying to avoid computing in the DB is a nice way of thinking you maxed out the DB ...on 10% of what it should be capable of.
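The sum example above looks roughly like this (table and column names are made up for illustration):

```sql
-- In the database: one round trip, one row over the wire.
-- The aggregation happens next to the data.
SELECT sum(amount)
FROM orders
WHERE created_at >= now() - interval '1 day';

-- "Offloaded" to the backend: every matching row crosses the network,
-- gets parsed and materialized application-side, only to be reduced
-- to the same single number.
SELECT amount
FROM orders
WHERE created_at >= now() - interval '1 day';
```

Both queries make the DB scan the same rows, but the second also makes it serialize and ship all of them, which is the extra load the parent comment is pointing at.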
Yes. Aggregations and search are often best done as close to the data as possible, in the DB.
Rendering html, caching, parsing api responses, sending emails, background jobs: Nope.
Basically, use the database for what it’s good at, no more.
Kind of irrelevant since a DB provides some guarantees that a simple file does not by default.
GP was responding to a comment comparing it to network IO in terms of bottlenecks in your application stack ...?
Well, it is file IO, plus processing on top. But it's not that simple, since if your data is small enough it can all be loaded into memory, allowing you to sidestep any file IO. But you still have the processing part...
Handling business logic in the database is often going to be an order of magnitude faster than the application layer of some of the popular language stacks (looking at you, Rails, Node, etc). It will also outlive whatever web stack of the day (and the acquisition, which often requires a rewrite of the application layer but keeps the general database structure; been there, done that).
Maybe faster... but I've met very few developers who are good DBAs (who understand procedures, cursors, permissions, etc.). Database schema versioning / consistency is a whole other level of pain too.
That sounds like a social problem, not a technical problem.
PG works really well as a message queue, and there are several excellent implementations on top of it.
Most systems are still going to need Redis involved just as a coordinator for other pub/sub related work unless you're using a stack that can handle it some other way (looking at BEAM here).
But there are always going to be scenarios as an application grows where you'll find a need to scale specific pieces. Otherwise though, PostgreSQL by itself can get you very, very far.
Worth noting that Postgres has a pubsub implementation built in: listen/notify.
https://www.postgresql.org/docs/current/sql-notify.html
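The basic shape of it, with a made-up channel name and payload:

```sql
-- Session A: subscribe to a channel.
LISTEN order_events;

-- Session B: publish. The payload is a plain string
-- (capped at roughly 8000 bytes).
NOTIFY order_events, '{"order_id": 42, "status": "shipped"}';

-- Equivalent function form, usable from triggers and
-- wherever the channel name needs to be computed:
SELECT pg_notify('order_events', '{"order_id": 42, "status": "shipped"}');
```

Notifications fire on transaction commit, so a rolled-back transaction never notifies listeners, which is a nice property compared to publishing to an external broker mid-transaction.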
Yep, I should add that. One of the libraries in my list (that I maintain) is WalEx: https://github.com/cpursley/walex/issues
It subscribes to the Postgres WAL and lets you do the same sort of thing you can do with listen/notify, but without drawbacks like the need for triggers or character limits.
What's the drawback to a trigger? I would think that any overhead you recouped by avoiding a trigger would be offset by the overhead of sending the entire WAL to your listener, rather than the minimized subset of events that listener is interested in.
(To be clear I do see other downsides to listen/notify and I think WalEx makes a lot of sense, I just don't understand this particular example.)
You don’t send the entire WAL, just what you subscribe to - and you can even filter via SQL: https://github.com/cpursley/walex?tab=readme-ov-file#publica...
This post describes some of the other issues with listen/notify trigger approach: https://news.ycombinator.com/item?id=36323698
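As an illustration of the filtering point (the table and predicate here are invented; see the WalEx README for its own configuration options), Postgres 15+ lets a logical-replication publication filter rows with a plain SQL predicate, so subscribers only ever see the subset they asked for:

```sql
-- Only rows matching the WHERE clause are published to
-- subscribers of this publication.
CREATE PUBLICATION order_events
  FOR TABLE orders WHERE (status = 'shipped');
```

That filtering happens inside Postgres before anything hits the wire, which is why "subscribing to the WAL" doesn't have to mean "receiving the entire WAL".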
Going to add this to my research list.
Ping me if you have any questions. Long time fan of your blog.
That is really cool to hear, thank you.
And I should have mentioned it before, but we have an open call for speakers for the Carolina Code Conference. This would make for an interesting talk I think.
Oh yea, definitely aware of it. I believe many of the queuing solutions utilize it as well.
I've read a lot of reports (on here) that it comes with several unexpected footguns if you really lean on it though.
It's also worth noting that by using PG as a message queue, you can do something that's nearly impossible with other queues - transactionally enqueue tasks with your database operations. This can dramatically simplify failure logic.
On the other hand, it also means replacing your message queue with something more scalable is no longer a simple drop-in solution. But that's work you might never have to do.
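A minimal sketch of the transactional-enqueue pattern described above (table and column names are illustrative):

```sql
-- The business write and the job enqueue commit or roll back together,
-- so you never get an order without its job, or a job for an order
-- that was never created.
BEGIN;

INSERT INTO orders (customer_id, total)
VALUES (42, 99.50);

INSERT INTO jobs (payload)
VALUES ('{"task": "send_receipt", "customer_id": 42}'::jsonb);

COMMIT;
```

With an external queue, achieving the same guarantee means building an outbox table plus a relay process, which is exactly the failure logic this pattern lets you skip.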
Some applications never grow that much
But it surely will! It will!
See, now that we're profitable, we're gonna become a _scale up_, go international, hire 20 developers, turn everything into microservices, rewrite the UI our customers love, _not_ hire more customer service, get more investors, get pressured by investors, hire extra c-levels, lay-off 25 developers and the remaining customer service, write a wonderful journey post.
The future is so bright!
Honestly, that's one of the reasons I never want to monetize my work and stay miles away from the software industry. Modern world is all web apps that require you to subscribe to 20 different 3rd party services to even build your app. So you rack up bills before your product is even remotely lucrative...
Building an app with no third party dependencies seems impossible nowadays. At least if you plan to compete.
I mean, you _can_ host your staging environment on a Minisforum PC hidden in your closet and then deploy to Hetzner, and probably save a _ton_ unless your service benefits from things like, say, autoscaling or global low-latency access.
Niches where you can get away with that are limited, not just by technical challenges but because large parts of the social ecosystem of IT won't like that. But they do exist. There's also still things that aren't webapps _at all_, there's software that has to run without internet access. It's all far apart and often requires specialized knowledge, but it exists.
Yea I mostly learned web dev so far but wanted to get into IoT stuff so I might find something cool to do in there.
I would go further and even say 'most' applications never grow that much.
The same argument of UNIX design patterns (single responsibility, well-defined interfaces and communication protocols) vs monolithic design patterns comes up a lot. I think that it's mainly because both are effective at producing products; it's just that they both have downsides.
I read a meme yesterday about how you can just interject "it's all about finding that balance" into any meeting and people will just agree with you. I'm gonna say it here.
Sometimes a flexible tool fits the bill well. Sometimes a specialized tool does. It's all about finding that balance.
Thank you for coming to my TED talk.
Just noting that sometimes one can do both: separate Postgres DBs/clusters for different use cases, separate instances of a web server for TLS termination, caching, routing/rewriting, and (edit) static asset serving. The benefit is an orderly architecture with fewer different dependencies.
I think a lot of the industry struggles with the idea that maybe there is no "one size fits all", and what makes sense when you're a one-person company with 100 customers probably doesn't make sense when you're a 1000-person company with millions of customers.
If you use a stack meant for a huge userbase (with all the tradeoffs that comes with it) but you're still trying to find market fit, you're in for a disappointment
Similarly, if you use a stack meant for smaller projects while having thousands of users relying on you, you're also in for a disappointment.
It's OK to make a choice in the beginning based on the current context and environment, and then change when it no longer makes sense. Doesn't even have to be "technical debt", just "the right choice at that moment".
Yep. And Postgres is a really good choice to start with. Plenty of people won't outgrow it. Those who do find it's not meeting some need will, by the time they need to replace it, have a really good understanding of what that replacement looks like in detail, rather than just some hand-wavy "web scale".
True enough, and with modern hardware that barrier is relatively high. IIRC Stack Overflow was handling several million users on a single database server over a decade ago... We've got over 8x the compute power and memory now.
Still need to understand the data model and effects on queues though.
The more I do fullstack work, the more I see an obesity crisis. I understand the need to modularize (I dearly think I do), but god: you have the relational model, reimplemented in your framework, re-encoded in a middleware to handle URL parsing, then one more layer to help integrate things client-side. I find that insane. And PostgREST was a refreshing idea.
Seriously. There's like 7000 duplicates of the very same data layer in a single stack: database, back-end ORM/data mapper, front end, and various caching things in between. Things like PostgREST and Hasura are great paired with fluent clients.
And then there's the failed microservice case... what some people describe as a distributed monolith, where data has to be passed around through every layer, with domain logic replicated here and there.
That's a nicely balanced view. I've been working on the intersection between dev, sec and ops for many, many years and one of the most important lessons has been that every dependency is a liability. That liability is either complexity, availability, security, wasting resources or projects or key people disappearing. Do anything to avoid adding more service, library or technology dependencies; if necessary, let people have their side projects and technological playgrounds to distil future stacks out of.
There are good reasons to go OLAP or graph for certain kinds of problems, but think carefully before adding more services and technologies because stuff has a tendency to go in easily but nothing ever leaves a project and you will inevitably end up with a bloated juggernaut that nobody can tame. And it's usually those people pushing the hardest for new technologies that are jumping into new projects when shit starts hitting the fan.
If a company survives long enough (or cough government), a substantial and ever increasing amount of time, money and sec/ops effort will go into those dependencies and complexity cruft.
This is very much the way I'm pushing in our internal development platform: I want to offer as few middlewares as possible, but as many as necessary. And ideally these systems are boring, established tech covering a lot of use cases.
From there, Postgres ended up being our relational storage for the platform. It is a wonderful combination of supporting teams by being somewhat strict (in a flexible way) as well as supporting a large variety of use cases. And after some grumbling (because some teams had to migrate off of SQL Server, or off of MariaDB, and data migrations were a bit spicy), agreement is growing that it was a good decision to commit to a DB like this.
We as the DB-Operators are accumulating a lot of experience running this lady and supporting the more demanding teams. And a lot of other teams can benefit from this, because many of the smaller applications either don't cause enough load on the Postgres Clusters to be even noticeable or we and the trailblazer teams have seen many of their problems already and can offer internally proven and understood solutions.
And like this, we offer relational storage, file storage, object storage and queues, and that seems to be enough for a lot of applications. Only now, after a few years, are we adding in OpenSearch as a service for search, vector storage and similar use cases.
Architecturally, there are other cases besides message queues where there's no reason for introducing another layer in the stack, once you have a database, other than just because SQL isn't anybody's favorite programming language. And that's the real reason there's a stack.
On top of that, a lot of discourse seems to happen with an assumption that you only make the tech/stack choice once.
For the majority of apps, just doing basic CRUD with a handful of data types, is it that hard to just move to another DB? Especially if you're in framework land with an ORM that abstracts some of the differences, since your app code will largely stay the same.