
The Great Migration from MongoDB to PostgreSQL

renegade-otter
79 replies
1d10h

The irony of it all is that MongoDB actually matured and you do not have to jump through hoops to have, say, transactions.

I am ALL for Postgres, I even wrote a post about the importance of being fluent in relational databases: https://renegadeotter.com/2023/11/12/your-database-skills-ar...

What grinds my gears is the whiplash caused by this frantic stampede from one hype to the next.

I am just waiting for the AI hype to blow over and for everyone to rediscover Hadoop, or something.

sph
51 replies
1d9h

Postgres is no hype; it was already in the plateau of productivity when MongoDB came onto the web-scale scene.

Hadoop, just like Mongo, had its hyped time in the sun, but RDBMSs are far more advanced and versatile than either of them ever was.

renegade-otter
49 replies
1d9h

I am not calling Postgres hype - it should never have been NOT hype. It's a reasonable default for most problems.

Now all I read about is how Postgres is awesome, as if it's this great new thing. I guess that makes sense, as the new generation of engineers is rediscovering stable, reliable, lean technologies after a decade of excesses with "exotic" tech.

For grey beards, it's all very odd. Like, "where have you all been?"

arp242
22 replies
1d8h

I remember Robert C. Martin ("Uncle Bob") going on about how NoSQL will replace literally all SQL and that there is literally not a single use case for relational data and SQL. I wonder if he ever walked that back.

Now, my opinion of Martin in general is not especially high and he's a bit of a controversial figure, but it wasn't just the kids. And Martin is also all about reliable software, so that makes it even more surprising.

rjbwork
12 replies
1d7h

He's always been fundamentally a salesman. Hopping on the hype trains lets him sell more.

pydry
11 replies
1d7h

He's always been fundamentally a zealot who sees most narratives in black and white terms. It's an extremely unfortunate characteristic of a software engineer.

It's the same reason he's so pro Trump I think.

arp242
9 replies
1d2h

I didn't even know about that; I checked his Twitter, and first thing is a reply to the Baltimore bridge collapse:

Doors and wheels fall off airplanes. Hundreds of illegal immigrants assault and overrun border guards in Texas. We can’t even muster support for our allies in the security council. But at least our pronouns are in order.

https://twitter.com/unclebobmartin/status/177262869295654553...

You really need to be a special sort of individual to go from a collapsed bridge to ranting about pronouns with no prompting or any prior mention of that. What a complete spanner.

zo1
7 replies
1d1h

From the other side, people like me see your odd dismissive behaviour in the same light. This bridge and hundreds of other things going wrong throughout the world atm are all symptoms of some sort of rotten problem, and I honestly don't know why you guys are so adamant about dismissing it. Almost as if you want to see it all come crashing down.

Pure evil, as Trump would call it.

pydry
1 replies
7h45m

From the other side, people like me see your odd dismissive behaviour in the same light. This bridge and hundreds of other things going wrong throughout the world atm are all symptoms of some sort of rotten problem

Just one problem, huh?

This illustrates what I meant about black and white thinking.

zo1
0 replies
6h54m

Not sure what angle you're going with. If you want to say that there is more than one rotten problem at the core of society, go for it. I'd be the first to agree with you.

This is not black and white reasoning. Each problem can sit on a continuum, from "slightly bothersome" to "there is an absolutely horrible evil lurking here".

jimbokun
1 replies
1d

Is the rate of these incidents increasing, decreasing, or about the same?

zo1
0 replies
23h7m

Obviously you can say I'm biased but I think they're increasing.

rjbwork
0 replies
1d

of some sort of rotten problem

Agree. Out of control neo-feudal lords taking up all the slack out of every system they can get their tendrils into in the name of efficiency, thus endangering us all in order to extract as much profit as possible out of our entire economic system.

The "other side"'s solution tends to be to further deregulate industry and the economy, and to empower and lower the tax burden of these neo-feudal lords, all while pointing at gays, trans-folk, and immigrants as if they are the reason people can't afford rent and bridges are falling down.

renegade-otter
0 replies
23h55m

Another man opens the morning newspaper, and after scanning the headline, declares: "Honey, this planet is going to hell".

arp242
0 replies
1d1h

Yes, a single accident of the type that's been happening since bridges were invented, pronouns, and unconditional support for an out-of-control government in Israel are all deeply related...

Or he's a hyper-politicized idiot completely consumed and obsessed by some topics and needs to talk about them all the fucking time at the drop of a hat, whether appropriate or not.

It has nothing to do with "sides"; it has to do with the forceful injection of completely unrelated matters.

forgotmyinfo
0 replies
1d2h

That happened to esr, too, although he derailed right after 9/11, and it's been downhill ever since.

wesselbindt
0 replies
1d6h

The reason he says he voted for Trump is taxes. But that seems like a pretty flimsy reason. Have dems in the past couple decades ever reverted any republican-instated tax cuts for the wealthy/corporations, or even made any moves for tax bumps? In my experience dems are just republicans, except they wave a rainbow flag every now and then.

Shorel
2 replies
1d2h

His words: "I am also self-taught. I never got a degree of any kind. I entered the industry at the tender age of 17 and never looked back. Frankly, school isn’t all it’s cracked up to be — especially in the software field."

https://twitter.com/unclebobmartin/status/107212575854868889...

Which means his opinions are more often than not uninformed opinions and should be taken with a grain of salt.

sph
0 replies
11h15m

I think you overestimate college education a little too much. Having a degree doesn't make your opinions automatically better than those that didn't have the chance to get one. You wish it did, but it don't.

fijiaarone
0 replies
17h10m

When you turn 23 in a couple years, you’ll look back at college and realize you weren’t at the apex of all knowledge back then.

GiorgioG
2 replies
1d5h

Unpopular opinion: Robert C Martin has caused more harm than good. People take his ideas and blindly follow them no matter the cost. I’m looking at you, clean architecture.

forgotmyinfo
0 replies
1d2h

Well yeah, solving problems and thinking critically and learning are hard and boring. "Uncle Bob" gives you handy aphorisms you can spam on your next pull request and then feel like you've done something.

arcbyte
0 replies
1d2h

There's definitely a missing piece he hasn't explained well enough. I've seen the best code in the world because it was clean architecture and I've also seen some truly atrocious code in the style of "clean architecture".

To me the difference was in recognizing that the architecture advice is implicit - it's not actually written down in the code, rather the code exemplifies it - and the business domain is explicitly modeled. When developers name everything "use case" and "interactor" and go crazy with packages and folder structures full of architectural names, it's a recipe for terribleness.

belter
1 replies
1d5h

I also remember Michael Stonebraker basically saying as much, but being ignored... Along the lines of "I will see you all map reduce and nosql people in 20 years, after you have all reinvented SQL...."

Not an actual quote but a factual statement

jimbokun
0 replies
1d

Because he realized you can parse SQL and generate the equivalent of a map reduce job on the backend to execute it.

forgotmyinfo
0 replies
1d2h

That's as ridiculous as saying "integer division will be useless in 20 years". Sometimes relational databases really are the optimal solution to certain problems. I really wish CS had better authority figures, although we shouldn't need them in the first place.

tomnipotent
8 replies
1d7h

Postgres as we know it has only existed for about a decade, since the post-9.x era in 2010-2014 when many of its lauded features were added. Replication, heap-only tuples, extensions, foreign data wrappers, JSON, leveraging multiple indexes, parallel index scans, reindex concurrently, JIT compilation, declarative partitioning, stored procedures, and major WAL improvements are all "recent".

I love Postgres and it's been my go-to default since 2014 and you can pry it from my cold dead hands, but it's more contemporary with Perl 6 than Perl 5 if we're talking grey beards.

Sesse__
7 replies
1d7h

Extensions and stored procedures are not; they've been around for longer.

I do not actually use any of the others you listed (well, occasionally bitmap index scans, but it's not a killer). The main difference between 8.x and 15.x for me is the small stream of optimizer features and tweaks (e.g. at some point, it started being able to reorder outer joins).

tomnipotent
6 replies
1d2h

Postgres 8 supported functions capable of returning tuples (RETURN QUERY), but lacked transaction control and other features that CREATE PROCEDURE added in Postgres 11 (2018).

Extensions in Postgres 8 (and earlier) were also half-baked compared to 9.1 (2011), which added CREATE EXTENSION and other quality-of-life improvements including standardization.

The latest complete implementations are what most people have in mind when talking about them.

Sesse__
5 replies
19h51m

Sure, there are improvements, but it's not like Postgres was a radically different beast back then.

Would I pick Postgres 15 over Postgres 8? Yes, any day of the week. But I would also readily pick Postgres 8 over most of today's databases (including e.g. the latest MySQL and MongoDB versions), for most projects, if that was the choice I had.

tomnipotent
4 replies
11h51m

I would not have risked my business (or budget) on Postgres 8, even with Slony-I. My appreciation for its well-written code and thoughtful architecture simply did not outweigh what it lacked for a reliable operations experience.

MySQL foot guns were mostly under control by 2010, and it was operationally efficient to manage a cluster of MySQL servers in diverse configurations with mostly-known boundaries to what could go wrong.

The Postgres foundations are the same today, but what you can build on top is very different today than it was pre-9.x.

Sesse__
2 replies
10h39m

I would not have risked my business (or budget) on Postgres 8, even with Slony-I.

This assumes you need replication, though. You can get pretty far without it, especially given how wide hardware you can get now.

MySQL foot guns were mostly under control by 2010

Eh. :-) I think you would have to specify which footguns in particular.

turtles3
0 replies
9h44m

Replication can be for HA, not just for scale. All depends on your business requirements.

Also replication can be good for other operational reasons, such as zero downtime major version upgrades. Again depends on the business need/expectations.

tomnipotent
0 replies
9h30m

If you're doing tens of millions in revenue and the RDBMS is critical to your operations, I'm a firm believer you need replication. Keep in mind this was the era of spinning rust and a day-old backup could take hours to restore even with a SAN/DAS. A replica was for more than just distributing load, it was for resilience to keep the business healthy and avoid a SPOF.

Risks to the business are more important to me than my enthusiasm for a technology or product.

sph
0 replies
11h10m

I operated MySQL clusters in the late 2000s. I am still surprised how many more hoops you have to jump through to set up replication in PostgreSQL compared to what you could do - out of the box - with MySQL 15 years ago.

benterix
7 replies
1d4h

For grey beards, it's all very odd. Like, "where have you all been?"

To be frank, PostgreSQL has evolved a lot since the late nineties. There was a time when people preferred MySQL over it, as it seemed to work faster, certain things were easier, and so on.

hibikir
3 replies
1d3h

I used postgres in the 90s. The biggest complaint was that postgres's default config needed a change before you could connect to it remotely - the default, wisely at the time, treated allowing connections from any IP before you had changed any security setting whatsoever as a real problem.

The MySQL performance advantages were large, with the small caveat that it was fast as long as you were using MyISAM, which achieved its speed by throwing away many safety features, like working transactions and foreign key constraints. Unsurprisingly, when eventually it got an engine that didn't throw caution to the wind, the performance advantage went away.

I spent a lot of time in the early 2000s showing people that many of the issues they were facing with their e-commerce websites were ultimately down to the MySQL engine they had picked. But it's faster! Yes it is, but now your application code has to deal with all the things the engine has offloaded to it.

It's not that Postgres's changes were key to increasing its market share, but that as MySQL matured, it had no option but to lose its 'advantages'.

kstrauser
2 replies
1d3h

That mirrors my experience. MySQL was faster for use cases where data loss was acceptable. Will anyone die if you lose some web log analysis stats? Probably not. You wouldn’t want to use it for anything where guaranteed consistency was critical, like anything involving money, but it was an alright replacement for things like an in-memory data store.

I say “was”. I hear it’s vastly better now.

gkbrk
1 replies
1d3h

If you want a fast DB for web log analysis, you shouldn't look at MySQL or PostgreSQL anyway. ClickHouse will run circles around both of them for the use case of web log analysis.

kstrauser
0 replies
1d3h

That was an example of the sort of thing that someone might reasonably have used MySQL+MyISAM for in the past. I wouldn’t use that combination at all today.

jimbokun
0 replies
1d

MySQL was more popular for a while, but from a technical standpoint Postgres was always ahead.

fijiaarone
0 replies
17h1m

PostgreSQL didn’t exist in the 1990s. It was called Ingres. Postgres started as a personal project and was first released in 1999, but was unusable. Around 2004 the project started getting popular and hipsters started screaming that MySQL didn’t have stored procedures and wasn’t fully ACID. ACID became a buzzword for PG fanboys, even though data corruption and memory leaks plagued the system for the first decade. It became stable around 2008, as long as you didn’t mind restarting your database every few days. PostgreSQL didn’t really become a viable option until around 2010.

Shorel
0 replies
1d2h

There was a time PostgreSQL did not run under Windows, and that is IMO what gave MySQL the market share edge.

Without these windowless years, PostgreSQL would be the most used RDBMS right now.

sph
5 replies
1d8h

Excited, vocal users make the most noise.

Somehow the entire MongoDB era passed me by and I never used it once. I used MySQL in the 2000s, and switched to PostgreSQL in the 2010s.

benterix
2 replies
1d4h

You haven't missed much TBH. I think the only case where you'd use it in new projects is DynamoDB, if you work with AWS.

finikytou
1 replies
1d4h

Dynamo is not mongo?

benterix
0 replies
1d2h

No, DocumentDB is Mongo. What I understand the parent referred to is the NoSQL hype era, not just Mongo, which was probably the most popular one at that time.

jimbokun
0 replies
1d

My company considered it, but went with Cassandra instead.

LtWorf
0 replies
1d3h

They used it at my 1st job, a startup.

They had an ORM which made everything much slower, and the most common operation they did was to add items into lists, which causes the whole document to be resized and requires it to be moved.

It could have been done with 1 sql server, but we needed 4 really heavy mongodb servers instead. And of course our code was buggy so the schema was all over the place.

rendall
1 replies
1d8h

...and, continuing the cycle next season, devs will (re)discover NoSQL and document DBMS. I do wonder if this phenomenon of trendy tech is somehow integral to advancing the state of the industry, the SWE equivalent of early adopters. "Rediscovering the old" is a side effect, as old tech is used and reinterpreted in new ways. Kind of neat, really.

Retric
0 replies
1d8h

There’s an entire industry of people selling knowledge of the “next great thing/the one true way,” but they need a new thing every few years, once people have learned whatever the old thing was.

forgotmyinfo
0 replies
1d2h

This is what happens when you hire 20-somethings who don't know how computers actually work to build your entire business. You either learn about relational databases in school, or through experience. Relational algebra has been around for half a century, at least. If someone doesn't know about its utility, it's a failure of the education system, or a failure of corporate incentives.

nemo44x
0 replies
1d3h

MongoDB is more popular than it ever has been. You can look at their quarterly filings and learn this. The NoSQL space is larger than it has ever been and is the fastest-growing segment in the database world.

thiht
21 replies
1d7h

Postgres is not hype though. There are very few legitimate use cases for not using an RDBMS as a main data store, and Postgres happens to be the most popular one nowadays, for good reasons.

bradleyjg
8 replies
1d5h

The number one reason being that it’s free. In a purely technical comparison it is not at the top of the list.

simonw
7 replies
1d5h

What's top of the list technically?

bradleyjg
6 replies
1d4h

Oracle

pmarreck
3 replies
1d4h

Oracle is a giant mess, is the sense I got from trying to work with it in the past.

While I am loath to recommend any Microsoft product, SQL Server is at least coherent, and has actual transactions like Oracle (and not just "checkpoints").

paulryanrogers
2 replies
1d3h

Aren't MSSQL and Oracle just doing redo logs, like MySQL?

What do you consider not actual transactions?

PostgreSQL keeps dead tuples in place until vacuumed, which seems optimized for rollbacks. But it isn't so bad if you're inserting much more than deleting or updating.

pmarreck
1 replies
5h30m

Nestable transactions = “real transactions” to me, I guess.

I almost don’t see the point in non-nestable transactions

paulryanrogers
0 replies
4h41m

SQL standardizes SAVEPOINT for such nesting. Is that not sufficient? FWIW Postgres supports it.
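For context, SAVEPOINT gives you rollback points inside an open transaction, which is what most people mean by nesting. A minimal sketch using Python's sqlite3 purely for illustration - SQLite accepts the same standard SAVEPOINT syntax as Postgres:

```python
import sqlite3

# In-memory DB; isolation_level=None so we issue the transaction SQL ourselves
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()
cur.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")

cur.execute("BEGIN")
cur.execute("INSERT INTO accounts VALUES ('alice', 100)")
cur.execute("SAVEPOINT inner_work")              # the "nested" unit starts here
cur.execute("INSERT INTO accounts VALUES ('bob', 50)")
cur.execute("ROLLBACK TO SAVEPOINT inner_work")  # undo only the inner unit
cur.execute("COMMIT")                            # outer transaction survives

print(cur.execute("SELECT name FROM accounts").fetchall())  # [('alice',)]
```

Only the work after the savepoint is discarded; the outer transaction commits normally.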

VWWHFSfQ
1 replies
1d4h

Anybody that has ever used Oracle knows how nice it is. But very few people have that experience because their company isn't spending millions of dollars on the database server.

bradleyjg
0 replies
1d3h

Yes, that was my point.

I get that working with Oracle and their army of lawyers is not cheap or pleasant but the db is still excellent.

darby_eight
4 replies
1d2h

There's very few legitimate use cases for not using a RDBMS as a main data store

I don't think that's true—all the old cases that were legitimate before are still legitimate. The price of both persistent storage and memory has simply come way down to the point where many computational workloads are viable under vertically scalable databases again. I suspect the pendulum will swing the other way one day yet again (and of course there is a rich ecosystem of ways to query horizontally, if not transactionally, with SQL that will likely temper this swing even further).

wlll
2 replies
1d2h

The price of persistent storage and memory was the same for Mongo as it was for Postgres back when the NoSQL movement happened. Mongo wasn't made of magic scaling fairies and still needed resources. As soon as you cranked up the safety of Mongo to a nearly acceptable level, its performance fell through the floor. The only reason people thought it performed amazingly was Mongo's rather deceptive marketing.

I use the term "nearly acceptable" because for a long time Mongo's data consistency was absolutely crap (https://jepsen.io/analyses)

Personally I think people used Mongo because it was new, shiny and exciting and it promised performance (though it didn't really deliver for most people).

darby_eight
1 replies
23h53m

I am certainly not going to defend mongo ever having been a good choice; I was referring to other "NoSQL" databases (namely, the large number of highly scalable key-value stores).

thiht
0 replies
8h55m

But you wouldn’t use a key value data store as your main store. It’s a specialized kind of store. Of course Redis and friends have their place. Just not as a main store.

bradleyjg
0 replies
1d2h

In addition to changes in ephemeral and persistent memory, the other big difference between now and nosql’s heyday is improvements in distributed relational transactional databases (newsql).

We haven’t exactly squared the circle on CAP, but we’ve certainly bent it.

circusfly
3 replies
1d4h

I've nothing against Postgres but MySQL is a wonderful option that is likely still in far higher and heavier use, when considering open source databases.

wlll
0 replies
1d2h

I wrote this in another comment, but it's relevant here I think:

"I do scaling and performance work, mostly with Rails apps, but a significant amount of the work is database level and not language specific. I've used both postgres and MySQL (and a few other databases) going back to 2000.

The best thing I can hear from a company when I start is "We use Postgres". If they're using postgres then I know there's likely a far smoother path to performance than with MySQL. It has better tooling, better features, better metadata."

Right now I would not choose MySQL over Postgres at all, ever. I can't think of a single way it is materially better.

kstrauser
0 replies
1d3h

In my mind, MySQL no longer exists. It’s a weird Oracle thing that I wouldn’t touch with a lineman’s pole. MariaDB is what happened after MySQL disappeared.

LtWorf
0 replies
1d3h

I think it gained popularity over a decade ago, when it was faster because it was unsafe.

In the end postgres is the better one. With mysql the defaults are bad.

oldsecondhand
1 replies
1d5h

In my experience MySQL is more popular. I've worked at several companies that used MySQL, none that used Postgres.

Kids at universities still use XAMPP to learn databases.

lutoma
0 replies
1d5h

In my experience MySQL is still commonly used in the PHP world, but everything else is mostly Postgres

jimbokun
0 replies
1d

In my opinion the use case is when your data no longer fits on a single, potentially very large, server. Automatic distribution and replication and rebalancing of data is a tricky problem and something like Cassandra handles those very well.

Or Cockroach DB or similar, if you still need relational capabilities.

debarshri
2 replies
1d8h

Generally with these migrations, the biggest problem in my experience is that the operational learnings are not transferable.

You often want to build expertise and achieve operational excellence with the tech that you start out with.

renegade-otter
1 replies
1d7h

Back in the day we used to say "you choose your database once". You can swap in/out a lot of things, but your data store is really the core.

It is shockingly hard to change your storage engine once you are serving customers, so you really want to get that right.

cpursley
0 replies
1d7h

Exactly, languages and frameworks come and go - but get the data layer right and you’re set.

bdzr
1 replies
1d6h

The irony of it all is that MongoDB actually matured and you do not have to jump through hoops to have, say, transactions.

Are transactions still limited to a single document in Mongo?

throwitaway1123
0 replies
1d1h

Mongo has had multi-document transactions since 2018 (version 4.0), and joins (the $lookup aggregation pipeline stage) since 2015 (version 3.2) [1].

[1] https://www.mongodb.com/evolved#mdbfourzero
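For readers who haven't used it, $lookup is essentially a left outer join expressed as an aggregation stage. A plain-Python sketch of its semantics, with hypothetical data and field names (no MongoDB required):

```python
# Hypothetical collections; this emulates the shape of Mongo's $lookup stage:
# a left outer join that attaches matching foreign documents as an array field.
orders = [{"_id": 1, "cust_id": "a"}, {"_id": 2, "cust_id": "b"}]
customers = [{"_id": "a", "name": "Ada"}]

def lookup(local, foreign, local_field, foreign_field, as_field):
    """Rough in-memory equivalent of {"$lookup": {"from": ..., "localField": ...,
    "foreignField": ..., "as": ...}} over lists of dicts."""
    out = []
    for doc in local:
        matches = [f for f in foreign if f.get(foreign_field) == doc.get(local_field)]
        out.append({**doc, as_field: matches})
    return out

joined = lookup(orders, customers, "cust_id", "_id", "customer")
print(joined[0]["customer"][0]["name"])  # Ada
print(joined[1]["customer"])             # [] - no match, like a left outer join
```

Unmatched local documents keep an empty array rather than being dropped, which is the left-outer-join behaviour the real stage documents.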

onetimeuse92304
51 replies
1d9h

Ah, the old "I used the wrong product for my problem and now I complain the product is bad because it does not suit my case" defence.

MongoDB is a document database. It is not supposed to be good at relations. Also, lacking support from cloud providers and lacking experience with MongoDB are not MongoDB's problem; they are your poor decision-making. If you valued those things, you should have taken them into account when you were choosing MongoDB in the first place.

Now, I am not a big lover of MongoDB. Some years ago I was forced to use it and I had a very low opinion of it as a quite immature product. It is still far behind in maturity compared to something like Oracle Database or PostgreSQL, but in the meantime I have learned to appreciate some of the things MongoDB is good at.

I also admit that MongoDB's transactions are a total joke. It should be prominently placed in the documentation that you are using them at your own risk. I don't use MongoDB transactions anymore because there are better ways to architect your application with MongoDB without using transactions.

I like MongoDB for the ease of use when rapidly prototyping things. I like the concept of a no-schema, document database and some of the additional features it provides to deal with the document. I like its reactive java driver which is a breeze to use to construct data pipelines quickly. I like change streams.

In the end, I think it is good to have a selection of tools that are good at doing different things.

It is our responsibility to choose the right tool for the job. If you chose poorly, don't try to fault the tool for it.

thih9
11 replies
1d8h

To be fair, MongoDB’s PR team did their best to present it as the solution to all problems.

In 2012, 10gen’s VP of Corporate Strategy Matt Asay argued “there will remain a relatively small sphere of applications unsuitable for MongoDB … the majority of application software that developers write will be in use cases that are better fits for MongoDB and other NoSQL technology … Those functions that really help a company innovate and grow revenue [will be NoSQL].” He would note that we were living in the post-transactional future.

https://news.ycombinator.com/item?id=15124306

onetimeuse92304
7 replies
1d7h

To be fair, almost all product PR teams do the same. I just ignore most of what they say in favour of my own opinion that I try to build based on reading actual technical documentation, experimenting with the product, etc.

Anybody who makes tech decisions based solely on what PR teams say is naive and incompetent at best.

thih9
2 replies
1d3h

To be fair, almost all product PR teams do the same.

No, mongodb’s PR approach and results were largely uncommon at the time. They didn’t advertise directly but instead targeted and amplified dev communities - as detailed in the article linked earlier.

The result was that people not associated with mongodb talked about mongodb at various dev conferences and in blog posts. People didn’t want to listen to PR teams then too, but they followed their peers.

onetimeuse92304
1 replies
1d3h

Again, I will stick to saying that if you use technology you have no experience with, the problems resulting from it are on you.

Testing out technology on important projects is reckless and is a risk a lot of people take but push away any responsibility for.

As a professional, before I suggest using a technology on a project, at the very minimum I spend some time with it first to learn the ins and outs. Every technology has cons. Just because you don't know them does not absolve you of responsibility for them.

The reality is that most applications are boringly similar once you define the problem in simple words. There is very rarely a need to use new technology. It is my experience that it is usually better to use technology that is not necessarily cutting edge but one that you know well and that you can easily hire people who know it well.

No technology is perfect, but it is much better to know the cons ahead of time and to have people on hand who know the problems and can find reasonable solutions for them.

I am personally sick and tired of people who first advocate for new technology to be added to the project, touting its supposed benefits, not knowing the cons, then don't want to take responsibility once we actually find the dead bodies.

rokkitmensch
0 replies
1d3h

The funny thing is that they chose Mongo precisely because they were familiar with it...

zzzeek
0 replies
1d3h

MongoDB came into the community like a ton of bricks, taking over conferences, flooding the zone with mugs and t shirts, and for a time there, you couldn't view the front page of hacker news without at least two MongoDB posts. The death of SQL, now known to be an inferior relic of the past, was a regular topic of discussion. There's a reason why the "Web Scale" meme is so famous, because that's what actually happened for a while there. SQL was an inferior, leaky abstraction, and ACID was not "web scale". it became all about the CAP theorem (which I hardly ever see anyone writing about these days).

only a year or two later, when teams that went all in on MongoDB at the behest of MongoDB's marketing department started realizing they'd been sold a bill of goods, did the slow and arduous march back to what continued to be the best generalized solution for 95% of data problems, the ACID compliant relational database, begin to occur. given that, this blog post seems really behind the times.

taffer
0 replies
1d2h

As a counterexample, Azure Cosmos and Amazon DynamoDB are marketed as specific solutions for specific problems. RDBMSs like SQL Server or Postgres are marketed as general-purpose solutions because they are in fact general-purpose solutions. MongoDB, on the other hand, is a specialized "document database" marketed as a replacement for relational databases.

swasheck
0 replies
1d2h

mongo was absolutely one of the most egregious (daresay dishonest?) purveyors of such myth. it was quite off-putting for me and my team.

benterix
0 replies
1d4h

To be fair, almost all product PR teams do the same.

As it is their "duty" to twist the truth in this way, it is our role as the users of these technologies to present things as they are.

gwbas1c
1 replies
1d3h

They were doing that back in 2010. I went to a few conferences where they promoted MongoDB as an all-purpose database that was the "next thing" compared to SQL.

Once everyone realized its shortcomings as a general-purpose database, there was a gradual "oh shit this isn't working" migration back to SQL-based databases.

I think the big problem is that the MongoDB programming model and APIs are very nice; we really need a database that is still relational "under the hood" but has an API that's more like MongoDB.

bashinator
0 replies
1d2h

I believe one of the job interview problems at Mongo for software engineers, was to design a relational layer on top of the document store.

liquidgecka
0 replies
1d2h

In 2010-2011 time frame the mongo team sponsored an effort inside of Twitter to replace MySQL with mongo. There was a group of us that worked to migrate the tweet store to Cassandra that were able to talk with leadership and get that initiative killed. Turns out migrating HIGHLY structured data into mongo was never a good idea, especially at that scale.

dangtony98
10 replies
1d6h

Author of the article here!

The article itself does not "complain the product is bad," though some other comments in this thread would certainly suggest so. Instead, the article says that the use-case evolved, states reasons why MongoDB was no longer suitable (while at some point it was), explains why PostgreSQL was chosen instead, and discusses the migration process and the results.

Lacking support from cloud providers and lacking experience with MongoDB are indeed NOT MongoDB's problems. However, they are problems for users of the platform who are trying to self-host it with MongoDB as a database dependency, and they therefore justify the PostgreSQL migration, with PostgreSQL being a better candidate for this use-case.

To be clear, the article is not saying that MongoDB is bad, and no fingers are being pointed. I would in fact say that there is an array of use-cases where MongoDB is an excellent choice (e.g. I resonate with the point on rapid prototyping).

Please don't twist the words.

finaard
8 replies
1d4h

To me the article read more that PostgreSQL might have been the better choice from the beginning - but due to lack of experience in the team MongoDB was "good enough" at that point.

tkellogg
7 replies
1d3h

I recall when my team chose MongoDB ~2011, Postgres & friends didn't have JSON columns, so there was a lot of extra data modelling that probably was unnecessary.

The biggest use case for MongoDB was for huMongous data. Obvs MongoDB was a good fit, because of the name.

vbezhenar
3 replies
1d2h

I still haven't had a single use-case for JSON columns yet. Sometimes I feel that JSON columns are made for people who don't want to properly design their domain model. Big mistake.

unclebucknasty
0 replies
1d1h

Interesting point, but I wonder if there's a little confirmation bias and/or circular logic there.

In other words, you don't see a use case for JSON columns because you don't think they're good DB design.

I know I have used them after decades of RDBMS work, and it feels like I'm "cheating" a bit. But they really do make sense for some use cases that can be harder to achieve with traditionally normalized RDBMS alone. For instance, for configuration options or scenarios where different objects may have a few different attributes, but are largely the same.

If you've ever used an entity-attribute-value data model or otherwise ended up with table proliferation or a bunch of nullable columns to accommodate the latter use case, you might appreciate the simplification JSON columns offer.
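
A minimal sketch of that pattern in Postgres (table and column names are made up for illustration): the shared fields stay as real columns, while the per-type odds and ends live in one JSONB column instead of an EAV table or a pile of nullable columns.

```sql
-- Common fields as ordinary columns; variable ones in JSONB.
CREATE TABLE devices (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name       text NOT NULL,
    kind       text NOT NULL,
    attributes jsonb NOT NULL DEFAULT '{}'
);

INSERT INTO devices (name, kind, attributes)
VALUES ('cam-01', 'camera',     '{"resolution": "4k", "night_vision": true}'),
       ('th-17',  'thermostat', '{"min_temp": 5, "max_temp": 30}');

-- Query by an attribute only some rows have.
SELECT name
FROM devices
WHERE attributes @> '{"night_vision": true}';
```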

onetimeuse92304
0 replies
1d

I think it is the same misguided discussion as strongly, statically typed vs weakly typed languages.

(Mind I mean the discussion is misguided, not you)

The problem is that the relational model is one solution to a certain problem: when do you want to pay the price of enforcing the model of your data? A relational database is the option where you pay all the cost upfront.

With MongoDB I have other options.

It does not mean they are necessarily better options. But some of them are better in some situations, for example when rapid prototyping. Or when you need to preserve data in different formats as the application evolves.

Personally, if you can, you probably should model your data upfront. But it is good to have other options, sometimes.

abound
0 replies
1d2h

The main use case I find myself reaching for JSON/JSONB fields is unstructured map-like metadata, like KV tags on a resource, where the data model/domain has no expectation that any given field is set.

jerf
1 replies
1d3h

Yes, I think it's reasonable to say the NoSQL community drove the JSON columns in the relational databases. And while the JSON-ness of said columns is not necessarily the relevant bit to me, having the escape hatch of being able to just sort of splat out a "other" column has been something I've used several times, and it has relieved some non-trivial stress and strain in relational designs in my opinion. It's not even that hard to migrate something out of the JSON column if you want to raise it to a full column later, and JSON indices mean you may not even really need to.

I have a hard time justifying a document database in 2024 when I have 2024 Postgres; I don't know how large the window is where MongoDB will do something Postgres won't, but I know my systems are not in that window. (Assuming it even exists, which I wouldn't be surprised it doesn't, but for the sake of argument I will assume it does.) But a decade ago that would be a much harder decision if I truly had a document-based use case.
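
The "other" column escape hatch might look roughly like this in Postgres (the table and key names are invented for the example); a GIN index makes containment queries on the catch-all column usable, and promoting a key to a full column later is a couple of statements:

```sql
-- A catch-all "extra" JSONB column alongside the real schema.
ALTER TABLE orders ADD COLUMN extra jsonb NOT NULL DEFAULT '{}';

-- GIN index supports containment queries like the one below.
CREATE INDEX orders_extra_gin ON orders USING gin (extra);

SELECT id FROM orders WHERE extra @> '{"gift_wrap": true}';

-- Later, if a key earns its keep, raise it to a full column.
ALTER TABLE orders ADD COLUMN gift_wrap boolean;
UPDATE orders SET gift_wrap = (extra ->> 'gift_wrap')::boolean;
```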

jimbokun
0 replies
1d2h

Horizontal scaling is still challenging in Postgres. That’s where “NoSQL” systems still have their place. Or one of the SQL systems layered on to a NoSQL architecture, with the trade offs that entails.

wlll
0 replies
1d2h

https://www.postgresql.org/about/featurematrix/#json

You're right, 9.4/9.5 were when JSONB was introduced and expanded (released 2014/2016).

That said, it's a big decision to go with two very different technologies (relational vs document store) just for a column type; storing JSON must have been a pretty major product feature?

lloydatkinson
0 replies
1d3h

I have noticed in recent times that comments on here have been getting more and more like this. An article I posted a few weeks ago got the same treatment even though I reiterated the same points multiple times, as I anticipated these kinds of comments from people who didn’t read it fully.

999900000999
6 replies
1d2h

As part of that stack, we chose MongoDB + Mongoose ORM because the combination presented least overhead and allowed us to ship quality features quickly. As Sir Tony Hoare states, “premature optimization is the root of all evil,” and there was certainly no need for further optimization at the time.

Looks like the tool did a great job while they were getting started. Mongo is very easy to hack something together with, later on it looks like they just needed to migrate to something more stable

swasheck
1 replies
1d2h

yeah ... in the early days of mongo it was pitched as a replacement for relational because a) it had faster writes without those pesky joins and constraints, and b) because it's too hard to model data, so chances are it's just unstructured. the onslaught of "relational is dead" and "joins are annoying" was tremendous

999900000999
0 replies
23h26m

Plus there's a good chance they didn't know exactly what they needed to build when starting out.

jimbokun
1 replies
1d

Postgres and MySQL are easy to hack something together with, too.

999900000999
0 replies
1d

That's assuming you're experienced with them, if you know next to nothing about databases, Mongo is a much better choice. In this case it looks like when they got started the team just didn't have all that much experience with databases.

forgotmyinfo
1 replies
1d2h

Nonsense. Choosing the correct database isn't "premature optimization", it's the bare minimum of being a competent programmer. Why do we never have time to do it right, but we always have time to do it over?

999900000999
0 replies
1d2h

It's generally faster to get started with Mongo.

If you're talking about a startup, you might not even get to the let's do it over stage. You might have six months to just hack something out so you can raise more funding, and in that case Mongo is a great choice.

The needs of your business might also just evolve over time.

taffer
4 replies
1d2h

So what is a good use case for Mongodb? I have never seen an application where a "document database" would have been a good choice. In every project where I have seen Mongodb, it has turned out to be the wrong choice.

Even for prototyping there are many other good choices since RDBMS like Postgres have implemented JSON support. Mongodb looks like a solution in search of a problem.

joneholland
3 replies
1d2h

Applications that are essentially form wizards are a great fit for a document database.

Think application forms etc.

wlll
2 replies
1d1h

Is there a benefit of Mongo for that over just Postgres with JSONB columns? You're still storing JSON in Postgres but you'd get the relational aspects too for things like users having many forms, billing and account relationships etc.

unclebucknasty
0 replies
1d1h

No benefits that I can think of at the database layer. Postgres's addition of JSONB columns represents the best of both worlds. Funny to think that all of that noise about nosql replacing rdbms was essentially nullified by the addition of one column type.

Some people do like the MongoDB API, however.

joneholland
0 replies
1d1h

Easier horizontal scaling and organizational inertia would be the main reasons to use mongo over a jsonb column. I wouldn’t introduce it to a psql shop if they are already great at running psql.

CuriouslyC
4 replies
1d6h

If mongo is just a document database, there's really no reason to use it over elastic. The query story is slightly nicer with mongo, but we're not doing relational algebra here, right? Elastic crushes mongo at literally everything else.

finikytou
2 replies
1d4h

i guess elastic is a heavier setup for some POCs.

CuriouslyC
1 replies
20h26m

That might have been true in the past but it's not hard to PoC with elastic using docker compose.

synthc
0 replies
17h47m

Elastic is much more of a pita to maintain and monitor than Mongo

LtWorf
0 replies
1d3h

ES had a bug where a syntax error in a query would send it in a weird state that would then give wrong answers to all queries until the process got rebooted. Which of course isn't fast.

I reported the issue, and thankfully I changed job and never had to deal with ES again. After a few years they contacted me to ask if the issue was still there, I said I had no idea.

I have no idea if the issue is still there.

forgotmyinfo
3 replies
1d2h

Sorry, 100 times out of 100, "rapid prototyping" means "we built this with the wrong database, now we're stuck with it". If it isn't obvious how to store your data up front, then you either aren't planning your software well, or you don't actually know what you're building.

wlll
2 replies
1d2h

I was also a bit confused by that part. What is it about Postgres that prevents rapid prototyping? It's not like the schema is set in stone, or even particularly hard to change once you've created it.

benterix
1 replies
1d2h

Well, yes and no. If you already have your database running in production with many transactions and live connections 24/7, changes to the schema might not be hard per se, but they always need careful planning and execution. Additionally, in Postgres a change like adding a column locks the whole table (although I hear this is going to change), so writes are off for a short time. If this becomes unacceptable, you go for a blue/green deployment, which has its own gotchas.

So, while I agree with your main point - there is nothing in Postgres that prevents rapid prototyping, and I would choose it over Mongo any moment - I understand why some people might prefer the more "dirty" approach.

wlll
0 replies
1d2h

True enough, and the solutions for lighter-weight schema changes have evolved and weren't always so good, but early stage startups often don't really have the sort of data-weight issues that make this hard, unless they're starting with large data sets already. Even at the scale of small hundreds of millions of rows per table (like the company I'm in charge of the database for) it's not much of an issue.

I had to look it up, apparently adding a column with a non-null default was "fixed" in PG 11 (2018), but with a null default it had been fast for a while:

https://www.depesz.com/2018/04/04/waiting-for-postgresql-11-...

JJMcJ
1 replies
1d5h

MongoDB is a document database.

Thank you. A one sentence clarification of the issues in Mongo vs RDBMS.

benterix
0 replies
1d4h

It's a bit more nuanced than that. PostgreSQL supports JSON/JSONB data types so if you need, you can use it in a way similar to MongoDB, with less structure and the remaining caveats, but with transactions and all other goodies working out of the box and being able to use standard RDBMS features if needed.

unclebucknasty
0 replies
1d1h

To be fair, it burst onto the scene back in the "all you need is NoSQL...RDBMS is dead" era.

So it became one of those software dogma things and I remember having "debates" right here on HN, including around the necessity of transactions.

So it's kind of ironic to see people now chastising those who fell prey to the dogma. Full circle.

throw_m239339
0 replies
1d3h

MongoDB is a document database.

So can PostgreSQL be with JSON types and queries.

teaearlgraycold
0 replies
1d2h

I like MongoDB for the ease of use when rapidly prototyping things.

I think this is similar to when people say writing tests speed them up through TDD, then others don’t write any tests at all. What I take away is you can do it either way as long as you’re bought in and are familiar with the process.

For me it’s no trouble at all to define my schema in advance. And Postgres provides me with the JSON column escape hatch. Combined with the massive benefit of the data validity guarantees later in the app’s lifecycle, I pick Postgres every time.

pydry
0 replies
1d6h

The issue isn't that mongodb isn't suitable for some problems. I've used it before and it has not caused issues. It's that the problems it is suitable for are a subset of those that postgres is suitable for.

I can just as easily rapidly prototype in postgres. If I want to store schemaless JSON in postgres I can easily do that.

In the end, I think it is good to have a selection of tools that are good at doing different things.

But every tool needs a niche: something it is good at that its competing tools are not.

Mongo doesn't have that.

jeff-davis
0 replies
1d1h

"It is our responsibility to choose the right tool for the job."

That perspective doesn't work well for database products, in my opinion. There is a huge pressure for databases to evolve with your business and applications and to adapt to whatever your throw at it.

Swapping out a database product is less like changing tools and more like changing a foundation. You can't do it every time a new problem arises.

That's not to say you can't use a few different products if that makes sense. But that has its complications.

p-e-w
25 replies
1d13h

At some point during the past 10 years or so, the world realized that document databases are actually a bad idea, and relational databases can do the same job better with just a few easy-to-implement QoL improvements (such as JSON operators in SQL).

Meanwhile MongoDB's creators thought this was a great time to make Mongo non-free software, presumably with the goal of making a quick buck from cloud operators.

The result is pretty much the landscape we are seeing today, with Postgres reigning supreme, and even SQLite being experimented with in server setups, while Mongo is on life support and MySQL not far behind.

vmatsiiako
18 replies
1d13h

I wouldn't say "Mongo is on life support" – it's actually a very successful business growing 30% YoY on a massive scale. Yet, I agree that the license switch has definitely damaged their long-term ecosystem.

smt88
17 replies
1d13h

They grew 30% if you look at revenue, but that's not how people determine whether a business is healthy.

EBITDA is a better metric, and that number was getting dramatically worse every year until 2023.

But even in 2023, EBITDA was -$202M. I don't see it ever approaching $0.

So it's still a shit business on top of a shit product that no one ever really needed.

phonon
3 replies
1d10h

Their "Comprehensive Losses" for the last 3 years are (in thousands)

(171,150)

(343,375)

(309,090)

So there you go...

librasteve
1 replies
1d9h

they lost 300 million in one year?

christkv
0 replies
1d6h

Stock compensation cash flow was positive meaning they added cash to their account.

jeltz
0 replies
1d7h

Are they literally setting money on fire?

roenxi
1 replies
1d10h

You'd probably save time if you directly suggested which income metric you want to use. In the video, Buffett is basically saying he cares about income after interest, tax, depreciation and amortisation. In other words, he cares about actual income. That'd be very on-brand for Buffett.

That measure won't be good for MongoDB Inc. either; they've been millions in the red for years. Losing $100 million in a year would be the best result they've achieved in a while now.

ulucs
0 replies
1d7h

so by Buffett's metric, Mongo is doing even worse

blitzar
1 replies
1d9h

People who use EBITDA are either trying to con you or they're conning themselves.

aerhardt
0 replies
1d7h

Probably not used in a conning context when it’s 200-300 million in the red.

jpgvm
0 replies
1d10h

Buffett's point is that it can be gamed to look good. In this case, even if they are gaming it to look better than it is... well, it's still awful.

airstrike
0 replies
1d6h

Buffett's criticism is that EBITDA is just accounting and he cares about cash flow, specifically also after paying for CAPEX (which is Investing Cash Flow so comes after Operating Cash Flow)

EBITDA is used in the industry because it is a proxy for operating cash flow. Sometimes you don't have all the available data needed to get to OCF, or you're looking at company guidance (for future EBITDA values) or analyst estimates. It's easier to keep the conversation at the EBITDA level because it requires fewer assumptions. Generally the revenue line is ~easy to estimate because you can conceptualize how to go from the current number of customers to some future number of customers, how many dollars per customers and so on and so forth.

Then as you work your way to EBIT (Operating Income) you still have to assume some gross margin, R&D expense, etc. These are pretty tangible. It should be pretty easy to get to estimated EBIT from what the company discloses in guidance or what analysts forecast. Since D&A is pretty linear over time, people generally assume it just remains constant as a % of revenue, so now you have EBITDA which is very much like cash flow

EBITDA is similar to cash flow because it adds back to EBIT the non-cash expense that is D&A. The reason it's good to look at it before interest and taxes is because you're also thinking about how much cash the whole enterprise generates, not how much cash goes to equity holders at the end (which is often called "Free" Cash Flow because it's not tied up with commitments to others)

Coming back to Buffett, in the industries he tends to pick stocks from, CAPEX is a major thing. Companies need to build factories, buy equipment, etc. So if you just look at future EBITDA without accounting for future CAPEX needs, you're fooling yourself.

Truth be told, in those industries everyone also looks at "EBITDA minus Capex". Maybe they do so now that he's bemoaned everyone for not doing it in the first place, but IMHO his criticism largely doesn't apply among valuation professionals. Maybe it does for stock traders, but not for valuation purposes like in an M&A context

otabdeveloper4
1 replies
1d11h

MongoDB is great if what you need is a replicated log or queue. It's what Kafka should have been.

smt88
0 replies
14h34m

As others have said, you can just use your RDBMS for that now. Mongo is a feature of better products.

dangtony98
1 replies
1d5h

I'm actually very bullish on MongoDB.

While the article and significant sentiment in threads suggest a push off MongoDB toward PostgreSQL, I do think MongoDB has its own place in the stack and that it won't be obsoleted. I've personally had many pleasant experiences working with it in past projects.

Regarding business metrics, I may have a slight bias coming from the startup world but we often value revenue the most, especially in the earlier stages of a company; we can reduce costs and perform various optimizations in the future but what stays at the foundation is whether or not there is strong product market fit such that more customers keep coming in and coming back to use the underlying product — I would say MongoDB's current growth trajectory is in line with that.

I'm hopeful overall that they can turn profitable (maybe not this year or next but eventually so).

smt88
0 replies
14h35m

HN should be full of Mongo's users and advocates, and yet this site loathes it. That by itself is a sign that it's a zombie.

The reason we don't just look at revenue is that you can "buy" revenue with ads and sales teams. But if that revenue growth costs more than it pays because your customers aren't sticky, then your business doesn't work. You can't apply startup principles to this large, aging public company.

rrr_oh_man
0 replies
1d9h

I was totally expecting someone to come up with some vanity metric here, but a negative EBITDA of several HUNDRED MILLION? Jesus. How did that happen. B2B business should be a symphony of ka-chings for them.

Joel_Mckay
0 replies
1d10h

I think anyone who administers mongodb would agree it has a list of very dangerous behaviors for novices.

Try to upgrade a running instance node between some versions, and one may thrash object keys or worse. Best to dump and re-load your data every major upgrade to recover all set properties... and there still may be slight differences in the query language later. Wait, your data sets are 1.5TiB per node.... that is a lot of downtime...

The cursor, projection, and re-format queries can be very good at reducing reporting traffic. However, the 3 page long syntax can be unreadable to the uninitiated. The json essentially jams a bunch of equivalent SQL queries in 1 transaction, but you still have to be careful as there is no real guarantee of ordering without auto-indices in clusters.

mongodb is not as clean as SQL, but works for storing constantly changing meta data. i.e. if you find your SQL schema ends up with tuples that have a lot of poorly designed null entries, table edits, and or simply implements an object storage class abstraction in massive meta-data and object-class catch-all tables.

And yes, the mongodb new licensing model telegraphs an unfavorable policy posture. However, I do understand not wanting fortune 500 companies exploiting your teams time for free.

Profit is not an indication of quality, but rather fiscal utility. ;-)

tgv
3 replies
1d9h

Document databases aren't an inherently bad idea. They have their uses. I'm using mongodb because it fits my use case: the app is very front-end oriented, and the data the user needs on a specific page fits neatly in one document. There are (quite a lot) related documents, but they are only needed in specific parts of the app, and that's when those are retrieved. Pretty simple.

I could argue that in-server SQLite is a bad idea: if you ever need to share the database or distribute tasks, you're fucked. But for some use cases it just works.

"The world" hasn't realized shit. It jumps from fad to fad, fueled by hyped promises and lack of experience.

eddd-ddde
2 replies
1d4h

What about a Postgres database where you store your documents in a key value table of json objects?

What mongodb benefits would you be missing?

tgv
1 replies
1d3h

What benefit does switching to postgresql bring? When mongodb stops being maintained, I'll consider postgres, but until then, I now only see downsides.

eddd-ddde
0 replies
1d1h

Lots of useful features.

You can join data when needed, get proper transactions, and have schemas where you need schemas.

For example, you start with a basic (key, json) table. Once you begin to stabilise your schema, add columns generated from your JSON objects; now you get proper joins, indexes, and validations on said columns.
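
That evolution can be done with generated columns (Postgres 12+); a minimal sketch with hypothetical names:

```sql
-- Start schemaless: just a key and a JSONB blob.
CREATE TABLE docs (
    key  text PRIMARY KEY,
    body jsonb NOT NULL
);

-- Once a field stabilises, surface it as a generated column
-- and give it a real index.
ALTER TABLE docs
    ADD COLUMN author text
    GENERATED ALWAYS AS (body ->> 'author') STORED;

CREATE INDEX docs_author_idx ON docs (author);
```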

MrBuddyCasino
1 replies
1d10h

I‘m not sure that is the case. What I‘m seeing is that people use the cloud-native document stores such as DynamoDB instead of Mongo. The only thing zoomers hate more than traditional SQL databases is XML.

cwbriscoe
0 replies
1d2h

Doesn't everybody hate XML?

paulmd
21 replies
1d11h

In this world you either evolve to Elastic/Cassandra/etc or return to Postgres :crab:

You really gotta search your soul and ask yourself: is your cool idea really a database, or just a plugin for Postgres? And the answer is often the latter.

winrid
20 replies
1d10h

The problem is that postgres is not a replacement. The JSON operations are not atomic which is terrible. If you wanna use PG that's cool, but I'd suggest just avoiding JSON.

radiospiel
14 replies
1d10h

The JSON operations are not atomic

I'm hearing this today for the first time. What exactly is not atomic, and is there a resource with more details?

adastral
13 replies
1d9h

Not sure if this is what the above comment means by "atomic", but a shortcoming of Postgres' JSON support is that it will have to rewrite an entire JSON object every time a part of it gets updated, no matter how many keys the update really affected. E.g. if I update an integer in a 100MB JSON object, Postgres will write ~100MB (plus WAL, TOAST overhead, etc.), not just a few bytes. I imagine this can be a no-go for certain use cases.

winrid
5 replies
1d2h

It's not an issue with size. It's an issue with race conditions. With Mongo I can update a.b and a.c concurrently from different nodes and both writes will set the right values.

You can't do that with PG JSONB unless you lock the row for reading...

callalex
4 replies
1d1h

Yes but that simplified write complexity means you are pushing a ton of read complexity out to your application.

winrid
3 replies
13h57m

What?? That's an insane argument. That's like saying if one client sets column X to 1 and another client concurrently sets y to 2, one client's writes will be LOST. It shouldn't, and it doesn't. If it did, nobody would use Postgres. This issue only exists with PG's JSON impl.

callalex
2 replies
13h51m

What?? That’s an insane way to describe what I’m talking about. Data/transaction isolation is very complex and extremely specific to every use case, which is why database engines worth anything let you describe to them what your needs are. Hence why when one client writes to Y they specify what they think X should be if relevant and get notified to try again if the assumptions are wrong. An advantage of specifying your data and transaction model up front is that it will surface these subtle issues to you before they destructively lose important information in an unrecoverable manner.

https://en.wikipedia.org/wiki/Isolation_(database_systems)

winrid
0 replies
10h17m

So updating one column on a table is destructive and can lose important information now? :D

How do you increment a value in a column while other people write to the database? You don't grab the whole damn row and rewrite it...

winrid
0 replies
10h14m

Also this statement:

which is why database engines worth anything let you describe to them what your needs are

Contradicts your argument.

Mongo has atomic updates to update specific fields, or you can do replaceOne() etc to replace the whole document.

While PG only gives you "replace" with JSON.

So I guess postgres isn't worth anything. :)

bandrami
5 replies
1d9h

It drives me batty to see people store 100MB JSON objects with a predictable internal structure as single records in an RDB rather than destructuring it and storing the fields as a single record. Like, yes, you can design it the worst possible way like that, but why? But I see it all the time.

Temporary_31337
4 replies
1d9h

Because schemas. The whole point of nosql is that you can alter your data model without having to reload the whole database

bandrami
2 replies
1d9h

Which is great in the early stages of development, but people actually deploy like this

aerhardt
1 replies
1d7h

The benefits of going schemaless in the early stages of development are highly suspect in my experience. The time that one might save in data modeling and migrations comes out from the other end with shittier code that’s harder to reason about.

blksv
0 replies
1d5h

My perspective is that using NoSQL does not save time in data modeling and migrations. Moreover, one has to pay in increased time for these activities, because (a) in most cases, data has to follow some model in order to be processable anyway, the question is whether we formally document and enforce it at a relational storage, or leave it to external means (which we have to implement) to benefit from some specifically-optimized non-relational storage, (b) NoSQL DBs return data (almost) as stored, one cannot rearrange results as freely as with SQL queries, not even close, thus much more careful design is required (effectively, one has to design not only schema but also the appropriate denormalization of it), (c) migrations are manual and painful, so one had better arrive at the right design at once rather than iterate on it.

That is, of course, if one doesn't want to deal with piles of shitty code and even more shitty data.

blksv
0 replies
1d6h

Actually, that's the whole point of RDBs: that you can alter your data model (in most cases) just by a simple DDL+DML query. And it is with NoSQL that you have to manually download all the affected data from the DB, run the transformation with consistency checks, and upload it back. Or, alternatively, you have to write your business logic so that it can work with/transform on-demand all the different versions of data objects, which to my taste is even more of a nightmarish scenario.

jeltz
0 replies
1d7h

A JSON object which is 100 MB after compression is a quite huge thing.

mulmboy
4 replies
1d10h

How are JSON operations not atomic? Genuinely curious

bandrami
2 replies
1d9h

I think the sense is not "atomic by field" or whatever you'd call that. If you're going to ignore the fact that PG is an actual ORDB and just store gigantic blobs of JSON in it, it will write the entire object whenever you update part of it, because the whole point is you're supposed to store it as a multi-field record.

SonOfLilit
1 replies
1d8h

Isn't that true for Mongo as well?

rb12345
0 replies
1d6h

Yes, but as MongoDB is a document database, storing and updating giant blobs of JSON as a single operation as opposed to breaking the JSON down into individual fields is intended behaviour. This works in Postgres too, of course, but then you lose the relational database advantages on top of the large-single-field issues.

All this really comes down to is picking the right database type for the problem you're trying to solve.

iambateman
18 replies
1d7h

I use MySQL for everything - always have.

Can someone hit me with a few reasons why you would use Postgres over MySQL? I don’t have any familial affinity to any database, but I’m not sure what the benefits to Postgres are relative to MySQL.

phamilton
2 replies
1d5h

Some things that are nice in Postgres and missing in MySQL:

- Create constraints as NOT VALID and later VALIDATE them. This allows you to create them without expensive locks.

- explain (analyze, buffers). I miss this so much.

- Row level security.

- TOAST simplicity for variable text fields. MySQL has so many caveats around row size and what features are allowed and when. Postgres just simplifies it all.

- Rich extension ecosystem. Whether it's full text search or vector data, extensions are pretty simple to use (even in managed environments, a wide range of extensions are available).

Is that (and more) enough for me to migrate a large MySQL to postgres? No. But I would bias towards postgres for new projects.
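
For the first point, the two-step dance looks roughly like this (assuming an existing `orders` table with a `total` column): `NOT VALID` skips the full-table scan at creation time, and the later `VALIDATE CONSTRAINT` checks existing rows while holding only a weak lock.

```sql
-- Step 1: the constraint applies to new writes immediately,
-- without a long exclusive lock for scanning the table.
ALTER TABLE orders
    ADD CONSTRAINT orders_total_positive CHECK (total >= 0) NOT VALID;

-- Step 2: scan existing rows without blocking normal writes.
ALTER TABLE orders VALIDATE CONSTRAINT orders_total_positive;
```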

dvnguyen
1 replies
1d

How about HA and horizontal scaling? I’ve heard that MySQL excels in that area.

phamilton
0 replies
18h7m

I mostly have used AWS Aurora there, which is significantly better than vanilla MySQL or Postgres and both are similar enough.

In Aurora, Postgres has Aurora Limitless (in preview) which looks pretty fantastic.

As far as running yourself, Postgres actually has some advantages.

Supporting both streaming replication and logical replication is nice. Streaming replication makes large DDL have much less impact on replica lag than logical replication. As an example, if building a large index takes 10 minutes then you will see a 10 minute lag with logical replication since it has to run the same index build job on the replica once finished on the primary. Whereas streaming replication will replicate as the index is built.

Postgres 16 added bidirectional logical replication, which allows very simple multi-writer configurations. Expect more improvements here in the future.

The gap really has closed pretty dramatically between MySQL and Postgres in the past 5 years or so.
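For reference, the logical replication setup being compared is, in sketch form (all names and the connection string are illustrative):

```sql
-- On the primary: publish changes for selected tables
CREATE PUBLICATION app_pub FOR TABLE users, orders;

-- On the replica: subscribe to the primary's publication
CREATE SUBSCRIPTION app_sub
  CONNECTION 'host=primary dbname=app user=replicator'
  PUBLICATION app_pub;
```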

wlll
1 replies
1d6h

I do scaling and performance work, mostly with Rails apps, but a significant amount of the work is database level and not language specific. I've used both postgres and MySQL (and a few other databases) going back to 2000.

The best thing I can hear from a company when I start is "We use Postgres". If they're using postgres then I know there's likely a far smoother path to performance than with MySQL. It has better tooling, better features, better metadata.

Pengtuzi
0 replies
1d

Two paragraphs about yourself and ending with a very vague sentence that “answers” their question. 10/10

p4ul
0 replies
1d4h

This is exactly what I would argue. PostgreSQL makes it straightforward to create extremely powerful extensions.

PostGIS is one such extension, and I would argue that if your use case involves geospatial data, then PostGIS alone is enough of a reason to use PostgreSQL!

megaman821
1 replies
1d

I use MySQL mostly, but I would love to have a few features from the Postgres world; namely, the better full-text search, key value store, queue, and vector search. A lot of projects I work on never reach the scale where I need these to be separate data products, so the perfectly fine Postgres versions would suffice.

baur
0 replies
23h53m

CrateDB might be a good fit for full text and vector search (it's an SQL database but has dedicated clauses for FT and VS).

Curious how do you use PG for key/value and queue - do you use regular tables or some specific extensions?

I can imagine kv being a table with primary key on “key” and for queue a table with generated timestamp, indexed by this column and peek/add utilising that index.
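That guess, sketched out in SQL (names invented; FOR UPDATE SKIP LOCKED is the usual trick for letting several workers pop the queue without stepping on each other):

```sql
-- Key/value: a two-column table with a primary key on "key"
CREATE TABLE kv (
  key   text PRIMARY KEY,
  value jsonb
);

-- Queue: insert jobs, workers claim them with SKIP LOCKED
CREATE TABLE jobs (
  id         bigserial PRIMARY KEY,
  payload    jsonb,
  created_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX ON jobs (created_at);

-- Worker: atomically claim and remove the oldest job
DELETE FROM jobs
WHERE id = (
  SELECT id FROM jobs
  ORDER BY created_at
  LIMIT 1
  FOR UPDATE SKIP LOCKED
)
RETURNING payload;
```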

dpcx
1 replies
1d5h

I'll give you the one that matters to me: in MySQL, you can't do DDL statements (create table, alter table, etc) inside a transaction. MySQL will implicitly commit after each DDL statement.

kstrauser
0 replies
1d3h

Aww, yes. It’s so nice being able to wrap giant schema and data migrations in a transaction so the whole process is atomic.
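A sketch of what that enables (hypothetical table and column names). In MySQL, each DDL statement below would implicitly commit, so a failure partway through would leave a half-applied migration:

```sql
BEGIN;

ALTER TABLE users ADD COLUMN email_verified boolean NOT NULL DEFAULT false;
CREATE INDEX users_email_verified_idx ON users (email_verified);
UPDATE users SET email_verified = true WHERE legacy_verified = 1;

-- If anything above fails, ROLLBACK undoes all of it, DDL included.
COMMIT;
```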

teunispeters
0 replies
23h13m

Stability, reliability, consistency. If you do anything with Unicode you're also much better off in PostgreSQL. Faster indexes, smaller base install (*), and much more complete SQL language support.

(*) Note: when PHP was taking off, MySQL had the smaller base install. This has long since changed - PostgreSQL hasn't grown much over the years, and MySQL has, at least since the last time I worked on both circa 2015-ish.

kqr
0 replies
1d7h

Back when I selected which database to get more fluent in, one of my concerns about MySQL was that it was described as playing fast and loose with type casts, at least compared to Postgres.

gkbrk
0 replies
1d7h

Not worrying about Oracle suing you is a good reason to use PostgreSQL or MariaDB.

eknkc
0 replies
1d5h

Last time I used MySQL I had some delete triggers to clean up some stuff.

Apparently MySQL does not run delete triggers in case the rows are deleted due to a foreign key cascade.

Eveytime I used its slightly advanced features, I ran into such problems. With PostgreSQL I do not need to think if this would work.

callalex
0 replies
1d1h

I know this isn’t a technical reason, but my main reason is “eww, gross, it smells like Oracle here.” I’ve been around long enough to know that even being in the same zip code as Oracle is a bad idea.

anticodon
0 replies
21h49m

PostgreSQL supports more SQL features and data types out of the box. Also, it looks like MySQL development has stalled since being purchased by Oracle. PostgreSQL has exciting new features in every release; I forget when anything significant last happened in the MySQL world. It's been frozen for like a decade now. There are some new releases, but you won't find anything exciting in the change log.

andybak
0 replies
1d6h

PostGIS was way ahead of the MySQL equivalent last time I checked.

The plugin ecosystem is pretty astonishing. Foreign Data Wrappers... I'm not hands on so much any more but there were a lot of things back when I was.

cpursley
13 replies
1d7h

The one unsolicited piece of tech advice I always give is “Just use Postgres”.

Postgres Is Enough: https://gist.github.com/cpursley/c8fb81fe8a7e5df038158bdfe0f...

There’s some Mongo/json alternatives in the list if you really need unstructured data.

And there’s a huge plug-in ecosystem as well for just about anything you could imagine.

kqr
12 replies
1d7h

But isn't there a point before which "just use sqlite" is more appropriate advice?

madsbuch
4 replies
1d5h

I am curious about this. Why do you think SQLite is simpler than Postgres, and in what setting is that the case?

Sqlite comes with the headache of managing an attached volume.

J_Shelby_J
3 replies
1d3h

Building an mvp or a single container app.

I’m actually trying to get Postgres to work like this right now.

SQLite being embedded is simpler to use and more performant in this environment as there is no IPC overhead.

madsbuch
2 replies
1d1h

Why even use a database? Just assign a variable. All state is dead the second you destroy the container anyways.

jimbokun
1 replies
1d

Well then single container plus one attached volume.

madsbuch
0 replies
22h57m

Yep, my point was that this roughly has the complexity of setting up e.g. Supabase.

In conclusion: I would always advise Postgres over the approach of setting up a volume with sqlite.

The second you need to move on, you need to get that pesky file out of that volume. With Postgres, you just connect to the database from any Postgres-compatible client.

__s
2 replies
1d5h

sqlite is great for what it is

But I recently ended up using sqlite on a site, & it's clear it makes a lot of trade offs to keep itself minimalist. I wish I had just used postgres

That minimalism can be appropriate if you want something that works with a single file to track state & no external process

tristan957
1 replies
1d2h

Have you tried using something like Turso? Curious how that experience would be for you.

(I do not work at Turso)

__s
0 replies
1d1h

Looking over it, I don't see why you'd use a managed sqlite service. Unless you started with sqlite & now you're stuck with its quirks

Don't know, I lean pretty minimalist on things, like I'm using `hyper` directly on another website I work on, rather than picking axum or warp

lysecret
0 replies
1d5h

I love sqlite as much as the next guy, but as soon as you are working with multiple write processes (which you quickly will be) you will have to move off. So, why not just start with Postgres?

jimbokun
0 replies
1d

SQLite is a replacement for structured data stored in a file. If your data store is a big XML, JSON, or CSV file, SQLite may be a better solution. For example, I'm using it for ETL tasks and it's working well for this use case.

If you need a networked database supporting many users, probably better off with Postgres.
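The ETL-from-a-file use case above can be sketched like this (table and columns invented; this is standard SQLite SQL):

```sql
-- Stage flat-file rows into a real table once...
CREATE TABLE sales (
  region  TEXT,
  amount  REAL
);

INSERT INTO sales VALUES
  ('east', 100.0),
  ('west', 250.0),
  ('east', 50.0);

-- ...then run the kind of ad-hoc query a raw CSV makes painful
SELECT region, SUM(amount) AS total
FROM sales
GROUP BY region
ORDER BY total DESC;
```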

cpursley
0 replies
1d7h

Sure, if you simply need a dumb store for basic types that's simple to operate. But SQLite is not at feature parity with Postgres.

WuxiFingerHold
0 replies
13h0m

For server applications I don't think so (despite the recent hype driven by businesses marketing their cloud services). Managing multiple writers and durability takes the same or even more effort and attention as running PG or MySQL. And on top of that, SQLite is missing many often-crucial features. For me, those are the type system and functions.

bit_flipper
13 replies
1d

I've run Postgres at large scale (dozens of machines) at multiple companies. I've also run MongoDB at large scale at multiple companies. I like both generally. I don't really care about data modelling differences - you can build the same applications with approximately the same schema with both if you know what you're doing.

I don't understand how folks seemingly ignore Postgres' non-existent out of the box HA and horizontal scaling support. For small scale projects that don't care about these things, fair enough! But in my experience every Postgres installation is a snowflake with cobbled together extensions, other third party software, and home-rolled scripts to make up for this gap. These third party pieces of software are often buggy, half-maintained, and under-documented. This is exacerbated by Postgres' major version file format changes making upgrades extremely painful.

As far as I can tell, there is no interest in making these features work well in Postgres core because all of the contributors' companies make their money selling solutions for HA/sharding. This is an area where MySQL is so significantly better than Postgres (because so many large Internet companies use MySQL) that it surprises me people aren't more unhappy with the state of things. I don't really want to run another Postgres cluster myself again. For a single node thing where I don't care about HA/scaling I do quite like it, though.

amluto
3 replies
23h33m

This is an area where MySQL is so significantly better than Postgres (because so many large Internet companies use MySQL) that it surprises me people aren't more unhappy with the state of things.

I’m not sure precisely what you mean by “HA”, but, in my experience, out-of-the-box support for the most basic replication setup in MySQL is pretty bad. Just to rattle off a few examples:

Adding a replica involves using mysqldump, which is, to put it charitably, not a very good program. And the tools that consume its output are even worse!

There is nothing that ships with MySQL that can help verify that a replica is in sync with its primary.

Want to use GTID (which is the recommended mode and is more or less mandatory for a reasonable HA setup)? Prepare for poor docs. Also prepare for the complete inability of anyone's managed offering to sync to an existing replica set via mysqldump's output. RDS will reject the output due to a rather fundamental permission issue, and the recommended (documented!) workaround is simply incorrect. It's not clear that RDS can do it right. At least Azure sort of documents that one can manually read and modify the mysqldump output and then issue a manual API call (involving the directives that you manually removed from the dump) to set the GTID state.

Want point-in-time recovery? While the replication protocol supports it, there is no first-party tooling. Even just archiving the replication logs is barely supported. Postgres makes it a bit awkward, but at least the mechanisms are supported out of the box.

But maybe the new-ish cluster support actually works well once it's set up, as long as you don't try to add managed RDS-style nodes?

evanelias
2 replies
23h13m

Adding a replica involves using mysqldump

That's one path, but it is not the only way, and never has been.

MySQL 8.0.17 (released nearly 5 years ago!) added support for physical (binary) copy using the CLONE plugin. And MySQL Shell added logical dump/reload capabilities in 8.0.21, nearly 4 years ago.

Third-party solutions for both physical and logical copy have long been available, e.g. xtrabackup and mydumper, respectively.

And there was always the "shut down the server and copy the files" offline approach in a pinch.

amluto
1 replies
21h34m

CLONE is indeed nifty. But why is it a plugin? And why don't any of the major hosted services support it? (Or do they? The ones I checked don't document any support.)

I wouldn’t call xtrabackup or mydumper an out-of-the-box solution.

evanelias
0 replies
19h33m

What's wrong with CLONE being a MySQL plugin? I mean a good chunk of this page is people praising Postgres for its plugins.

As for support in hosted cloud providers, that's a question for the cloud providers, no one else can answer this. But my best guess would be because they want you to use their in-house data management offerings, snapshot functionality, etc instead of porting MySQL's solution into the security restrictions of their managed environment.

Yes, xtrabackup and mydumper are third-party tools, as I noted. If you needed something out-of-the-box prior to CLONE, the paid MySQL Enterprise Edition has always included a first-party solution (MySQL Enterprise Backup, often abbreviated as MEB). Meanwhile Community Edition users often gravitated to Percona's xtrabackup instead as a similar FOSS equivalent, despite not being a first-party / out-of-the-box tool.

mixmastamyk
2 replies
20h33m

Citus is open source and well financed. This comment may have made sense a few years ago, but no longer.

evanelias
1 replies
18h32m

By "well financed" you mean "owned by Microsoft"?

That situation raises a separate set of concerns, especially in the context of Microsoft's main database cash cow being SQL Server, not Postgres/Citus.

mixmastamyk
0 replies
15h57m

I'm as skeptical of MS as anyone. However it is licensed GNU AGPL, so not particularly worried.

BiteCode_dev
1 replies
22h44m

Because vertical scaling can take you so far these days that 99% of companies will never, ever reach the scale where they need more. There's just little incentive.

Especially since:

- Servers will keep getting better and cheaper with time.

- Data is not only in postgres; you probably have redis, clickhouse and others, so the load is balanced. In fact you may have different dedicated postgres instances, like one for GIS tasks.

- Those hacky extensions are damn amazing. No product in the world is that versatile.

- Postgres has much better support from established frameworks like django/ror/laravel than nosql alternatives. People shit on ORMs, but they enable a huge, well-integrated plugin ecosystem that makes you super productive, and PG happily and transparently handles all of that.

- If by some miracle you actually reach the point you need this, you'll have plenty of money to pay for commercial HA/sharding, or migrate. So why think about it now?

riku_iki
0 replies
20h15m

vertical scaling can take you so far these days that 99% of companies will never, ever reach the scale where they need more

It's less about scale and more about HA and service interruption: your service will be down if the server dies.

switch007
0 replies
23h59m

I guess some people really, really dislike Oracle (understandably).

And MariaDB is lagging behind, less and less compatible with MySQL etc leading to various projects dropping support for it - notably Azure. I wouldn't pick it for a new project.

klysm
0 replies
20h42m

I feel like I have read this exact comment before verbatim

jmuguy
0 replies
1d

This depends on what level you consider HA and horizontal scaling to be required. I could make the same argument, based on my personal experience, that postgis ought to be included out of the box. Of course, I'll assume most people don't need it :)

bognition
0 replies
23h43m

You'll never see true support for horizontal scalability in Postgres because doing so would require a fundamental shift in what Postgres is and the guarantees it provides. Postgres is available and consistent. It cannot truly be partitionable without impacting availability or consistency.

When an application grows to such a scale that you need a partitionable datastore it's not something you can just turn on. If you've been expecting consistency and availability, there will be parts of your application that will break when those guarantees are changed.

When you hit the point that you need horizontally scalable databases, you must update the application. This is one of the reasons that NewSQL databases like CockroachDB and Vitess are so popular. They expose themselves as a SQL database but make you deal with the availability/consistency problems on day 1, so as your application scales you don't need to change anything.

Context: I've built applications and managed databases on 10's of thousands of machines for a public saas company.

jpgvm
6 replies
1d8h

I have done this migration twice, and RethinkDB to PostgreSQL once. At this point I think document DBs are as good as dead for new projects. They will live on for a really long time, but they are in contraction and rent-seeking mode now. Expect MongoDB licensing and hosting to increase in price and feature development to languish from here on out.

Beefin
3 replies
1d6h

mongodb is a fortune 100 company and blows through expectations every quarter. doc dbs are far from dead.

unmole
0 replies
1d5h

mongodb is a fortune 100 company

Excuse me? MongoDB Inc. posted a mere $1.68 billion in revenue (with a $177 million loss). Coca-Cola posted $45 billion in revenue and ranks at #100. Forget Fortune 100, MongoDB isn't even in the reckoning for Fortune 500.

jpgvm
0 replies
1d5h

I didn't say it wouldn't make a ton of money; there are a huge number of MongoDB instances out there in the wild at Fortune 500 companies from the MEAN stack days (just like RoR before it). I fully expect the company to keep blowing away expectations by increasing how much revenue they extract from that existing customer base. That said, they -still- aren't profitable. They might be profitable next quarter - in fact I expect them to be, with the current environment - but to only reach profitability now, after their glory growth days are behind them, seems... well, pretty garbage. If anything it reminds me of GoPro, except instead of cheap Chinese clones it's PostgreSQL and JSONB that is coming for their lunch.

forgotmyinfo
0 replies
1d2h

Huh, I didn't realize database performance correlates so strongly with stock price. Are there Prometheus metrics for this?

Jolter
1 replies
1d4h

Maybe you’re just experiencing the trough of disillusionment, from the Gartner curve? I think it’s fair to call document databases a mature technology now, rather than “dead”.

Semionilo
0 replies
1d1h

If your normal DB can handle documents like a doc-focused DB, but better, then it might no longer be worth having document stores as a category rather than as a feature.

endisneigh
6 replies
1d8h

The takeaway is this has less to do with technical reasons and mostly to do with licensing, which is fair.

The other takeaway is that the huge gains they saw after switching, as a result of query optimizations with joins, show that their data wasn't properly modeled for use with a key value store - which probably contributed to them switching to the right type of store to begin with.

ako
5 replies
1d7h

And that’s the core problem with key value stores, your data needs usually grow beyond key value scenarios. In the beginning it might fit, but then you add more pages with different needs, reporting/dashboards with different needs, APIs with different needs, and ETL processes with different needs. Trying to force everything into key value is short term thinking.

endisneigh
4 replies
1d7h

What you’re describing isn’t an inherent problem with a key value store. Forcing everything to OLTP SQL is also short term thinking. Pick the right tool for the job.

It’s funny you mention growth as the limiting factor for key value stores, if anything that’s the one area where they’re objectively superior. Which is why most search databases and caches are key value stores.

ako
3 replies
1d7h

Not if you use an RDBMS to store your key/values. Then you can do arbitrary queries with joins over multiple tables, add views, stored procedures, etc.

endisneigh
2 replies
1d7h

Sure, but that’s not related to what you said prior. Again, right tool for the job.

ako
1 replies
1d4h

Sure. I meant grow in number of ways you use your data, not in volume of data.

endisneigh
0 replies
1d4h

What you meant was understood and my point stands

antfarm
6 replies
1d3h

I remember well how in the early 2000s everybody wanted to get away from their relational systems and NoSQL (Not only SQL) databases were the latest fashion. Looks like we have come full circle.

Dudelander
3 replies
1d3h

A fad that set back the industry by a decade.

trynumber9
2 replies
1d3h

Not sure on that, it made some of the SQL solutions innovate quite a bit.

forgotmyinfo
1 replies
1d2h

What did relational databases need from all the document DBs du jour? Broken transactions? Lack of ACID guarantees? Devil-may-care schemas?

rsanek
0 replies
1d1h

yep, the last one -- Postgres' support for json / jsonb is way better now.

jpalomaki
1 replies
1d2h

Relational databases have also evolved. With JSON types you can now easily mix relations and documents.
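For instance, in Postgres a relational row can carry a schemaless jsonb column side by side (illustrative names):

```sql
CREATE TABLE events (
  id      bigserial PRIMARY KEY,
  user_id bigint NOT NULL,   -- relational part, can carry FKs
  payload jsonb              -- document part, free-form
);

-- Index a field inside the document
CREATE INDEX events_payload_kind_idx ON events ((payload->>'kind'));

-- Query mixing both worlds
SELECT user_id, payload->>'kind' AS kind
FROM events
WHERE payload @> '{"source": "mobile"}';
```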

Aloisius
0 replies
1d

Couldn’t you do that with the XML type? I believe that was added to Postgres before MongoDB existed.

akie
4 replies
1d7h

Did anyone here ever see a migration _from_ PostgreSQL/MySQL _to_ MongoDB? I've only ever seen startups pick Mongo, then regret it a few months or years down the line and migrate to a relational database. Did anyone ever see the opposite?

willio58
0 replies
1d1h

No, but I am currently in a startup that mainly uses DynamoDB (similar to Mongo) with support from Elastic for more search-based queries. We've found DynamoDB to be great and we don't feel hindered by the lack of relational-ness.

I will say in case anyone doesn't understand how this might be possible, relational querying is possible in document databases, but they essentially require just different ideas to achieve. In Dynamo, you might need to create secondary indices for example.

Personally I love Postgres and would not mind if we had gone that direction but the more I use DynamoDB the less I feel like Postgres is the "only way"

hibikir
0 replies
1d3h

I've seen a couple of successful ones, but there were extremely good reasons: Most of the time we don't need scaling, but when your target Mongo cluster has a couple hundred machines, and your queries really are very simple, NoSQL is not a crazy idea. Same with DynamoDB: Sometimes it really is a very good choice, but you need a use case of the right shape. If you have an extremely good use case for your table to be in 5+ regions, the dataset has billions of rows and you have no hot keys, maybe running your own replication system really is more effort than letting AWS do it all for you.

It's just far more common for some tech lead at a small company to imagine that they are going to be bigger than Netflix than for someone to start conservatively with an RDBMS and realize that suddenly they have hundreds of millions of users and some kinds of data where the relational bits don't matter.

harry_ord
0 replies
1d4h

I work with the aftermath of going from postgres to couchdb. The migration was done years before I joined. There are still customer records showing postgres ids.

It didn't seem like it made things better since there were issues with dates and lost data. Working with it now isn't fun either.

djbusby
0 replies
1d7h

I watched a team go from a sloppy MySQL that they never tried tuning to Mongo (cause it's web-scale). It was another thing that almost killed the company. Now they are on PostgreSQL.

zachmu
3 replies
1d

PostgreSQL really is eating the database world.

Although in this case, the authors originally chose an architecture that was poorly suited to their data model. They had relational data and put it in a non-relational store. This was obviously always going to cause problems.

SOLAR_FIELDS
1 replies
23h1m

Over the years I’ve learned two things when it comes to picking a database when starting a new project

1 - most data is inherently relational

2 - Postgres is pretty good at basically any problem you throw at it, pretty scalably. It will fall over at bonkers amount of scale in some cases but you probably won’t have bonkers amount of scale

Thus, when starting a new project, the default data store is Postgres. You have to specifically convince me that Postgres is not good enough for this specific use case if you want me on board with anything else as a data store.

zachmu
0 replies
22h38m

It's a pretty good default stance, yeah.

We have been trying to convince people to use our new database [1] for several years and it's an uphill battle, because Postgres really is the best choice for most people. They really have to need our unique feature (version control) to even consider it over Postgres, and I don't blame them.

[1] https://github.com/dolthub/dolt

chrisjc
0 replies
1d

In my experience, it's always come down to either using a specific database type for the wrong job, or using a database type the wrong way.

- Treating an OLAP like an OLTP.

- Using a document store for relational data bc managing relations in SQL is tedious and slows development/progress/features.

- Using OLTP for analytics.

- and so on...

And the usual reaction ensues. Conversations begin about migrating to another product while not really paying attention to interaction patterns and the intent/reason for the various workloads.

Of course such migrations only lead to either...

- Place-shifting the issues, or

- Resolving the issues that lead to the migration while creating new ones that didn't exist before. Workloads that were suited to the existing DB are now mismatched to the new one.

However, Postgres seems to be a pretty safe bet when such migrations are undertaken due to how versatile it is. Personally, I don't care which database is used (within reason of course). Just use the right database type for the job.

winrid
3 replies
1d11h

I don't doubt there are real benefits for them to switch but BTW this is not true:

Difficulty configuring database transactions: With MongoDB, setting up transactions was not trivial because it required running MongoDB in cluster mode with various configuration overhead; this made it extremely difficult, for instance, for customers to run a simple POC of Infisical because it required a production setup of MongoDB.

You can run single instance in clustered mode, just a single-instance replica set. You get the oplog, transactions, etc. No advanced configuration required.

blue_pants
1 replies
1d10h

Are there any disadvantages to using transactions with a single-instance replica set?

ammo1662
0 replies
1d9h

We ran a single-instance replica set for three years in a small application for its transaction support.

The only difference we noticed is the additional replica set setting for a new instance. No other disadvantages were found.

vasco
0 replies
1d11h

And this has been possible for many years.

bsaul
3 replies
1d10h

Is it still difficult to create a cluster of pg dbs, either for redundancy or for speed?

Last time I advocated for using pg vs mongodb, the person replied that mongodb clustering was super easy.

Ozzie_osman
1 replies
1d9h

Read replicas are pretty easy on postgres using replication. That said, you need to be careful about replica lag.

If you want to distribute your writes, that's a little trickier. There are options like Citus and such. But still not natively supported.

KingOfCoders
0 replies
1d8h

Second that. A client had PG replica lag, was doing backups from the replica, and found out while restoring a backup that some hours of data were missing.

baq
0 replies
1d9h

Mongo absolutely can be the right answer if you need to horizontally scale to tens or hundreds of TBs. Anything below that I’d rather have a small cluster of big Postgres instances. There’s value in a SQL RDBMS which you just don’t get anywhere else.

arusahni
2 replies
1d6h

This seems to gloss over the actual rewrite. How did you ensure queries were equivalent? How did you configure the product to be able to read from and write to both databases? Did migrating reveal any bugs? Were you largely able to port your queries 1:1?

Semionilo
0 replies
1d1h

You can add logging to both drivers, dumping results.

Then check which queries don't return the same result set.

Cyberdog
0 replies
1d3h

Hopefully they had comprehensive test coverage. That would have helped greatly in that regard.

KingOfCoders
2 replies
1d8h

Some years ago we had so many problems with MongoDB that we also ditched it for Postgres. Never been happier.

Not the reason, but very annoying: why doesn't MongoDB use SQL instead of JavaScript with endless $s (perhaps this has changed?)? It was always a pain writing ad-hoc queries.

cj
1 replies
1d7h

Why doesn't MongoDB use SQL

Perhaps not the right answer, but SQL literally stands for Structured Query Language. I suppose you'd need a USQL?

KingOfCoders
0 replies
1d5h

Google says:

'The original full name of SQL was SEQUEL, which stood for "Structured English Query Language". A language which has been structured in English or lets say English-Like Query language which follows the syntax structure of English.'

anticristi
0 replies
1d9h

:)))

I find the super-plain voice of the annoyed character further amplifies the humor.

WuxiFingerHold
0 replies
13h5m

Talking to AI 12 years ago :-)

CSMastermind
0 replies
23h47m

Man, that thread feels prophetic. Since 2017 whenever this comment was written:

100% of my friends who have used Mongo/similar NoSQL have given up and had a nasty rewrite back to pgSQL...

I've led three migrations at different companies from Mongo to Postgres, spending months of my life and millions of dollars.

There really should be a sociology study of why so many people/companies used Mongo even well after the downsides were clear.

namaria
1 replies
1d9h

As part of that stack, we chose MongoDB + Mongoose ORM because the combination presented least overhead and allowed us to ship quality features quickly. As Sir Tony Hoare states, “premature optimization is the root of all evil,” and there was certainly no need for further optimization at the time.

I find it interesting that one sentence claims they made an optimal choice for feature delivery speed, and the next rationalizes it as "non-optimal, as prescribed by Hoare". Never mind the fact that three sentences further down in the original quote, Hoare said "It is often a mistake to make a priori judgments about what parts of a program are really critical, since the universal experience of programmers who have been using measurement tools has been that their intuitive guesses fail."

dangtony98
0 replies
1d5h

The nature of the use-case evolved and, at one point, the MongoDB + Mongoose ORM combination was in fact optimal judging from various aspects, including tooling familiarity across most of the team and the primary use-case being the managed cloud service. As the situation evolved, it no longer remained optimal and was superseded by various stack-wide adjustments.

Regarding tooling familiarity: In the earlier stages of a company, I'd argue it to be important to pick what you can ship with fastest. You don't want to prematurely, for example, set up a complex microservices architecture and deployment with K8s for an idea that is in its infancy and that might not even get that much traffic to begin with. Instead, deploying a monolithic app to Heroku might be the optimal choice (YC has a more extreme term for this they call "duct tape engineering"). As the company grows and there is clearer product market fit, it makes sense to further optimize the stack and perform adjustments as needed.

Regarding the use-case changing: it wasn't entirely obvious in the beginning that a lot of users would self-host the product, so it was designed more with the cloud product / managed service in mind; in hindsight things seem obvious, but not in the moment. As it became clearer that more and more people were going to self-host the product, the team shipped more features to accommodate a simpler self-hosting experience and, as part of that, the PostgreSQL migration.

jacobyoder
1 replies
1d

"this hurt in particular because our data was very much relational."

Umm... why would you not choose a relational engine to start with? This isn't said with the benefit of hindsight. I worked with Lotus 123 back in the early 90s. I get that there's value in document databases, but even then, there were limitations, and the need for the ability to have structured/related data in an engine that easily allowed for ad-hoc queries out of the box was apparent.

I watched the entire nosql/mongo movement arise and evolve over time, and it rarely made sense to me. Even when mongo made sense for a problem... it only makes sense for that problem, and not as the primary basis for your entire application. Relational/SQL, with nosql/mongo/etc as secondary/ancillary data store has always made the most sense (imo).

JSON columns in major databases now tend to strike a pragmatic "good enough for many use cases" balance. Justifying something other than that as a starting point is possible, but I've rarely seen cases where it makes sense to avoid a decent SQL engine as the core data store for your business apps.
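The hybrid pattern is easy to sketch: structured columns for the relational core, a JSON column for the schemaless side data. A minimal, runnable illustration using Python's stdlib sqlite3, whose built-in JSON functions stand in for Postgres JSONB here (table and column names are made up):

```python
import sqlite3

# "Relational core + JSON side column" pattern, sketched with SQLite's
# JSON1 functions as a stand-in for Postgres JSONB.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE,          -- structured, indexable column
        prefs TEXT NOT NULL DEFAULT '{}'     -- schemaless document data
    )
""")
conn.execute(
    "INSERT INTO users (email, prefs) VALUES (?, ?)",
    ("a@example.com", '{"theme": "dark", "beta": true}'),
)

# Ad-hoc query into the document part, without giving up SQL for the rest.
theme = conn.execute(
    "SELECT json_extract(prefs, '$.theme') FROM users WHERE email = ?",
    ("a@example.com",),
).fetchone()[0]
print(theme)  # → dark
```

In Postgres the equivalent would be a `jsonb` column queried with `prefs->>'theme'`, with the option of a GIN index if the document side ever needs to be searched at scale.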

CSMastermind
0 replies
23h52m

Umm... why would you not choose a relational engine to start with?

There's typically two answers to this question:

When they first started, things were changing so quickly that they didn't want to commit to a schema and felt that doing real data modeling would slow their velocity, so they dumped everything into Mongo and called it a day.

Or they had someone on the team who insisted that NoSQL was the way to go. They'll say something like, "I worked on xyz team at Amazon and we just used DocumentDB for everything and never had a problem. Those horror stories about Mongo are all from people who didn't use it right."

glimshe
1 replies
1d9h

Friends don't let friends use MongoDB. In my previous job, I got a big promotion in part for getting rid of MongoDB and virtually eliminating our database problems.

pydry
0 replies
1d7h

Most of the times I've tried to convince people to move away from Mongo, I failed.

My only success was pushing back on switching from Postgres to Azure Cosmos after Microsoft salespeople convinced some of our managers that we were making a mistake by not using their hot new toy.

This industry...

Cyberdog
1 replies
1d2h

What is the point of using a query builder if you’re only going to support a single RDBMS? Why not write straight SQL and avoid the unnecessary abstraction layer? Is the query builder really going to be easier to learn to use (particularly for non-trivial queries) than SQL?

callalex
0 replies
1d1h

If your DB schema is expressed as an object/class it means your IDE will have robust autocompletion. It’s just a convenience to avoid having to look up the schema for whatever tables you’re interacting with. It helps cut down on trivial bugs during the writing phase, before you even get to testing.
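As a toy illustration of that point (this is not any real query-builder library, just the shape of the idea): once the schema is expressed as classes, the editor can autocomplete column names, and a typo fails at write time instead of surviving inside a SQL string.

```python
from dataclasses import dataclass

# Hypothetical mini query builder: schema as classes => autocompletion.
@dataclass(frozen=True)
class Column:
    table: str
    name: str

    def eq(self, value):
        # Returns a parameterized condition fragment plus its bind values.
        return (f"{self.table}.{self.name} = ?", [value])

class Users:
    table = "users"
    id = Column("users", "id")
    email = Column("users", "email")

def select(*cols, where):
    cond, params = where
    col_list = ", ".join(f"{c.table}.{c.name}" for c in cols)
    return f"SELECT {col_list} FROM {cols[0].table} WHERE {cond}", params

# Users.emial would be flagged by the IDE/type checker before anything runs.
sql, params = select(Users.id, Users.email, where=Users.email.eq("a@b.c"))
print(sql)  # → SELECT users.id, users.email FROM users WHERE users.email = ?
```

Real builders (Kysely, jOOQ, etc.) go much further, but the autocomplete-and-fail-early benefit is exactly this mechanism.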

zulban
0 replies
1d7h

If you like this kind of thing, I also migrated from mongodb to postgresql and wrote about it here: https://blog.stuartspence.ca/2023-05-goodbye-mongo.html

My post is more technical, with examples and graphs, and less business-y.

rullopat
0 replies
1d4h

That's what happens when you choose the next product / framework / methodology because of fashion, or because FAANG is giving you the stuff for free, without thinking about whether it really fits your use case, forgetting that 99.9999% of companies will never have the problems Google, Facebook, etc. have.

rglover
0 replies
1d9h

Missing out on relational features: With MongoDB, we lost out on many nice features from the relational world like CASCADE which, when specified, deletes all referenced resources across other tables whenever a target resource is deleted; this hurt in particular because our data was very much relational.

I'd be very curious what their data model was in relation to this problem. I wonder if denormalization of the data would have solved the problem without a need for a full database hop.
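For reference, the CASCADE behavior the quoted passage misses is a one-line schema declaration in the relational world. A minimal sketch using Python's stdlib sqlite3 (which, unlike Postgres, needs foreign keys switched on explicitly; table names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite quirk; Postgres enforces FKs by default
conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE secrets (
        id         INTEGER PRIMARY KEY,
        project_id INTEGER NOT NULL
            REFERENCES projects(id) ON DELETE CASCADE
    )
""")
conn.execute("INSERT INTO projects (id) VALUES (1)")
conn.execute("INSERT INTO secrets (id, project_id) VALUES (10, 1)")

# Deleting the parent removes the referencing rows automatically --
# no application-level cleanup code required.
conn.execute("DELETE FROM projects WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM secrets").fetchone()[0]
print(remaining)  # → 0
```

In MongoDB the same cleanup has to be written (and kept correct) by hand in application code, which is presumably what the article's authors were feeling.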

pmarreck
0 replies
1d4h

Could have told most people this back when everyone was going to NoSQL.

"You'll be back". Relational capability is just too useful beyond a certain point

kcartlidge
0 replies
1d

I've used (and introduced) MongoDB in production. Though I much prefer PostgreSQL, SQLite, MySQL, or SQL Server, for some use cases a document database is fine.

However, as I discovered myself, once you realise you need to use Mongoose with it, you should usually take that as a prompt to consider going relational.

Don't get me wrong, Mongoose is a good package. But the things it solves could likely be better fixed by moving away from MongoDB - I'd go so far as to say that in most cases Mongoose should only be added to help an existing project; if you need it from the start you probably should go relational.

(YMMV and there will be exceptions.)

jrochkind1
0 replies
1d3h

we felt that it already delivered significant benefits by virtue of not being MongoDB

Well then, tell us how you really feel.

j45
0 replies
1d3h

It always struck me as odd to see the hard work put into NoSQL databases to make them relational.

Somehow, the work to do this seemed like less work than learning SQL.

h1fra
0 replies
1d8h

Funnily enough, the one feature I don't recommend in Postgres is CASCADE. Unless you have a very small, controlled set of FKs, it's not a good idea. Batch delete is less problematic. I wish Postgres would do something about it, like Elasticsearch's deleteByQuery, which can run in the background.
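One way to implement the batch-delete alternative this comment suggests is to remove child rows in bounded chunks, so no single statement holds locks for long. A sketch using Python's stdlib sqlite3 (table and column names are made up; in Postgres the same `DELETE ... WHERE id IN (SELECT ... LIMIT n)` shape works):

```python
import sqlite3

def delete_children_in_batches(conn, parent_id, batch_size=1000):
    """Delete rows referencing parent_id in chunks of batch_size."""
    total = 0
    while True:
        cur = conn.execute(
            """
            DELETE FROM events WHERE id IN (
                SELECT id FROM events WHERE project_id = ? LIMIT ?
            )
            """,
            (parent_id, batch_size),
        )
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
        conn.commit()  # release locks between batches

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, project_id INTEGER)")
conn.executemany(
    "INSERT INTO events (project_id) VALUES (?)",
    [(1,)] * 2500 + [(2,)] * 5,
)
deleted = delete_children_in_batches(conn, 1, batch_size=1000)
print(deleted)  # → 2500
```

Compared to a single CASCADE, each chunk is a short transaction, so other writers aren't blocked for the duration of a huge delete.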

globular-toast
0 replies
1d8h

I started my career doing web development using PHP and MySQL. Then I pivoted to something completely different for about 6-7 years as I got sick of spending all my time making stuff work in IE6. I heard about the whole NoSQL thing, didn't understand it, but thought it sounded silly. Then I came back to web dev 5 years ago and found Postgres was all the hype and nobody talked about NoSQL any more. Glad I skipped it!

fl0ki
0 replies
22h18m

I inherited a troubled project using MongoDB and learned a lot about what is, and isn't, really a problem with MongoDB [for this kind of project].

BSON is a terrible format, but I genuinely like that I can use the same schema all the way through the JSON APIs to the database. If they have to deviate in future for some reason, you can deal with it then.

If you are clever with generics, you can compose type-agnostic DB code with type-safe in-memory data structures, without needing any schema generation or ORM. It's a natural fit for data-oriented programming in my workhorses Go and Rust. Unfortunately, the Rust library is very poorly designed in many ways, making many basic things extremely inefficient, and that's not what you want in Rust. The Go library has really slow reflection-based (de)serialization, but other than that, it's very flexible and robust.

If you keep your queries simple, it's easy to mock out and combine with techniques like snapshot testing. Since BSON has a lossless representation in the form of extJSON, you can have human-readable snapshots, diff them easily in review, etc.

Doing bulk writes is easy and efficient, and doing sessions/transactions is easy, but you should basically never combine the two. The server can have its own configured timeout for a session, and if your bulk write exceeds that, it fails and retries, fails again, etc. This is a really serious design flaw that shouldn't have made it past review. If they wanted to enforce a maximum bulk write size, that should be known up-front, not depend on how many seconds it happened to take each time.

Writing data to a replica set is extremely slow, and every index you add makes it slower. This compounds further with the above.

I have concluded that MongoDB only makes sense for read-mostly workloads. If you're going to do your own in-memory data structures and validation anyway, writing out small infrequent changes, MongoDB doesn't do much for you but it also doesn't get in your way very much.

fakedang
0 replies
23h51m

Can somebody please explain to me the use cases in which someone would use Postgres vs MySQL vs something like Cassandra or Convex? I know the latter two have their own types of SQL implementations, so why aren't they used as much, even if they are NoSQL databases?

agentultra
0 replies
1d4h

Hoo boy. I did this once. Only it was CouchDB. And they didn’t use any schema. Data had accumulated in it for years. Then a customer said they wanted their data exported to PowerBI.

I used a library that generated a type that could parse any JSON document you threw at it by feeding it examples. So I wrote some tools to scan through the database, generate parse test cases, etc.

I then wrote a library to version and migrate records.

With this I wrote some tools to start extracting meaningful bits out of the big-blob type. I could scan the database, download examples that didn’t work, make test cases, fix those, ask the experts in the system, etc.

Then eventually it started spitting out structured data into Postgres and from there into PowerBI.

Decent article.

adhamsalama
0 replies
1d7h

I pushed the company I work for to use Postgres instead of Mongo in all new services and refactors. Now I'm leaving it to join a new company, where they use Mongo. Sigh. I'll have to do this all over again.

Semionilo
0 replies
1d

Generally speaking: don't use MongoDB.

It was super shitty when it came out, and using Postgres gives you a better and faster solution which you can also use in other projects.

I'm always impressed when I see MongoDB still making money.

EMM_386
0 replies
1d3h

With MongoDB, we lost out on many nice features from the relational world ...

... this hurt in particular because our data was very much relational.

That is my best attempt at the summary.

The write-up is good, but after you've been around a long (long) time ... you know what this is likely going to involve.

Glad relational SQL is still going strong 50 years later.