I tried a similar approach in a previous startup - treat data as documents and store in a JSONB field.
Postgres was awesome and handled this brilliantly, but the lack of schema and typing killed it. We just ended up fighting data quality the whole time. We couldn't assume that any document had all the required fields, or that they were in a format that made sense, e.g. the Price field sometimes had currency symbols, sometimes used UK/US comma-and-period formatting, and sometimes European formatting. Sorting by Price meant complicated parsing of every record first (sketched below).
We moved back to relational tables.
I won't say I'd never do this again, but I would definitely not just throw JSON documents to a database and expect good things to happen.
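To make the price example concrete, here's a minimal sketch of the contrast (table and field names are invented): sorting on a free-form price string means normalizing it on the fly, and the naive version silently mis-parses European formatting, whereas a typed column pushes the problem back to write time.

```sql
-- Hypothetical table: documents in a JSONB column, price stored as free text.
-- Naive normalization: strip symbols, drop commas, cast. Works for "$1,234.56",
-- silently mis-parses European "1.234,56" -- exactly the trap described above.
SELECT doc->>'price' AS raw_price
FROM listings
ORDER BY replace(regexp_replace(doc->>'price', '[^0-9.,]', '', 'g'), ',', '')::numeric;

-- With a typed column, bad values are rejected (or normalized) at write time instead.
ALTER TABLE listings ADD COLUMN price numeric(12,2);
```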
If you have schema requirements, why _not_ do it with tables and columns instead? The point of the jsonb column would be precisely to store arbitrary stuff that has no schema.
I usually see this (dynamic scripting langs, schemaless databases) play out as a very good environment to prototype in: very few barriers to changing things or making stuff up as you go. Then the prototype is not discarded but "iterated" into the actual production code, and the (now needlessly) flexible JSON store is grandfathered in.
I have yet to come across an actual use case for JSON stores in a production app with an established design. What the hell do you mean you have no idea what the data might hold?? Why are you storing unknown, undefined, undefinable things??? Or perhaps there actually is a schema, i.e. fields we rely on being present, but we were too lazy to define it formally?
Adjacent data that is very alike for most usage but different in subtle ways.
Classified ads: the additional details for a Car are different from those for a Shirt, but both are ads. And adding a nearly infinite number of columns, or a flexible system of detail tables, is significantly worse than unstructured JSON (sketch below).
Another would be records from different, but related systems. Such as the transaction details for an order payment. Paypal data will be different from your CC processor, or debit transaction, but you can just store it as additional details for a given "payment" record.
Another still would be in healthcare, or any number of other systems where the structures will be different from one system to another depending on data interchange, where in the storage you don't care too much, only in the application layer will it make any necessary difference.
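To illustrate the classified-ads case, here's roughly the layout I mean (a sketch, all names invented): the columns every ad shares stay relational, and the per-category extras go in a JSONB details column.

```sql
-- Hypothetical classified-ads layout: shared columns relational,
-- per-category details in JSONB.
CREATE TABLE ads (
    id       bigserial PRIMARY KEY,
    category text  NOT NULL,             -- 'car', 'shirt', ...
    title    text  NOT NULL,
    price    numeric(12,2),
    details  jsonb NOT NULL DEFAULT '{}'::jsonb
);

-- A car ad and a shirt ad share the table but carry different details.
INSERT INTO ads (category, title, price, details) VALUES
  ('car',   'Reliable hatchback', 4500, '{"mileage_km": 98000, "transmission": "manual"}'),
  ('shirt', 'Linen shirt',          25, '{"size": "M", "colour": "blue"}');
```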
Customers sent custom things.
Healthcare. Clinical data is unstructured and constantly changing. Would you build out a table with 2000 columns that changed yearly? What about 5000?
Data owned by another service/application/org that is written and read without modification.
For example, a healthcare document that wasn't built or technically owned by the application storing it.
For example, a web text editor that serializes its state as JSON.
Not JSON, but: a web scraper storing HTML documents.
These have structure, it's only that the structure is built/maintained outside of the application storing it. You could of course transform it but I think it's a bit obvious where that might not be worth the cost/complexity.
Well, there are a decent number of those use cases, but the one I find most compelling is storing an exact record of what another service sent me in case I need it later. I pull out and store in my typed schema the relevant bits of, say, an OpenID Connect response, but I'll often store the entire response as well, in case I want to add functionality later based on additional fields in that response and roll forward without needing an intermediate state.
You can always change the source of truth so that, instead of the JSON text, it's the properly normalized schema, and then have a VIEW that returns JSON for apps that need JSON. And if need be you can denormalize a bit and keep a JSONB column that stores the document as a serialization of the row's other columns.
That's what's really nice about SQL: you can perform these sorts of schema surgeries and still retain backwards-compatibility using VIEWs.
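A minimal sketch of that pattern, with invented table names: the normalized table stays the source of truth, and a VIEW re-serializes each row as a document for consumers that still want JSON.

```sql
-- Hypothetical normalized table: the source of truth.
CREATE TABLE orders (
    id          bigint PRIMARY KEY,
    customer_id bigint NOT NULL,
    total       numeric(12,2) NOT NULL
);

-- Backwards-compatible document view: the whole row serialized to JSONB.
CREATE VIEW orders_json AS
SELECT o.id, to_jsonb(o) AS doc
FROM orders o;
```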
A use case for me is calling a lot of different third-party APIs that return similar things but in widely different data structures, with a lot of information specific to each site. I take what is standard across all the APIs and store that in my relational database, but anything I didn’t use I store in a JSONB field just in case I need it some time in the future.
In other words I’m not really using anything in that JSONB field… at least right now.
We're evaluating solutions to ingest business app events and infrastructure logs. A document-ish, JSON-ish "DB"/search engine seems like the best solution, because no one does structured logs properly.
Now we have to pick the stack: ELK, Loki + Grafana, Graylog, or maybe just dump it into MongoDB?
I think my learning was "all data has a schema, eventually".
We thought it was just a bunch of data documents. But it turned out that to actually use that data in an application it had to have predictability, we had to know certain fields were present and in a fixed format. You know: a schema.
I'm confused about this. Since when can't you enforce a schema on a document you're storing? Did you have no way to change the data before it hit the database?
NoSql stores still have schemas.
Sounds like most/all of these issues would be solved by validation at the app layer.
True, but for example with db-level validations I don’t have to worry that a new developer ships a batch job that runs raw inserts for performance reasons and bypasses app layer validations.
I'm not advocating for no database level validations. One can and should have both in my opinion.
Postgres supports constraints on jsonb as well.
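For example (hypothetical table and keys), a CHECK constraint can at least require the fields you rely on and check their JSON types:

```sql
-- Hypothetical table "payments" with a JSONB column "doc".
ALTER TABLE payments
  ADD CONSTRAINT payment_doc_shape CHECK (
        doc ? 'provider'
    AND doc ? 'amount'
    AND jsonb_typeof(doc->'amount') = 'number'
  );
```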
I would generally advocate for no write access outside of the app as well. Certainly for new developers. Get some tests on that batch job.
FWIW I think OP was referring to app code still, but code opting in to skipping app data validations. Rails for example makes this very easy to do, and there are tons and tons of people out there selling this as a performance improvement (this one trick speeds up bulk operations!!!!). There are times where it’s useful, but it’s a very sharp knife that people can easily misuse.
So yeah, anyway, have both db constraints and app validations.
Thanks for pointing that out, I did misread that and have seen/written such things in Rails land. We used activerecord-import, which gives the option of running validations.
But yeah, layering your safety nets is generally wise. I would include testing and code review in that as well.
What? You should be exposing your data access layer as a microservice and all your db level validations should be done there...
You can even write a small DSL syntax language to make it easier for developers to use, and perhaps an Obviously Redundant Middleware that sits between them to convert their programming-language objects to the DSL. Add some batch support, perhaps transactional locks (using Mongo, we want to be webscale after all) and perhaps a small terminal-based client, and voila, no one should ever need to deal with petty integrity in the db again.
This post is a fantastic example of Poe's Law. You had me for a second.
App layer validation is for the user’s sanity. DB layer validation is for the developer’s sanity.
Ehhh, most dbs are managed by migrations in the application, so it's essentially duplicated logic.
Most databases eventually have a human typing an INSERT statement or the moral equivalent. It's not duplication to have your correctness functions protect that, too.
I like this as a general rule, but I almost always have more refined validation at the app layer -- particularly when using jsonb.
Tool of choice is ecto which has excellent support for maintaining jsonb structure.
The view I now have is that for a relational table, yes, you have to suffer through migrations, but at least they are in SQL. For document-based stores you still have migrations; they are just implemented in code.
JSON documents sound great, especially initially, but end up being a maintenance nightmare.
The problem with that - in my experience - is that migrating the structure of thousands (if not millions) of documents is way slower than running a DDL command, as it means reading each document, parsing it, modifying it, and writing it back. Many DDL commands are just metadata updates to the system catalogs, so they are quite fast (e.g. adding a new column with a default value). With documents you wind up with millions of single-row updates.
This can be mitigated by doing a "lazy" migration when a document with the old structure is first read. But that makes the code much more complicated.
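Roughly the contrast, with invented table names: the relational change is essentially a catalog update (for non-volatile defaults, since Postgres 11), while the document version rewrites every row unless you migrate lazily.

```sql
-- Relational: effectively instant, no table rewrite for a constant default.
ALTER TABLE orders ADD COLUMN status text NOT NULL DEFAULT 'pending';

-- Documents in JSONB: the "same" change touches every stale row.
UPDATE order_docs
SET doc = jsonb_set(doc, '{status}', '"pending"')
WHERE NOT (doc ? 'status');
```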
Or, to put it another way: yes, you have to write and maintain an upfront schema, but a document-based system has a schema too, it's just distributed amongst 50 codebases and 10 people's heads.
I don’t do much backend work at the moment but I found using JSONB fields for prototyping endpoints to be quite handy. Once I had a feature figured out and working, I’d create a proper schema.
I've found Postgres' Hstore type useful for this, too.
I often create a "metadata" hstore field on a table and use it for random bits of data I don't want to create an actual field for yet. When I find that the application needs that bit of data and in a certain format, I'll move it into an actual field.
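The pattern looks something like this (a sketch, names are made up; requires the hstore extension):

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE widgets (
    id       bigserial PRIMARY KEY,
    name     text NOT NULL,
    metadata hstore NOT NULL DEFAULT ''::hstore  -- grab bag for not-yet-promoted bits
);

-- Later, once a key proves it deserves a real field:
ALTER TABLE widgets ADD COLUMN weight_grams integer;
UPDATE widgets SET weight_grams = (metadata -> 'weight_grams')::integer,
                   metadata     = metadata - 'weight_grams';
```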
hstore is basically the flat precursor to JSONB, and IMO it's as good as obsolete. I wouldn't use hstore in a new project, just JSONB.
Well, this is not a relational issue, is it? It's a data normalization issue.
Yes, true. But that was a single example off the top of my head. There were lots of others.
Not saying the approach isn't suitable for some use cases. Just that I'd be really careful that this is one of those use cases next time.
This is an issue regardless of your type of document storage. There are different options for verifying document details and versioning when working with document-oriented databases. A typical approach is to have a per-record version number and pass documents through a verifier as part of insert, update, or query to ensure all migrations to the current version have occurred.
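For instance, the batch flavour of that is just an UPDATE keyed off the stored version (field and table names below are hypothetical); the lazy flavour runs the same transformation in the verifier when an old document is read.

```sql
-- Bump every document still on version 1: backfill a field and record the new version.
UPDATE events
SET doc = doc || '{"currency": "USD", "schema_version": 2}'::jsonb
WHERE COALESCE((doc->>'schema_version')::int, 1) < 2;
```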
You know, I’ve found that LLMs are awfully good at taking random JSON crap and “converting it” to a consistent format. You don’t even need a large LLM… something like ChatGPT 3.5 or an equivalent class of LLM can do the work just fine.
It’s not perfect and requires a fair amount of effort to nail the prompt but when it works it works.
Had a similar experience. What killed it for me is that no statistics can be gathered for JSONB columns. This in turn really messes with the query planner once you do something like `select a.* from a join b on a.id = b.a_id where b.my_jsonb_column ->> 'foo' = 'bar';`.
Given the lack of statistics, the query planner loves going for a nested loop rather than a hash or merge join where those would be appropriate, leading to abysmal performance (a partial workaround is sketched below).
There is a thread[0] on the PostgreSQL mailing list about adding at least some statistics for JSONB columns, but it has gone nowhere since 2022.
[0]: https://www.postgresql.org/message-id/flat/c9c4bd20-996c-100...
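One partial workaround, assuming Postgres 14+ for the extended-statistics form (on older versions an expression index also gets per-expression statistics): give the planner statistics on the exact expression used in the filter. Table and column names below are taken from the example query above.

```sql
-- Postgres 14+: expression statistics on the extracted key.
CREATE STATISTICS b_foo_stats ON ((my_jsonb_column ->> 'foo')) FROM b;
ANALYZE b;

-- Older versions: an expression index also causes ANALYZE to gather stats for it.
CREATE INDEX b_foo_idx ON b ((my_jsonb_column ->> 'foo'));
ANALYZE b;
```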