
Mongo but on Postgres and with strong consistency benefits

marcus_holmes
34 replies
14h15m

I tried a similar approach in a previous startup - treat data as documents and store in a JSONB field.

Postgres was awesome and handled this brilliantly, but the lack of schema and typing killed it. We just ended up fighting data quality the whole time. We couldn't assume that any document had all the required fields, or that they were in a format that made sense e.g. the Price column sometimes had currency symbols, and sometimes commas-and-periods in UK/US format and sometimes in Euro format - sorting by Price involved some complicated parsing of all the records first.

We moved back to relational tables.

I won't say I'd never do this again, but I would definitely not just throw JSON documents to a database and expect good things to happen.

winternewt
11 replies
10h26m

If you have schema requirements, why _not_ do it with tables and columns instead? The point of the jsonb column would be precisely to store arbitrary stuff that has no schema.

bruce343434
8 replies
9h4m

I usually see this (dynamic scripting langs, schemaless databases) play out as a very good environment to prototype in, with very few barriers to change things or make stuff up as you go. Then the prototype is not discarded but "iterated" to form the actual production code. The (now needlessly so) flexible JSON store is grandfathered in.

I have yet to come across an actual use case for JSON stores in a production app with an established design. What the hell do you mean you have no idea what the data might hold?? Why are you storing unknown, undefined, undefinable things??? Or perhaps, there actually is a schema i.e. fields we rely on being present, but we were too lazy to define it formally?

tracker1
0 replies
3h47m

Adjacent data that is very alike for most usage but different in subtle ways.

Classified ads: the additional details for a Car are different than for a Shirt, but both would be ads. And adding a nearly infinite number of fields, or a flexible system in a set of schema or detail tables, is significantly worse than unstructured JSON.

Another would be records from different but related systems, such as the transaction details for an order payment. PayPal data will be different from your CC processor's or a debit transaction's, but you can just store it as additional details for a given "payment" record.

Still another would be healthcare, or any number of other systems where the structures differ from one system to another depending on the data interchange; in storage you don't care too much, and only in the application layer does it make any necessary difference.

okr
0 replies
6h59m

Customers sent custom things.

mexicocitinluez
0 replies
5h36m

I have yet to come across an actual use case for JSON stores in a production app with an established design

Healthcare. Clinical data is unstructured and constantly changing. Would you build out a table with 2000 columns that changed yearly? What about 5000?

goosejuice
0 replies
3h30m

Data owned by another service/application/org that is written and read without modification.

For example, a healthcare document that wasn't built or technically owned by the application storing it.

For example, a web text editor that serializes its state as json.

Not json, but a web scraper storing html documents.

These have structure, it's only that the structure is built/maintained outside of the application storing it. You could of course transform it but I think it's a bit obvious where that might not be worth the cost/complexity.

eropple
0 replies
4h53m

Well, there are a decent number of those use cases, but the one I find most compelling is storing an exact record of what another service sent me in case I need it later. I pull out and store in my typed schema the relevant bits of, say, an OpenID Connect response, but I'll often store the entire response as well in case I want to add functionality in the future based on additional fields, etc. in that response and roll forward without having to have an intermediate state.

cryptonector
0 replies
1h52m

You can always change the source of truth so instead of being the JSON texts it's the properly normalized schema and then you can have a VIEW that returns JSON for apps that need JSON. And if need be you can denormalize a bit and have a JSONB column that stores the document as the serialization of the rows' other columns.

That's what's really nice about SQL: you can perform these sorts of schema surgeries and still retain backwards-compatibility using VIEWs.

cruffle_duffle
0 replies
1h15m

A use case for me is I’m calling a lot of different 3rd party APIs that return similar things but in widely different data structures with a lot of information specific to that site. I take what is standard across all the APIs and store that in my relational database, but anything I didn't use I store in a JSONB field just in case I need it some time in the future.

In other words I’m not really using anything in that JSONB field… at least right now.

SSLy
0 replies
6h17m

We're evaluating solutions to ingest business app events and infrastructure logs. A document-ish JSON-ish "DB"/search engine seems like the best solution, because no one does structured logs properly.

Now we have to pick the stack. ELK, Loki + Grafana, Graylog, or maybe just dump into MongoDB?

marcus_holmes
1 replies
8h49m

I think my learning was "all data has schema, eventually"

We thought it was just a bunch of data documents. But it turned out that to actually use that data in an application it had to have predictability, we had to know certain fields were present and in a fixed format. You know: a schema.

mexicocitinluez
0 replies
5h31m

I'm confused about this. Since when can't you enforce a schema on a document you're storing? Did you have no way to change the data before it hit the database?

NoSql stores still have schemas.

goosejuice
10 replies
12h20m

Sounds like most/all of these issues would be solved by validation at the app layer.

ivandenysov
5 replies
11h55m

True, but for example with db-level validations I don’t have to worry that a new developer ships a batch job that runs raw inserts for performance reasons and bypasses app layer validations.

goosejuice
2 replies
3h49m

I'm not advocating for no database level validations. One can and should have both in my opinion.

Postgres supports constraints on jsonb as well.

I would generally advocate for no write access outside of the app as well. Certainly for new developers. Get some tests on that batch job.

corytheboyd
1 replies
3h26m

I would generally advocate for no write access outside of the app as well.

FWIW I think OP was referring to app code still, but code opting in to skipping app data validations. Rails for example makes this very easy to do, and there are tons and tons of people out there selling this as a performance improvement (this one trick speeds up bulk operations!!!!). There are times where it’s useful, but it’s a very sharp knife that people can easily misuse.

So yeah, anyway, have both db constraints and app validations.

goosejuice
0 replies
2h53m

Thanks for pointing that out, I did misread that and have seen/written such things in rails land. We used active record import which gives the option of running validations.

But yeah, layering your safety nets is generally wise. I would include testing and code review in that as well.

zepolen
1 replies
5h16m

What? You should be exposing your data access layer as a microservice and all your db level validations should be done there...

You can even write a small DSL syntax language to make it easier to use for developers, and perhaps an Obviously Redundant Middleware that sits between them to convert their programming language objects to the DSL. Add some batch support, perhaps transactional locks (using mongo, we want to be webscale after all) and perhaps a small terminal based client and voila, no one should ever need to deal with petty integrity in the db again.

eropple
0 replies
4h50m

This post is a fantastic example of Poe's Law. You had me for a second.

pennomi
3 replies
5h22m

App layer validation is for the user’s sanity. DB layer validation is for the developer’s sanity.

papichulo2023
1 replies
5h1m

Ehhh, most dbs are managed by migrations in the application, so it's essentially duplicated logic

eropple
0 replies
4h52m

Most databases eventually have a human typing an INSERT statement or the moral equivalent. It's not duplication to have your correctness functions protect that, too.

goosejuice
0 replies
3h43m

I like this as a general rule, but I almost always have more refined validation at the app layer -- particularly when using jsonb.

Tool of choice is ecto which has excellent support for maintaining jsonb structure.

jimmyl02
2 replies
14h5m

the view I now have is that for a relational table, yes you have to suffer through migrations but at least they are in sql. for document based stores, you still have to have migrations, but they are just implemented in code

json documents sound great, especially initially, but end up being a maintenance nightmare

hans_castorp
0 replies
13h34m

for document based stores, you still have to have migrations, but they are just implemented in code

The problem with that - in my experience - is that migrating the structure for thousands (if not millions) of documents is way slower than running a DDL command (as it means reading each document, parsing it, modifying it and writing it back). Many DDL commands are just metadata updates to the system catalogs, so they are quite fast (e.g. adding a new column with a default value). With documents you wind up with millions of single-row updates.

This can be mitigated by doing a "lazy" migration when a document with the old structure is first read. But that makes the code much more complicated.

globular-toast
0 replies
11h2m

Or, to put it another way: yes, you have to write and maintain an upfront schema, but a document-based system has a schema too, it's just distributed amongst 50 codebases and 10 people's heads.

jamil7
2 replies
13h0m

I don’t do much backend work at the moment but I found using JSONB fields for prototyping endpoints to be quite handy. Once I had a feature figured out and working, I’d create a proper schema.

marcus_holmes
1 replies
8h45m

I've found Postgres' Hstore type useful for this, too.

I often create a "metadata" hstore field on a table and use it for random bits of data I don't want to create an actual field for yet. When I find that the application needs that bit of data and in a certain format, I'll move it into an actual field.

cryptonector
0 replies
1h50m

hstore is basically the flat precursor to JSONB, and IMO it's as good as obsolete. I wouldn't use hstore in a new project, just JSONB.

raverbashing
1 replies
9h0m

the Price column sometimes had currency symbols, and sometimes commas-and-periods in UK/US format and sometimes in Euro format

Well this is not a relational issue is it? It is a data normalization issue

marcus_holmes
0 replies
8h47m

Yes, true. But that was a single example off the top of my head. There were lots of others.

Not saying the approach isn't suitable for some use cases. Just that I'd be really careful that this is one of those use cases next time.

tracker1
0 replies
3h51m

This is an issue regardless of your type of Document storage. There are different options for verifying document details and versioning when approaching Document oriented databases. A typical usage is to have a per-record version number and pass documents through a verifier either as part of insert, update or query to ensure all migrations to the current version have occurred.

cruffle_duffle
0 replies
1h18m

You know, I’ve found that LLMs are awfully good at taking random JSON crap and “converting it” to a consistent format. You don’t even need a large LLM… something like ChatGPT 3.5 or an equivalent class of LLM can do the work just fine.

It’s not perfect and requires a fair amount of effort to nail the prompt but when it works it works.

aden1ne
0 replies
7h44m

Had a similar experience. What killed it for me, is that no statistics can be gathered for JSONB columns. This in turn really messes with the query planner once you do something like `select a.* from a join b on a.id = b.a_id where b.my_jsonb_column ->> 'foo' = 'bar';`.

Given the lack of statistics, the query planner loves going for a nested loop rather than a hash or merge join where those would be appropriate, leading to abysmal performance.

There is a thread[0] on the PostgreSQL mailing list to add at least some statistics on JSONB columns, but this has gone nowhere since 2022.

[0]: https://www.postgresql.org/message-id/flat/c9c4bd20-996c-100...

karmakaze
27 replies
1d2h

What makes mongo mongo is its distributed nature, without it you could just store json(b) in an RDBMS.

anonzzzies
8 replies
1d1h

So how easy is it to distribute it? I don’t have experience with it but the tutorials look terrible compared to, say, Scylla, Yuga, Cockroach, TiDB etc. Again, honest question?

rad_gruchalski
2 replies
1d1h

Pongo seems to be a middleware between your app and Postgres. So it will most certainly work absolutely fine on YugabyteDB, if one’s okay with occasional latency issues.

One could optimise it more for a distributed sql by implementing key partition awareness and connecting directly to a tserver storing the data one’s after.

oskar_dudycz
1 replies
1d

Yes, as long as the database has support for JSONB and JSON path syntax (so PG 12 or higher), you should be good to go :)

theteapot
1 replies
1d

Does "distributed" mean sharded or just replicated? In either case it's a bit quirky but easy enough.

Scylla, Yuga, Cockroach, TiDB etc.

You have experience "distributing" all these DBs? That's impressive.

anonzzzies
0 replies
13h52m

We evaluated these over the years for different purposes/clients. And I don’t expect the ‘quirky’ part in 2024 I guess; it gets hard of course when the loads get specific; but in general I expect things to be automatic (multi master, sharding and replication) and simple to set up. Point out other nodes and done.

Too
1 replies
12h57m

It couldn’t be simpler. Just configure all clients and servers to be aware of each other and go.

Say what you want about the rest of mongo. This is an area where it actually shines.

anonzzzies
0 replies
10h54m

Alright, I'll try it. Maybe I had the wrong tutorials or maybe these were old.

tracker1
0 replies
3h24m

Relatively easy... though, and I may be out of date, you have to choose replication or sharding at any given level... so if you want horizontal scale plus redundancy, you will wind up with slightly more complexity. My experience trying to recover a broken cluster after 5 hours of Azure going up/down in series was anything but fun.

Would have to test, but the library from this post may well work with CockroachDB if you wanted to go that route instead of straight PostgreSQL. I think conceptually the sharding + replication of other DBs like Scylla/Cassandra and CockroachDB is a bit more elegant and easier to reason about. Just my own take though.

zihotki
5 replies
1d2h

But RDBMSes are often also distributed. So what is mongo now?

marcosdumay
3 replies
1d2h

People don't usually distribute Postgres (unless you count read replicas and cold HA replicas). But well, people don't usually distribute MongoDB either, so no difference.

In principle, a cluster of something like Mongo can scale much further than Postgres. In practice, Mongo is full of issues even before you replicate it, and you are better off with something that abstracts a set of incoherent Postgres (or sqlite) instances.

zozbot234
1 replies
1d

Postgres supports foreign data wrapper (FDW), which is the basic building block for a distributed DB. It doesn't support strong consistency in distributed settings as of yet, although it does provide two-phase commit which could be used for such.

williamdclt
0 replies
23h44m

strong consistency in distributed settings

I doubt it ever will. The point of distributing a data store is latency and availability, both of which would go down the drain with distributed strong consistency

hibikir
0 replies
22h53m

I think of the Stripe Mongo install, as it was a decade or so ago. It really was sharded quite wide, and relied on all shards having multiple replicas, as to tolerate cycling through them on a regular basis. It worked well enough to run as a source of truth for a financial company, but the database team wasn't small, dedicated to keeping all that machinery working well.

Ultimately anyone doing things at that scale is going to run a small priesthood doing custom things to keep the persistence layer humming, regardless of what the underlying database is. I recall a project abstracting over the Mongo API, so as to allow for swapping the storage layer if they ever needed to.

brabel
0 replies
1d1h

Often?? In my experience it's really hard to do it and still maintain similar performance, which kind of voids any benefit you may be looking for.

rework
3 replies
22h59m

What makes mongo mongo is its distributed nature

Since when? Mongo was popular because it gave the false perception it was insanely fast, until people found out it was only fast if you didn't care about your data, and the moment you ensured writes actually happened it ended up being slower than an RDB....

jokethrowaway
2 replies
21h34m

Since forever; sharding and distributing postgres/mysql was not easy. There were a few proprietary extensions. Nowadays it's more accessible.

This was the typical crap you had to say to pass a FAANG-style interview: "oh of course I'd use mongo because this use case doesn't have relations and because it's easy to scale", while you know postgres will give you way fewer problems and allow you to put together charts and analytics in 30 minutes when finance comes around.

I made the mistake of picking mongo for my own startup, because of propaganda coming from interviewing materials and I regretted it for the entire duration of the company.

threeseed
1 replies
18h34m

Nowadays it's more accessible

Distributing PostgreSQL still requires proprietary extensions.

With the most popular being Citus which is owned by Microsoft and so questions should definitely remain about how long they support that instead of pushing users to Azure.

People like to bash MongoDB but at least they have a built-in, supported and usable HA/Clustering solution. It's ridiculous to not have this in 2024.

foobarkey
0 replies
8h37m

Trying to use MongoDB by default for new projects because of the built-in HA, dumped Postgres because the HA story is so bad on bare metal (alright if you are ok burning money on RDS or similar).

Current preference: 1. HA MongoDB 2. HA MariaDB (Galera) or MySQL Cluster 3. Postgres Rube Goldberg Machine HA with Patroni 4. No HA Postgres

lkdfjlkdfjlg
3 replies
1d2h

What makes mongo mongo is its distributed nature, without it you could just store json(b) in an RDBMS.

Welllllllll I think that's moving the goalposts. Being distributed might be a thing _now_ but I still remember when it was marketed as the thing to have if you wanted to store unstructured documents.

Now that Postgres also does that, you're marketing Mongo as having a different unique feature. Moving the goalposts.

thfuran
2 replies
1d2h

It doesn't really seem reasonable to accuse someone of moving goalposts that you've just brought into the conversation, especially when they were allegedly set by a third party.

coldtea
1 replies
1d1h

Parent didn't just bring them up; they merely referenced the pre-existing goalposts used to advocate for Mongo and the reasons devs adopted it.

lkdfjlkdfjlg
0 replies
21h57m

Exactly this, very eloquent, thank you.

Yes, I'm still bitter because I was one of those tricked into it.

richwater
2 replies
1d2h

store json(b) in an RDBMS

I actually did this for a small HR application and it worked incredibly well. JSONB GIN indexes are pretty nice once you get the hang of the syntax.

And then, you also have all the features of Postgres as a freebie.

eddd-ddde
1 replies
1d1h

Personally, I like Postgres' JSON syntax much better than whatever Mongo invented.

Big fan of jsonb columns.

oskar_dudycz
0 replies
22h51m

I'm planning to add methods for raw JSON path or, in general, raw SQL syntax to enable such fine-tuning and not need to always use MongoDB API. I agree that for many people, this would be better.

darby_nine
0 replies
1d2h

but then you wouldn't have the joy of using the most awkward query language invented by mankind

pipe_connector
21 replies
1d2h

MongoDB has supported the equivalent of Postgres' serializable isolation for many years now. I'm not sure what "with strong consistency benefits" means.

zihotki
5 replies
1d2h

Or is it? Jepsen reported a number of issues like "read skew, cyclic information flow, duplicate writes, and internal consistency violations. Weak defaults meant that transactions could lose writes and allow dirty reads, even downgrading requested safety levels at the database and collection level. Moreover, the snapshot read concern did not guarantee snapshot unless paired with write concern majority—even for read-only transactions."

That report (1) is 4 years old; many things could have changed. But so far every reviewed version has been faulty with regard to consistency.

1 - https://jepsen.io/analyses/mongodb-4.2.6

endisneigh
1 replies
1d2h

That’s been resolved for a long time now (not to say that MongoDB is perfect, though).

nick_
0 replies
23h28m

I just want to point out that 4 years is not a long time in the context of consistency guarantees of a database engine.

I have listened to Mongo evangelists a few times despite my skepticism and been burned every time. Mongo is way oversold, IMO.

pipe_connector
0 replies
22h35m

Jepsen found a more concerning consistency bug than the above results when Postgres 12 was evaluated [1]. Relevant text:

We [...] found that transactions executed with serializable isolation on a single PostgreSQL instance were not, in fact, serializable

I have run Postgres and MongoDB at petabyte scale. Both of them are solid databases that occasionally have bugs in their transaction logic. Any distributed database that is receiving significant development will have bugs like this. Yes, even FoundationDB.

I wouldn't not use Postgres because of this problem, just like I wouldn't not use MongoDB because they had bugs in a new feature. In fact, I'm more likely to trust a company that is paying to consistently have their work reviewed in public.

1. https://jepsen.io/analyses/postgresql-12.3

jokethrowaway
5 replies
21h43m

Have you tried it in production? It's absolute mayhem.

Deadlocks were common; it uses a system of retries if the transaction fails; we had to disable transactions completely.

Next step is either writing a writer queue manually or migrating to postgres.

For now we fly without transactions and fix the occasional concurrency issues.

pipe_connector
3 replies
21h19m

Yes, I have worked on an application that pushed enormous volumes of data through MongoDB's transactions.

Deadlocks are an application issue. If you built your application the same way with Postgres you would have the same problem. Automatic retries of failed transactions with specific error codes are a driver feature you can tune or turn off if you'd like. The same is true for some Postgres drivers.

If you're seeing frequent deadlocks, your transactions are too large. If you model your data differently, deadlocks can be eliminated completely (and this advice applies regardless of the database you're using). I would recommend you engage a third party to review your data access patterns before you migrate and experience the same issues with Postgres.

akoboldfrying
2 replies
20h24m

Deadlocks are an application issue.

Not necessarily, and not in the very common single-writer-many-reader case. In that case, PostgreSQL's MVCC allows all readers to see consistent snapshots of the data without blocking each other or the writer. TTBOMK, any other mechanism providing this guarantee requires locking (making deadlocks possible).

So: Does Mongo now also implement MVCC? (Last time I checked, it didn't.) If not, how does it guarantee that reads see consistent snapshots without blocking a writer?

pipe_connector
0 replies
19h23m

MongoDB (via WiredTiger) has used MVCC to solve this problem since transactions were introduced.

devit
0 replies
16h34m

Locking doesn't result in deadlocks, assuming that it's implemented properly.

If you know the set of locks ahead of time, just sort them by address and take them, which will always succeed with no deadlocks.

If the set of locks isn't known, then assign each transaction an increasing ID.

When trying to take a lock that is taken, then if the lock owner has higher ID signal it to terminate and retry after waiting for this transaction to terminate, and sleep waiting for it to release the lock.

Otherwise if it has lower ID abort the transaction, wait for the conflicting transaction to finish and then retry the transaction.

This guarantees that all transactions will terminate as long as each would terminate in isolation and that a transaction will retry at most once for each preceding running transaction.

It's also possible to detect deadlocks by keeping track of which thread every thread is waiting for, then signaling either the highest transaction ID in the cycle or the one the lowest ID is waiting for to abort, wait for the ID it was waiting for to terminate, and retry.

threeseed
0 replies
18h38m

Next step is either writing a writer queue manually

You can just use a connection pool and limit writer threads.

You should be using one to manage your database connections regardless of which database you are using.

danpalmer
5 replies
18h16m

MongoDB had "strong consistency" back in 2013 when I studied it for my thesis. The problem is that consistency is a lot bigger space than being on or off, and MongoDB inhabited the lower classes of consistency for a long time while calling it strong consistency which lost a lot of developer trust. Postgres has a range of options, but the default is typically consistent enough to make most use-cases safe, whereas Mongo's default wasn't anywhere close.

They also had a big problem trading performance and consistency, to the point that for a long time (v1-2?) they ran in default-inconsistent mode to meet the numbers marketing was putting out. Postgres has never done this, partly because it doesn't have a marketing team, but again this lost a lot of trust.

Lastly, even with the stronger end of their consistency guarantees, and as they have increased their guarantees, problems have been found again and again. It's common knowledge that it's better to find your own bugs than have your customers tell you about them, but in database consistency this is more true than normal. This is why FoundationDB are famous for having built a database testing setup before a database (somewhat true). It's clear from history that MongoDB don't have a sufficiently rigorous testing procedure.

All of these factors come down to trust: the community lacks trust in MongoDB because of repeated issues across a number of areas. As a result, just shipping "strong consistency" or something doesn't actually solve the root problem, that people don't want to use the product.

pipe_connector
1 replies
16h2m

It's fair to distrust something because you were burned by using it in the past. However, both the examples you named -- Postgres and FoundationDB -- have had similar concurrency and/or data loss bugs. I have personally seen FoundationDB lose a committed write. Writing databases is hard and it's easy to buy into marketing hype around safety.

I think you should reconsider your last paragraph. MongoDB has a massive community, and many large companies opt to use it for new applications every day. Many more people want to use that product than FoundationDB.

daniel-grigg
0 replies
15h23m

Can you elaborate on why ‘many large companies’ are choosing MongoDB over alternatives and what their use cases are? I’ve been using MDB for a decade, and with how rich the DB landscape is for optimising particular workloads I just don’t see what the value proposition for MDB is compared to most of them. I certainly wouldn’t use it for any data-intensive application when there are other fantastic OLAP dbs, nor for some battle-hardened distributed-nodes use case, so that leaves a ‘general purpose db with very specific queries and limited indexes’. But then why not just use PG, as others say?

nijave
1 replies
17h8m

Have you looked at versions in the last couple years to see if they've made progress?

danpalmer
0 replies
16h39m

This kinda misses my point. By having poor defaults in the past, marketing claims at odds with reality, and being repeatedly found to have bugs that reduce consistency, the result is that customers have no reason to trust current claims.

They may have fixed everything, but the only way to know that is to use it and see (because the issue was trusting marketing/docs/promises), and why should people put that time in when they've repeatedly got it wrong, especially when there are options that are just better now.

throwaway2037
0 replies
14h42m

    > my thesis
Can you share a link? I would like to read your research.

throwup238
0 replies
1d2h

> I'm not sure what "with strong consistency benefits" means.

"Doesn't use MongoDB" was my first thought.

lkdfjlkdfjlg
0 replies
1d2h

Pongo - Mongo but on Postgres and with strong consistency benefits.

I don't read this as saying it's "MongoDB but with...". I read it as saying that it's Postgres.

Izkata
0 replies
1d2h

MongoDB has supported the equivalent of Postgres' serializable isolation for many years now.

That would be the "I" in ACID

I'm not sure what "with strong consistency benefits" means.

Probably the "C" in ACID: Data integrity, such as constraints and foreign keys.

https://www.bmc.com/blogs/acid-atomic-consistent-isolated-du...

hdhshdhshdjd
14 replies
1d

I use JSONB columns a lot, it has its place. It can fit certain applications, but it does introduce a lot of extra query complexity and you lose out on some ways to speed up query performance that you could get from a relational approach.

Which is to say JSONB is useful, but I wouldn’t throw the relational baby out with the bath water.

doctor_eval
5 replies
20h59m

I’ve been doing some reasonably serious playing with the idea of using jsonb columns as a kind of front end to relational tables. So basically, external interactions with the database are done using JSON, which gives end users some flexibility, but internally we effectively create a realtime materialised view of just those properties we need from the json.

Anyone else tried this approach? Anything I should know about it?

Zenzero
3 replies
20h10m

My mental model has always been to only use JSONB for column types where the relations within the json object are of no importance to the DB. An example might be text editor markup. I imagine if you start wanting to query within the json object you should consider a more relational model.

mbesto
1 replies
16h5m

I always thought JSONB is a great use case for client-specific customizations. For example, a CRM application where you have structured DB tables like "account", "opportunity", etc., but then a client-specific table might be "cohort" that is linked to "account" and has fields "cohort size", "cohort name", etc. You just chuck the cohort table in JSONB, and now the company that uses the CRM with the custom cohort table can change that table without creating/migrating your RDBMS table structure.

Curious what other people think here?

Zenzero
0 replies
15h49m

That sounds like something that would be better tailored to well normalized tables and custom row entries. The question I'd ask is if something you're concerned about being a column could instead be a row in a different table.

doctor_eval
0 replies
17h12m

My experience has been that users like to include a lot of information that's not relevant to the application I'm working on.

My application couldn't really care less about customer names, for example. But the people who buy my software naturally do care - and what's worse, each of my potential customers has some legacy system which stores their customer names in a different way. So one problem I want to address is, how do I maintain fidelity with the old system, for example during the data migration, while enabling me to move forward quickly?

My solution has been to keep non-functional data such as customer names in JSON, and extract only the two or three fields that are relevant to my application, and put them into a regular SQL database table.

So far this has given me the best of both worlds: a simple and highly customisable JSON API for these user-facing objects with mutable shapes, but a compact SQL backend for the actual work.

hdhshdhshdjd
0 replies
20h30m

I do something similar, building a lightweight search index over very large relational datasets.

So the tables are much simpler to manage, much more portable, so I can serve search off scalable hardware without disturbing the underlying source of truth.

The downside is queries are more complex and slower.

martinald
3 replies
19h0m

I've had extremely disappointing performance with postgres JSONB columns. I've used it a lot for MVPs and betas where it's much easier to use JSONB and lock in your exact relational schema later.

I've now decided this path is a mistake because the performance is so bad. Even with low thousands/tens of thousands of rows it becomes a huge problem, queries that would take <1ms on relational stuff quickly start taking hundreds of ms.

Optimizing these with hand rolled queries is painful (I do not like the syntax it uses for jsonb querying) and for some doesn't really fix anything much, and indexes often don't help.

It seems that jsonb is just many, many orders of magnitude slower, but I could be doing something wrong. Take for example storing a dictionary of string and int (number?) in a jsonb column. Adding the ints up in jsonb takes thousands of times longer than having these as string and int columns in a standard table.

Perhaps I am doing something wrong; and I'd love to know it if I am!

ak217
1 replies
16h23m

Did you have arrays in your jsonb data? I don't currently use jsonb for production performance sensitive queries, but in the past what I learned was that it's great at automatically indexing everything except arrays. You have to manually declare indexes on your arrays.

When jsonb works, it's incredible. I've had many... suboptimal experiences with mongo, and jsonb is just superior in my experience (although like I said, I haven't used it for performance critical stuff in production). For a long time, it kinda flew under the radar, and still remains an underappreciated feature of Postgres.

phamilton
0 replies
12h33m

indexes often don't help.

An "index on expression" should perform the same regardless of the input column types. All that matters is the output of the expression. Were you just indexing the whole jsonb column or were you indexing a specific expression?

For example, an index on `foo(user_id)` vs `foo(data->'user_id')` should perform the same.

cypherpunks01
1 replies
14h42m

Can a Generated Column read data from a JSONB column? That'd be really cool, but I'm not familiar enough with generated columns to know.

jamescrowley
0 replies
13h6m

Yep!

noisy_boy
0 replies
15h5m

I think if a field in a JSONB column is used frequently in the most common query patterns, it might be time to "surface" it up among other non-JSONB relational columns. There will be some additional overhead e.g. it should be made a read-only column for consistency's sake; if you update the value of this field in the JSONB column, this surfaced column should also be updated.

However, a bit of duplication is not a terrible trade-off for significantly improved query performance.

oskar_dudycz
3 replies
1d1h

Yes, I'm using MongoDB API in Pongo to keep the muscle memory. So, it's a library that translates the MongoDB syntax to native PostgreSQL JSONB queries.

remram
2 replies
18h19m

"how does it compare?" "yes"

Are you sure you replied to the right comment?

oskar_dudycz
1 replies
11h34m

Yes to "The posted project looks like a client that connects to pg but behaves like Mongo, where Ferret is a server that accepts Mongo client connections and uses pg as backend storage."

remram
0 replies
4h21m

Yeah, FYI that's a sibling comment, not the one you replied to.

aleksi
3 replies
1d2h

(I'm FerretDB co-founder)

As far as I can tell, Pongo provides an API similar to the MongoDB driver for Node that uses PostgreSQL under the hood. FerretDB operates on a different layer – it implements MongoDB network protocol, allowing it to work with any drivers and applications that use MongoDB without modifications.

Keyframe
1 replies
1d

Even monstache?

Sytten
0 replies
14h49m

I don't want to sound rude, but as a bootstrapped founder it kinda boggles my mind how much money people can raise for a product like FerretDB. I just don't see how it can make VC-level returns without at the very least changing licenses, which seems to be the premise behind creating this MongoDB proxy. I am sure there is a narrative for it though, so best of luck!

Also check your managed service links on GitHub; half are dead.

Zambyte
0 replies
1d2h

The posted project looks like a client that connects to pg but behaves like Mongo, where Ferret is a server that accepts Mongo client connections and uses pg as backend storage.

zulban
7 replies
1d3h

Neat. When I migrated a project from mongo to postgres I took a similar approach, except I only implemented the mongo feel I needed within my own project instead of building a proper library as done here. I was surprised how much performance improved despite using a hacky wrapper.

https://blog.stuartspence.ca/2023-05-goodbye-mongo.html

Personally tho, I plan to just drop all similarity to mongo in future projects.

jrochkind1
2 replies
18h59m

I was surprised how much performance improved

Are you saying you got better performance from postgres jsonb than from mongodb itself?

zo1
1 replies
10h27m

From the article, they mention this alongside a neat graph: "API calls generally take 8 ms now, not 150 ms."

His endpoints went from 150ms (with Mongo) to 8ms after moving to Postgres.

jrochkind1
0 replies
4h22m

Thanks. How embarrassing for mongo, woah.

throwaway2037
1 replies
14h36m

Cripes, this is a good blog post. Did it get posted to HN for discussion? If not, it should have been. I especially like this part:

    > I'm trying to stay humble. Mongo must be an incredibly big project with lots of nuance. I'm just a solo developer and absolutely not a database engineer. I've also never had the opportunity to work closely with a good database engineer. However I shouldn't be seeing improvements like this with default out of the box PostgreSQL compared to all the things I tried over the years to fix and tune Mongo.
Humbleness: that is rare to see around here. It is impressive that you got such speed-ups for your use case. Congrats, and thank you for sharing the blog post.

EDIT

The Morgan Freeman meme at the end gave me a real laugh. I would say the same about my experience with GridGain ("GridPain").

dayjaby
0 replies
12h12m

I knew it was time to migrate when my non-programmer wife was becoming familiar with Mongo shortcomings

That part was so relatable

oskar_dudycz
0 replies
22h54m

Yup, I might not reach full compliance, but I will try to follow the Pareto principle. Thanks for the link and kind feedback!

Thaxll
0 replies
18h45m

I remember your post back then and it did not make sense at all; many pointed out it was lacking information and that you probably did something wrong with mongo.

All the stuff under Mongo Problems is garbage, sorry.

salomonk_mur
7 replies
1d3h

What would be the advantage of using this instead of simple jsonb columns?

joshmanders
3 replies
1d3h

It uses JSONb under the hood. Just gives you a very "mongo" feel to using PostgreSQL. Not sure how I feel about it.

    CREATE TABLE IF NOT EXISTS %I (_id UUID PRIMARY KEY, data JSONB)

wood_spirit
2 replies
1d3h

Can they make it use uuid7 for ids for better insert_becomes_append performance?

lgas
1 replies
1d2h

Yes

oskar_dudycz
0 replies
1d1h

Yes, I'm using JSONB underneath and translating the MongoDB syntax to native queries. As those aren't super pleasant to deal with, I thought it'd be nice to offer the MongoDB API that's familiar to many.

Regarding IDs, you can use any UUID-compliant format.

lopatin
1 replies
1d1h

jsonb isn't web scale. Mongo is web scale.

digger495
0 replies
1d

I see what you did there

imnotjames
0 replies
1d3h

Looks like it matches the mongo node API

harel
7 replies
20h49m

I regularly find the hybrid model is a sweet spot. I keep core fields as regular columns and dynamic data structures as JSONB. It brings the best of both worlds together.

Waterluvian
5 replies
19h58m

I do this too with Postgres and it is just the best of both.

A robot is a record. A sensor calibration is a record. A warehouse robot map with tens of thousands of geojson objects is a single record.

If I made every map entity its own record, my database would have 10_000x more records and I’d get no value out of it. We’re not doing spatial relational queries.

hobs
4 replies
16h4m

It's great when you have no reason EVER to decompose the data.

That being said, when you start going "wait why is one record like this? oh no we have a bug and have to fix one of the records that looks like this across all data" and now you get to update 10,000x the data to make one change.

harel
3 replies
9h42m

Small price to pay in my opinion. How often will that happen vs how often the database is used. Migrations like that can be done incrementally over time. It's a solved problem.

Waterluvian
2 replies
6h27m

It’s also trivial to do. My JSON fields are all backed by JSON Schema. And I just write a data migration that mutates the data in some way and have the migration run by one host in a rate limited manner. It’s not quite as good as a traditional change in schema but it’s such a non-issue.

hobs
1 replies
4h48m

I am glad it works! I have just been subject to several systems that have grown over time that worked very well until it became a problem (and then a huge one) so I am glad you are taking a disciplined approach.

Waterluvian
0 replies
4h34m

Yup you’re absolutely right. There is no one size fits all. The more you can plan, and the more clear your case is, the less you need to pay for having flexibility.

webprofusion
0 replies
16h9m

Yeah we do this as well. Have previously been a heavy MongoDB user but when we migrated to Postgres (using some relational schema with some JSONB for nested objects etc) it just made querying so much easier and reliability has been great.

willsmith72
4 replies
19h43m

It's technologically cool, but I would love a "why" section in the README. Is the idea you're a mongo Dev/love the mongo api and want to use it rather than switch to pg apis? Or want to copy some code over from an old project?

I'm sure there are use cases, I'm just struggling to grasp them. Especially if it's about reusing queries from other projects, AI is pretty good at that

willsmith72
0 replies
5h42m

nice!

megadal
1 replies
16h25m

AI is pretty good at that

Good at what? Rewriting the queries?

I think the point of Pongo is you can use the exact same queries for the most part and just change backends.

I've worked a job in the past where this would have been useful (they chose Mongo and regretted it).

willsmith72
0 replies
12h29m

For sure, but it feels really risky. If it's a small codebase, I would be more confident knowing what queries I was using and just switch them. If it's a large codebase, I'd want some really comprehensive test coverage, including performance tests

rework
4 replies
23h0m

Looks sort of like MartenDB but trying to mimic the mongo api; unsure why anyone would want to do that... mongo api is horrible...

JanSt
3 replies
22h58m

Wouldn't that allow you to switch from Mongo to Postgres without having to rewrite all of your app?

oskar_dudycz
2 replies
22h55m

Hint: I'm an ex-Marten maintainer, so the similarity is not accidental ;)

As OP said, not needing to rewrite applications, or being able to reuse muscle memory from Mongo, is beneficial. I'm not planning to be strict and support only the MongoDB API; I will extend it when needed (e.g. to support raw SQL or JSON Path). But I plan to keep the shim API-compliant for the above reasons.

MongoDB API has its quirks but is also pretty powerful and widely used.

rework
1 replies
22h34m

Oh, so you are, then we can rest assured this will end up being a solid project!

I personally can't stand mongodb; it's given me a lot of headaches. I joined a company, and the same week I joined we lost a ton of data and the twat who set it up resigned in the middle of the outage. Got it back online and spent 6 months moving to postgresql.

oskar_dudycz
0 replies
10h33m

Thanks, that's the goal: to bring the solid and verified approach in Marten to Node.js land. The concept is similar, but the feature set will be different.

cpursley
0 replies
4h49m

This is really nice. Your project?

pmarreck
0 replies
16h5m

“Scalling” should say “Scaling”

Nice list!

oskar_dudycz
0 replies
22h55m

Thank you!

cyberpunk
3 replies
23h1m

And it only costs 75k a seat per year per developer, with free bi-yearly license compliance audits, a million in ops and hardware to get near prod, and all the docs paywalled. What a deal!

slau
1 replies
21h14m

A client had a DB hosted by Oracle. The client was doing most of their compute on AWS, and wanted to have a synchronised copy made available to them on AWS. Oracle quoted them a cool $600k/year to operate that copy, with a 3 year contract.

DMS + Postgres did it for $5k/year.

cyberpunk
0 replies
12h28m

Client of mine wanted to shift a rac cluster from some aging sparc gear into a VMware or openstack or whatever farm they had on premise; oracle demanded they pay CPU licenses for every single CPU in the cluster as each one could “potentially” run the oracle database, quoted them seven figures.

They rewrote the app instead.

sberder
3 replies
9h32m

This looks great, I'll definitely give it a try. As many mentioned already, having classic columns and a JSON(B) column seems to be a common solution. How do you handle data validation for the JSON documents? My current project uses Django for metadata. I've been thinking about creating a layer similar to model fields in Django. You would declare a JSON "model" through those fields and assign it to the actual model JSON field.

throwaway76324
1 replies
8h51m

You can just specify a model/DTO object and serialize it as JSON when saving. Many frameworks do that automatically so you don't need to think about it. At work we just annotate the field in the model as a json-field, and the framework will handle the json-conversion automatically and store the other fields in the model as regular database columns.

pseudo code (to not trigger language wars):

   class Foo {
   
       @Id
       UUID id;

       String name;

       @Json
       MyCustomModel model;
   }
Adding fields is not an issue, as it will simply be missing a value when de-serializing. Your business logic will need to handle its absence, but that is no different than using MongoDB or "classic" table columns

sberder
0 replies
3h40m

That's a very low cost approach, I love it! I still think the Django ecosystem would benefit from a standardized/packaged approach including migrations. I'll ponder a bit more

oskar_dudycz
0 replies
8h23m

Thank you! I'm planning to add support to JSON schema and run the validation upon insert/update operation.

frithsun
3 replies
15h9m

Programmers would be better served by learning nothing except SQL instead of their current strategy of trying to learn everything except SQL.

dboreham
1 replies
3h33m

They should learn about b-trees and how indexed queries can be done with them either with or without an explicit query language. Then they can decide what kind of data storage service they need. Understand what's happening inside the black box.

cryptonector
0 replies
1h49m

Yes, for sure, though I'd still start with SQL.

dudeinjapan
0 replies
12h0m

While I’d agree that understanding SQL basics is an important fundamental for novices to learn, I started using MongoDB 11 years ago and haven’t looked back.

posix_monad
2 replies
1d1h

Does MongoDB have serious market share compared to DynamoDB (and similar clones from Azure, GCP) at this point?

maxdo
0 replies
18h24m

MongoDB and DynamoDB are completely different dbs. One is unlimited-scale KV but very expensive; the other is a document NoSQL db that sells you the idea that "it just works" with lots of features: indexes on anything, aggregation, time series, vector DB, sharding, replicas, etc. It's a very powerful db for sure.

ilius2
2 replies
11h55m

If I were to start a new project, I would directly use postgres, and possibly add a JSONB column ONLY FOR OPTIONAL fields that you don't query frequently. Throwing everything in a document is just fermenting chaos and pain. That being said, I do love the syntax and structure of Mongo pipelines over SQL.

throwaway76324
0 replies
10h13m

At $WORK, we use the same approach for integrations with 3rd party systems.

The data that is common for all integrations are stored as columns in a relational table. Data that are specific for each integration are stored in JSONB. This is typically meta data used to manage each integration that varies.

It works great and you get the combination of relational safety and no-schema flexibility where it matters.

stanislavb
0 replies
11h35m

"Throwing everything in a document is just fermenting chaos and pain." - I LOVE THIS.

revskill
1 replies
1d

Genius.

oskar_dudycz
0 replies
22h52m

<3

ramchip
1 replies
1d2h

Have you tried it with CockroachDB?

oskar_dudycz
0 replies
1d1h

I did not, but I'm not using any fancy syntax so far besides JSONB operators. If it won't work, then I'm happy to adjust it to make it compliant.

navbryce
1 replies
18h34m

I made a joke tweet about this in Nov 2023 (even called it "Pongo"). This is definitely a just a funny coincidence, but I'm going to pretend like I can see into the future: https://x.com/navbryce/status/1720580136737894661

oskar_dudycz
0 replies
10h10m

"Great mind think alike" B-)

joeyagreco
1 replies
1d3h

Good work! I would like to see a section on the README outlining the benefits of Pongo

oskar_dudycz
0 replies
1d1h

Thanks, I'll try to cover that, good call!

oskar_dudycz
0 replies
10h32m

Yes, a similar idea, but I don't aim to be 100% MongoDB compliant or a full replacement. My goal is to use as many PostgreSQL features as possible. Having the translation at the library level will allow more scenarios, e.g. sharing connections and using existing PostgreSQL hosting.

aussieguy1234
1 replies
17h39m

I'd like to know why AWS went with Aurora DB for their DocumentDB backend. Did the Mongo license change trigger a rush to build something Mongo compatible, but not quite MongoDB?

a13n
1 replies
18h23m

Does Pongo work with mongoose? I would guess most mongo users are using mongoose and supporting that library would drive more adoption.

Tao3300
1 replies
22h12m

Ditch the dalmatian before Disney rips your face off.

oskar_dudycz
0 replies
10h36m

Fair, I'll redraw it ;)

DonnyV
1 replies
15h32m

Would love a C# version of this. I usually use Mongodb for all of our projects. But we need to use Postgres for a project. This would come in very handy.

oskar_dudycz
0 replies
11h32m

You can check Marten, that I was co-maintaining: https://martendb.io/. It doesn't have MongoDB-compliant API, but it's mature, stable and efficient.

vmfunction
0 replies
11h7m

Hmmm how does this compare to https://www.ferretdb.com ?

How does this handle large files? Is it enough to replace GridFS? One of the main attractions of MongoDB is its handling of large files.

tracker1
0 replies
3h2m

Has this been tested with CockroachDB or any other databases that use a mostly compatible PostgreSQL wire protocol and query language?