I miss that direct connection. The fast feedback. The freedom from grand plans.
There's no date on this article, but it feels "prior to the MongoDB-is-webscale memes" and thus slightly outdated?
But, hey, I get where they're coming from. Personally, I used to be very much schema-first, make sure the data makes sense before even thinking about coding. Carefully deciding whether to use an INT data type where a BYTE would do.
Then, it turned out that large swathes of my beautiful, perfect schemas remained unoccupied, while some clusters were heavily abused to store completely unrelated stuff.
These days, my go-to solution is SQLite with two fields (well, three, if you count the implicit ROWID, which is invaluable for paging!): ID and Data, the latter being a JSONB blob.
Then, some indexes specified by `json_extract` expressions, some clever NULL coalescing in the consuming code, resulting in a generally better experience than before...
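In sketch form, the whole setup is something like this (assuming `Microsoft.Data.Sqlite`, and SQLite 3.45+ for `jsonb()`; on older versions you'd store plain JSON text, which `json_extract` handles just as well; table and field names are made up):

```csharp
// Minimal sketch of the two-column layout. The Sessions table and the
// tenantId field are hypothetical examples, not a prescription.
using System;
using Microsoft.Data.Sqlite;

using var conn = new SqliteConnection("Data Source=sessions.db");
conn.Open();

var ddl = conn.CreateCommand();
ddl.CommandText = """
    CREATE TABLE IF NOT EXISTS Sessions (
        ID   INTEGER PRIMARY KEY,  -- aliases the implicit ROWID
        Data BLOB NOT NULL         -- the JSONB payload
    );
    -- Indexes added after the fact, on json_extract expressions:
    CREATE INDEX IF NOT EXISTS IX_Sessions_TenantId
        ON Sessions (json_extract(Data, '$.tenantId'));
    """;
ddl.ExecuteNonQuery();

// jsonb() converts JSON text to SQLite's binary JSON format on insert.
var insert = conn.CreateCommand();
insert.CommandText = "INSERT INTO Sessions (Data) VALUES (jsonb($json))";
insert.Parameters.AddWithValue("$json", """{"tenantId":"t-123","helo":"mail.example.com"}""");
insert.ExecuteNonQuery();

// Paging by ROWID: keyset-style, no OFFSET scans.
var page = conn.CreateCommand();
page.CommandText = """
    SELECT ID, json(Data) FROM Sessions
    WHERE ID > $lastSeen ORDER BY ID LIMIT $pageSize
    """;
page.Parameters.AddWithValue("$lastSeen", 0);
page.Parameters.AddWithValue("$pageSize", 100);
using var reader = page.ExecuteReader();
while (reader.Read())
    Console.WriteLine($"{reader.GetInt64(0)}: {reader.GetString(1)}");
```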
Really!? Are you building applications, by chance, or something else? Are you mostly writing raw SQL, or using an ORM/ORM-like library? This surprises me because my experience dabbling in JSON fields for CRUD apps has been mostly trouble stemming from the lack of type checks. SQLite's fluid type system has been a nice middle ground for me personally. For reference, my application layer is kysely/TypeScript.
Well, you move the type checks from the database to the app, effectively, which is not a new idea by any means (and a bad idea in many cases), but with JSON, it can actually work out nicely-ish, as long as there are no significant relationships between tables.
Practical example: I recently wrote my own SMTP server (bad idea!), mostly to be able to control spam (even worse idea! don't listen to me!). Initially, I thought I would be really interested in remote IPs, reverse DNS domains, and whatever was claimed in the (E)HLO.
So, I designed my initial database around those concepts. Turns out, after like half a million session records, I'm much more interested in things like the Azure tenant ID, the Google 'groups' ID, the HTML body tag fingerprint, and other data points.
Fortunately, my session database is just 'JSON(B) in a single table', so I was able to add those additional fields without the need for any migrations. And SQLite's `json_extract` makes adding indexes after-the-fact super-easy.
Of course, these additional fields need to be explicitly nullable, and I need to skip processing based on them if they're absent, but fortunately modern C# makes that easy as well.
And, no, no need for an ORM, except `JsonSerializer.Deserialize<T>`... (And yeah, all of this is just a horrible hack, but one that seems surprisingly resilient so far, but YMMV)
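Concretely, the whole "data layer" is little more than this (a sketch; the `Session` shape and its field names are hypothetical):

```csharp
// Sketch of the "no ORM, just System.Text.Json" approach. Fields added
// after the fact are nullable; unknown properties in old rows are
// ignored by the deserializer by default.
using System;
using System.Text.Json;

public record Session(
    string RemoteIp,              // original field, always present
    string? Helo,
    string? AzureTenantId,        // added later: explicitly nullable,
    string? HtmlBodyFingerprint); // absent in older rows

public static class SessionProcessing
{
    private static readonly JsonSerializerOptions Opts =
        new() { PropertyNameCaseInsensitive = true };

    public static void Process(string jsonFromDb)
    {
        var session = JsonSerializer.Deserialize<Session>(jsonFromDb, Opts);
        if (session is null) return;

        // Skip any processing that depends on fields this row lacks:
        if (session.AzureTenantId is { } tenantId)
            Console.WriteLine($"Azure tenant: {tenantId}");
    }
}
```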
That way you're throwing away 50% of the reason you use a relational database in the first place. Has it occurred to you that MongoDB exists?
Also I don't understand why you're afraid of migrations, especially since you're the only developer on your own SMTP server.
> That way you're throwing away 50% of the reason you use a relational database in the first place. Has it occurred to you that MongoDB exists?
Did you miss that he's using SQLite? The dev experience with a SQLite DB is way better than running yet another service, especially for personal projects.
Sqlite is used just as much as an application file format as it is a relational database.
How does MongoDB handle someone pulling the power cord out of the server? Because that’s another reason to use something like SQLite, and it often gets used in embedded systems.
> Has it occurred to you that MongoDB exists?
What gain would MongoDB offer here?
You certainly would lose a lot of things, like a well-supported path to linking with the database engine, and a straightforward way to start introducing relational tables as the project matures. Nothing completely insurmountable, of course, but it carries a lot of extra effort for what benefit?
My original comment started with "but it feels "prior to the MongoDB-is-webscale memes""
So, care to take another guess? And, while we're here, does MongoDB run fully in-process these days? And/or allow easy pagination by ROWID?
> And, no, no need for an ORM, except `JsonSerializer.Deserialize<T>`... (And yeah, all of this is just a horrible hack, but one that seems surprisingly resilient so far, but YMMV)
I do the same thing with serde_json in Rust for a desktop app's SQLite DB, and it works great, so +1 on that technique.
In Rust you can also tell serde to ignore unknown fields, use individual view structs to deserialize just part of the JSON instead of the whole thing, and use string references to make it zero-copy.
Why is a migration such a burden in that scenario?
Our solution for a similar situation involving semi-structured data (in postgres) was to double it up: put all the json we send/receive with a vendor into a json field, then anything we actually need to work on gets extracted into regular table/columns. We get all the safety/performance guarantees the database would normally give us, plus historical data for debugging or to extract into a new column if we now need it. The one thing we had to monitor in code reviews was to never use the json field directly for functionality.
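In rough form, the write path looks something like this (a sketch with Npgsql; the `vendor_events` table and its columns are hypothetical):

```csharp
// Sketch of the dual-write: the raw vendor payload lands in a jsonb
// column, and the fields we actually work on are extracted into typed,
// constrained columns at write time. All names are hypothetical.
using System.Globalization;
using System.Text.Json;
using Npgsql;
using NpgsqlTypes;

using var conn = new NpgsqlConnection("Host=localhost;Database=app");
conn.Open();

var payload = """{"order_id": 42, "amount": "19.99", "extra": "kept for debugging"}""";
using var doc = JsonDocument.Parse(payload);

using var cmd = new NpgsqlCommand("""
    INSERT INTO vendor_events (raw, order_id, amount)
    VALUES (@raw, @order_id, @amount)
    """, conn);
cmd.Parameters.AddWithValue("raw", NpgsqlDbType.Jsonb, payload);
cmd.Parameters.AddWithValue("order_id", doc.RootElement.GetProperty("order_id").GetInt64());
cmd.Parameters.AddWithValue("amount",
    decimal.Parse(doc.RootElement.GetProperty("amount").GetString()!, CultureInfo.InvariantCulture));
cmd.ExecuteNonQuery();
```

If you'd rather keep the extraction rule in the database itself, Postgres generated columns (`GENERATED ALWAYS AS ((raw->>'order_id')::bigint) STORED`) are one way to make the typed columns impossible to drift from the JSON.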
I actually love your approach and hadn't thought of that before. My problem with relational databases often stems from the fact that remodeling data types and schemas (which you often do as you build an application, whether or not you thought of a great schema beforehand) comes with a lot of migration effort.
Pairing your approach with a "version" field, where you can check which schema version this row's data was saved with, would actually allow you to be incredibly flexible with saving your data while also being (somewhat) sure that the data's schema matches what you're expecting.
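As a sketch, with hypothetical V1/V2 shapes, the adapter could upgrade old rows on read:

```csharp
// Sketch of the version-field idea: each row records the schema version
// it was written with, and an adapter upgrades old shapes on read.
// All names here are hypothetical.
using System;
using System.Text.Json;

public record Envelope(int Version, JsonElement Data);

// v2 renamed "name" to "displayName"; v1 rows still use "name".
public record ProfileV2(string DisplayName, string? AvatarUrl);

public static class ProfileAdapter
{
    private static readonly JsonSerializerOptions Opts =
        new() { PropertyNameCaseInsensitive = true };

    public static ProfileV2 Load(string jsonFromDb)
    {
        var env = JsonSerializer.Deserialize<Envelope>(jsonFromDb, Opts)!;
        return env.Version switch
        {
            1 => new ProfileV2(                 // upgrade v1 on the fly
                env.Data.GetProperty("name").GetString()!,
                AvatarUrl: null),
            2 => JsonSerializer.Deserialize<ProfileV2>(env.Data, Opts)!,
            var v => throw new NotSupportedException($"schema version {v}")
        };
    }
}
```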
Having to write and perform migrations for every small schema change is a bore, but it means your software doesn't have to worry about handling different versions of data. Going "schemaless" with version numbers means moving code from "write-and-forget" migrations to the main codebase, where it will live forever.
I think not doing database migrations only makes sense when you can make do without version numbers (or if you can't do atomic migrations due to performance constraints, but that's only a problem for a very small number of projects).
Is that "not" at the front supposed to be there?
Thanks, edited.
You’re correct there. I mostly work on CMSes with page-builder functionality, which often bake the content schema into the database columns; that makes changing the schema (for new frontend features or reworking old ones) difficult and often prone to losing content, especially in dev environments. The best case is obviously that you never have to version your changes, but I'd prefer making a new schema and writing an adapter function in the codebase, keyed on the schema's version, to spending a lot of time migrating old content. That might just be due to me not being too comfortable with SQL and databases generally.
This is not my experience; it only happens rarely. I’d like to see an analysis of what causes schema changes that require nontrivial migrations.
Same here. If your entities are modelled mostly correctly, you really don't have to worry about migrations that much. It's a bit of a red herring and a convenient "problem" pushed by the NoSQL camp.
On a relatively neat and well-modelled DB, large migrations usually happen when relationships change, e.g. a one-to-many becomes a many-to-many.
Really, the biggest hurdle is managing the change control to ensure it aligns with your application. But that's a big problem with NoSQL DB deployments too.
At this point I don't even want to hear what kind of crazy magic and "weird default and fallback" behavior the schemaless NoSQL crowd employs. My pessimistic take is they just expose the DB over GraphQL and make it the frontend's problem.
Oh, good question on when the essay was written -- put dates on your things on the internet, people!
Internet Archive has a crawl from today but none earlier, which doesn't mean it can't be older, of course. My guess is it was written recently though.
Created 14 hours ago: https://github.com/jimmyhmiller/jimmyhmiller.github.io/commi...
But clearly written in retrospect. It sounds like some of the things I was doing in 2009.
It includes a reference to Backbone and Knockout JS, which were released in 2010, so presumably it was around that era. The database, though, was probably much older...
At least we know it wasn't written after today!
Falsehoods Programmers Believe…?
I’m still in the "think hard about the schema" camp. I like to rely on the database to enforce constraints.
Yeah, a good database is pretty damn handy.
Have you had the pleasure of blowing young minds by revealing that production-grade databases come with fully fledged authnz systems that you can just...use right out of the box?
Can you say more? I’m interested.
I guess they mean something like Postgres' row-level security:
https://www.postgresql.org/docs/current/ddl-rowsecurity.html
Databases have pretty robust access controls to limit a SQL user's access to tables, schemas, etc. -- basic controls like being able to read but not write, and more advanced setups like being able to access data through a view or stored procedure without having direct access to the underlying tables.
Those features aren't used often in modern app development, where one app owns the database and any external access is routed through an API. They were much more common in old-school enterprise apps, where many different teams and apps would all directly access a single DB.
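For flavor, the kind of thing meant here (Postgres syntax issued through Npgsql; role, table, and view names are made up):

```csharp
// The flavor of built-in access control being described: a role that
// can read one table, and can reach another only through a view.
using Npgsql;

using var conn = new NpgsqlConnection("Host=localhost;Database=app");
conn.Open();

using var cmd = new NpgsqlCommand("""
    CREATE ROLE reporting LOGIN;

    -- Basic control: read but not write.
    GRANT SELECT ON orders TO reporting;

    -- Indirect access: reporting may use the view, but has no direct
    -- access to the underlying customers table.
    REVOKE ALL ON customers FROM reporting;
    GRANT SELECT ON customer_summary TO reporting;
    """, conn);
cmd.ExecuteNonQuery();
```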
Byte vs. Int is premature optimization. But indexing, primary keys, join tables, normalization vs. denormalization, etc. are all important.
I think you can only judge that by knowing the context, like the domain and the experience of the designer/dev within that domain.
I looked at a DB once and thought "why are you spending effort to create these datatypes that use less storage, I'm used to just using an int and moving on."
Then I looked at the volumes of transactions they were dealing with and I understood why.
This is perfectly fine when you are driving some app with a per-user experience that lets you wrap up most of each user's data in a few blobs.
However, I would still advise people to use third normal form - it helps you, constraints help you, and other tooling often has poor support for constraints on JSON. Scanning and updating every value because you need to update some subset sucks.
Your first point is super valid though - understanding the domain is very useful, and you can easily get 10x the performance by designing with proper types involved. But importantly, don't just build out the model before devs and customers have a use for it; that's a classic mistake in my eyes (and then skipping cleanup when it's basically unused).
If you want to figure out your data model in depth beforehand, there's nothing wrong with that... but you will still make tons of mistakes, gaps in the plan will require last-minute fixes, and the evolution of the product will leave your original planning gathering dust.
Mirrors my experience exactly. Querying JSON to get info out of the db can get complex. SQLite is kind of forgiving, because sequences of queries (I mean query, modify in application code that fully supports JSON, i.e. JS, then query again) are less painful, meaning it's less important to do everything in the database for performance reasons. But if you're trying to do everything in one query, I think you pay for it at application-writing time over and over.
So, you're basically running DynamoDB on top of a SQL server?
This is essentially just a data warehousing style schema. I love me a narrow db table.
But I do try and get a schema that fits the business if I can.
I think they are referring to the fact that software development as a field has matured a lot and there are established practices and experienced developers all over who have been in those situations, so generally, these days, you don't see such code bases anymore.
That is how I read it.
Another possible reason you don't see those code bases anymore is that such teams/companies don't offer competitive comp, so the people who get hired there are mostly junior devs or people who can't get a job on a more competent team.