Back in the day (early 2000s), I worked with a DB2 instance that had similar functionality. At the time, they called this feature federated databases. If you had the appropriate wrapper, you could use any data source in a query, even the output from other programs. At the time I used it for including dynamic DNA sequence alignments in queries.
IIRC, SQLite can do similar things with virtual tables (with a more limited set of data file types).
I always liked this way of working, but I also wonder why it never really took off. Data discovery can be an issue, and I can see the lack of indexing as being a problem.
I guess that’s a long-winded way to ask: as interesting as this is, what are the use cases where one would really want (or need) to use it?
In today's newfangled world, a lot of developers don't use a lot of the great stuff that an RDBMS can provide: stored procedures, SQL constraints, even indexes. The modern mindset seems to be that that stuff belongs in the code layer, above the database. Sometimes it's justified as keeping the database vanilla, so that it can be swapped out.
In the old days you aimed to keep your database consistent as far as possible, no matter what client was using it. So of course you would use SQL constraints, otherwise people could accidentally corrupt the database using SQL tools, or just with badly written application code.
So it's not hard to see why more esoteric functions are not widely used.
I've heard people call the old approach "outdated" and even "wrong" before.
It's honestly the dumbest situation, where people eagerly use extremely complex databases as if they were indexed KV stores. Completely ignoring about 97% of the features in the process.
What's especially funny is that half the time a basic KV store would perform better given all the nonsense layered on top.
And then there's this whole mentality of "we can't interface with this database unless we insert a translation layer which converts between relationships between sets of tuples and a graph structure".
It's like people have unearthed an ancient technology and have no idea how it's intended to be used.
Stored procs, triggers and suchlike actually are outdated and mostly wrong. Have you considered that people moved away from them for good reasons, not just ignorance? Hard to source control, impossible to debug, written in weird/bad languages, massive blast radius, don’t scale. Usability matters! I have written more TSQL than I would care to admit, and it’s a hell you couldn’t pay me enough to go back to.
The “consistency must be enforced at all costs” mantra just turned out not to be true. I've worked at many places at moderate scale since my old DBA days that don't use FKs. It just doesn't actually matter; I can't recall any serious bugs or outages related to dangling rows. Plus, you end up having multiple databases anyway for scale and organizational reasons, so FKs are useless in that environment anyway.
On the other hand, I’m all for indexes and complex querying, not just K-V.
I think you touched on the real reason: we separate services now and they all have different databases. We shifted up to the application layer not because of technical reasons but social reasons.
Your timing is off, we as an industry ditched stored procs and friends before (micro)services became a trend. For the reasons I said.
I’m not sure why it’s so hard to accept that we stopped using them because they were bad, not for some other gotcha reason (social, ignorance, fads, etc)
Maybe because claiming they are bad without explaining why, giving explicit examples, or really digging into the nuance of the several features mentioned makes the claim feel unsubstantiated.
Pretty sure I did say why: “Hard to source control, impossible to debug, written in weird/bad languages, massive blast radius, don’t scale. Usability matters!” And I forgot “generates non-obvious, non-local changes” for triggers.
What is so hard about source-controlling stored procedures? I’ve been writing them in text files for over 30 years, version controlled in RCS, CVS, SVN and Git. No problem; it’s just the deployment automation that works a bit differently.
Also, Oracle PL/SQL is way better than MS T-SQL; it mostly feels like Pascal/Modula-2 with embedded SQL statements.
Yes, you can hack something together that keeps them in source control, and then some automation to run them in a CD way. But now your business logic change is half in application code and half in sprocs, and you have a versioning and deployment coordination problem. Which you can work around too. But why? Making deployments and debugging of business logic hard for what end? Just keep all the business logic in your app code; then it's deployed and tested as one artifact, and everyone's happier.
Also, I hope you're not suggesting anyone pick up Oracle for a new project these days :)
What do you mean hack to keep them in source control? They're just sql source code files, like any other source code you can simply commit to version control?
Wouldn't advise anyone to use Oracle, but neither would I advise them to use SQL Server. Usually Postgres is good enough, although Oracle PL/SQL is still nicer than Postgres PL/pgSQL (e.g., packages).
I feel like I addressed this in a different subthread: https://news.ycombinator.com/reply?id=40526231&goto=threads%...
Hard agree.
But I like foreign keys (ON DELETE RESTRICT), they are the last layer of defense against application bugs messing up the database.
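Something like this minimal sketch is the last layer of defense I mean (shown with Python's sqlite3 so it's self-contained; the table names are made up):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL
                REFERENCES customers(id) ON DELETE RESTRICT
        );
        INSERT INTO customers (id) VALUES (1);
        INSERT INTO orders (id, customer_id) VALUES (10, 1);
    """)

    try:
        # A buggy cleanup job tries to delete a customer that still has orders.
        conn.execute("DELETE FROM customers WHERE id = 1")
    except sqlite3.IntegrityError as exc:
        print("blocked by the FK:", exc)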
I totally get that position. I used to be in that camp, but in my last 3 jobs at companies of >10 mil users, we had to get rid of FKs for other reasons (perf, dist system stuff). I kind of expected bad data corruption issues... but they just never happened ¯\_(ツ)_/¯. Not sure if that was strong cultures of automated testing, or what. So now I no longer think they're worth the tradeoffs.
YMMV with this one; I could see how it might pan out differently in other environments.
Did these involve financial transactions? In other situations I'll accept dangling rows without issues, but whenever money comes into the picture I like to have as much consistency enforced as I can. But I'm aware that doesn't play nicely with distributed systems.
Yes, actually. Financial transactions are often used as an example that demands strong consistency, but it's not true in the real world for the most part. For any given money movement (a debit and a credit) there's a good chance one of those legs is over the network: you're either interacting with a third-party API, or a proprietary internal product, or similar. So you can't have transactional consistency, or FKs. Further, for every movement there are layers of reconciliation that happen hours or days later, which fix any inconsistency. Finance / fintech is actually among the most eventually consistent systems out there, so its use in a lot of transactional examples is sort of ironic.
The exception is true banking ledgers maintained at banks, but even then you'd be surprised. Entirely different systems handle loans and checking accounts.
Most of the actual drawbacks you describe are caused by how databases are being used, not by their inherent design. Lack of source control and "weird/bad language" are just wrong: you can source control this stuff just fine, and if you don't like the language nothing stops you from transpiling.
That's what I meant when I said it's weird people keep picking these particular databases for projects which don't end up using any of the features.
Stored procedures, triggers and all the other features people seem to refuse to use work mostly fine if you actually design the rest of the product around them and put a modicum of effort into them.
"It doesn't scale" - that can be said of the fundamental database design itself. If you need to scale hard then you need to pick a distributed database.
I don't really know why I'm arguing this; there have been hundreds of blog posts over the decades explaining why sprocs are considered bad. The marketplace of ideas has spoken, and we've moved on. I'll likely never convince you, but here goes my take, expanded some more.
You can source control them, but then you have a deployment coordination problem. Often changes to business logic will be half in application code and half in the sproc, and you have to coordinate the deployment of that change. You have to somehow reason about which version of the sprocs is on your server. Add on that, usually in environments that use sprocs, changes to them are gatekept by DBAs. Just write your business logic in application code.
Transpile? Why on earth would anyone bother with that? It just makes things even more complicated and impossible to debug. Even without that, T-SQL/PL/SQL/PL/pgSQL are objectively very awkward languages for imperative business logic. People are not familiar with them; it makes it hard to jump from regular code to *SQL code to read the logic flow, never mind debugging them with breakpoints and suchlike. Splitting your business logic between database and app makes it much more awkward to test too. Just write your business logic in application code.
Scale: making the hardest-to-scale piece of your architecture, the traditional RDBMS, an execution environment for arbitrarily complex business logic is going to add a lot of extra load, meaning you're going to hit scaling problems sooner than you otherwise would. Just write your business logic in application code; you can scale those pods/servers horizontally easy peasy, they're cheap.
Look, you can make any technology work if you try hard enough, but sprocs are almost all downsides these days. The one minor upside is some atomicity, but you can get that with a client-side transaction and a few more round trips. There's just, IMO, no reason to pick them anymore and pay these costs, given what we've learned.
BTW, I could have made this clearer in the original comment, but I love RDBMSs: I love schemas, indexes, complex querying, and transactions. It is not weird to keep picking these databases to get those things.
I just will never again use them as a programming environment.
I think you put too much weight on what people currently do. Every year, it seems, the previous year's trendy ideas are "terrible" and something new is "the way it should have always been done from the start".
You probably shouldn't be using stored procedures to implement more than you need to keep things internally consistent (i.e., almost never). I actually don't understand why you're so hung up on stored procedures. I've seen people try to implement whole applications on top of stored procedures; that's certainly not anything I'm recommending.
Yes if you split your business logic like that then you're going to have trouble, so don't. If instead you treat your database as an isolated unit and keep it internally consistent, then it's not really that difficult to keep things always working even when migrations happen.
As for the DBAs, nobody says you have to have DBAs, but if you're going to use a technology, it's worth having someone who understands it. (or, you know, just pick a different database).
I mean, lots of people use query generators which let you write, e.g., native Python and generate equivalent SQL (SQLAlchemy Core).
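For anyone who hasn't seen it, a minimal sketch of what that looks like with SQLAlchemy Core (1.4+ style; the table and columns are made up, and the printed SQL is approximate):

    import sqlalchemy as sa

    # Lightweight table/column constructs; no ORM classes needed.
    users = sa.table("users", sa.column("id"), sa.column("name"))

    # Build the query in Python; SQLAlchemy renders the SQL for you.
    stmt = sa.select(users.c.name).where(users.c.id > 100)
    print(stmt)
    # SELECT users.name FROM users WHERE users.id > :id_1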
Anyway, your complaint seems to be that people don't understand SQL databases well enough to use them properly. Sure, but this is like complaining that git is bad because nobody knows how to use it. People who don't know or want to learn git, SQL, or a tool they're using, should pick a different tool.
well yes
sure
...
I think at some point you read something I wrote as: "And you should attempt to find out a way to shoehorn your entire application into the RDBMS such that your application is just a frontend which calls stored procedures on the RDBMS" but that's certainly not anything close to what I said.
Designing your database such that it stays internally consistent when you perform operations on it is a good goal, and doesn't require filling it with stored procedures and making it your backend.
I think the implementations had a bad DX, but the concepts were not incorrect. Not enforcing referential integrity, or forgoing schemas altogether (hello document stores, and then doing the equivalent of SQL joins and aggregations in nested loops within code), is just plain lazy / ignorant and comes back to bite you in the ass later.
It makes me think that for many projects, recutils would be a better fit if you’re not gonna use the advanced functionality anyway. It should do just fine on medium sized DBs and is so much simpler.
Yep, that's me. I use SQLite as a key-value store. Usually just two columns per table, a key and a value, that I map to a Python dict, and I pickle the value if I need something more complex.
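In case it's useful to anyone, the whole pattern is only a few lines (a minimal sketch; file and table names are made up, and the usual caveat about unpickling untrusted data applies):

    import pickle
    import sqlite3

    conn = sqlite3.connect("store.db")
    conn.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value BLOB)")

    def put(key, obj):
        # Pickle the value so arbitrary Python objects can be stored.
        conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
            (key, pickle.dumps(obj)),
        )
        conn.commit()

    def get(key, default=None):
        row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
        return pickle.loads(row[0]) if row else default

    put("config", {"retries": 3, "hosts": ["a", "b"]})
    print(get("config"))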
SQLite is obscenely performant as a key-value store, though.
Implementing constraints in application code is certainly a lot easier (and easier to test) than in the database. What I want is a database with far stronger constraint capabilities than, e.g., MySQL or Postgres provide, so that my application-level constraints can live in the database where they belong, without compromising ease of development and maintenance.
> Implementing constraints in application code is certainly a lot easier
I can’t imagine doing this. You’re basically saying that you will only ever have one program writing to your database.
This is, for many, considered a best practice.
The idea being that a single service/codebase is controlling interaction with the database. Any read/writes should go through that service's APIs. Basically a "CreateFoo(a, b, c)" API is a better way to enforce how data is written/read vs every service writing their own queries.
It incurs huge costs though:
1. Costs of API maintenance: rigid APIs need constant change. To mitigate that, people create APIs like GraphQL, which is a reimplementation of SQL (i.e., a query language).
2. Costs of data integration: (micro)services owning databases need data synchronisation and duplication. What's worse, most of the time data synchronisation bypasses service APIs (using tools like CDC etc.), so all efforts to decouple services via APIs are moot.
A single data platform (a database) with well governed structure enforced by a DBMS is a compelling alternative to microservices in this context.
Rigidity is the point...
If you have a "CRUD a Foo" set of APIs, how you create/read a "foo" is defined in that API. Sometimes change is necessary; sometimes the API contract changes, but sometimes just the internal implementation changes (eg new or refactored tables/columns). The rigidity of the API ensures that every downstream user of a foo creates/reads in the exact same way. It centralizes permissions, overrides (eg. a/b testing), rate limiting, transactions, etc., so they are homogeneous for everyone. If you want to create/read a foo via database queries alone, and the database changes or the business logic changes, then the same issue occurs where all clients need to change, but now it needs to be coordinated everywhere, and you can't benefit from a centralized change hiding implementation details.
Many people prefer to keep all the logic around enforcing consistency and object lifecycle (application behavior) in the application layer. This allows a single codebase to manage it, and it can be uniformly guarded with tests. Exposing the database itself is really just an example of a leaking implementation details.
If you need raw flexible queries, then this is probably the wrong sort of solution. Ideally, the developer of a service already knows what queries will be made, and clients don't typically need detailed custom queries. Analytics (typically read-only) should already occur in an offline or read-replica version of the database to not impact production traffic.
What do you mean? Foo service exposes "CRUD-Foo" apis and is the only service that calls the Foo-storing database. If Foo service is horizontally scaled, then it's all the same code, and can safely call foo-db in parallel. If the database needs horizontal scaling, you can just use whatever primitives exist for the database you picked, and foo-service will call it as expected, and the transaction governs the lifecycle of the records.
To be clear, different services wouldn't operate on their own duplicated version of the same shared data, in their own database. They'd call a defined API to get that data from another service. The whole point of this is to allow each service to define the interface for their data.
This is a mistake that a lot of people make: treating the database as an implementation detail. The relational model and relational calculus/algebra were invented for _sharing_ data (not storing it). The relational model _is_ the interface. Access to data is abstracted away: swapping the storage layer or introducing different indexes or even using foreign data wrappers is transparent to applications.
Security and access policies are defined not against API operations but against _data_ - because it is data that you want to protect regardless of what API is used to access it.
You mean synchronous calls? This actually leads to what industry calls "distributed spaghetti" and is the worst of both worlds.
The point is that well defined relational database schema and SQL _is_ a very good interface for data. Wrapping it in JSON over HTTP is a step backwards.
That's the idea of microservices.
In PostgreSQL you can write pretty much any constraint you wish: https://www.postgresql.org/docs/current/ddl-constraints.html
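For example, CHECK constraints let you encode arbitrary row-level rules. A minimal sketch in standard SQL (which PostgreSQL also accepts), run against SQLite via Python only so the snippet is self-contained; the table names are made up:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE products (
            id INTEGER PRIMARY KEY,
            price NUMERIC NOT NULL CHECK (price > 0),
            discounted_price NUMERIC,
            CHECK (discounted_price IS NULL OR discounted_price < price)
        )
    """)

    try:
        # Negative prices are rejected by the database itself, no app code needed.
        conn.execute("INSERT INTO products (id, price) VALUES (1, -5)")
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)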
IMHO stored procedures are evil and they shouldn’t have a place in modern development. They are an old solution for the client-server world.
I don’t get this mentality; they’re just code that lives close to your data, and they can do bulk processing on data in your database way more efficiently than anything living in the application layer.
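A minimal sketch of what I mean by bulk processing close to the data: one set-based UPDATE inside a procedure instead of fetching rows into the application. PostgreSQL 11+ syntax via psycopg2; the table, columns and connection string are all made up:

    import psycopg2

    ddl = """
    CREATE OR REPLACE PROCEDURE apply_late_fees(fee numeric)
    LANGUAGE sql
    AS $$
        UPDATE invoices
           SET amount = amount + fee
         WHERE due_date < CURRENT_DATE AND NOT paid;
    $$;
    """

    conn = psycopg2.connect("dbname=app")  # placeholder connection string
    with conn, conn.cursor() as cur:
        cur.execute(ddl)
        # One round trip; the database applies the fee to every matching row.
        cur.execute("CALL apply_late_fees(%s)", (5.00,))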
Not necessarily. It all depends on the particulars. Very complex processing of a reasonably sized chunk of data, especially when it can be done in multiple threads / distributed, can be way more efficient.
I've seen a very real case: a consulting company was hired by a telco to write code that would take the contents of the database and create one huge file containing invoices for clients. That file would then be sent to a print house that would print the actual invoices and mail them to clients.
They tried to do it with stored procedures and failed miserably: it would run for a couple of days, give or take, and then crash.
I was called in to help. I created 2 executables. One is a manager and the other is a multithreaded calculator that did all the math. The calculator was put on 300 client-care workstations as a service. The dispatcher would do an initial query of the accounts table for the current bill cycle, split account IDs into a few arrays, and send those arrays to the calculators. The calculators would pull the needed data in bulk from the database, do all the calculations, and send the results back to the dispatcher for merging into the final file. The telco people were shocked when they realized that they could have the print file in less than 2 hours.
Sometimes you need to have atomic updates for a function. Stored procedures aren’t inherently evil, but they are a holdover from when DB admins were a thing and they didn’t trust app developers.
Don’t worry, just wait another 5-10 years, they’ll be back in vogue again.
- References are triggers
- Audit tables are often implemented thanks to triggers (see the sketch below)
- Soft deletes can be easily implemented thanks to triggers
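For instance, an audit table can be a single trigger. A minimal sketch using SQLite via Python so it runs anywhere; the table names are made up:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance NUMERIC NOT NULL);
        CREATE TABLE accounts_audit (
            account_id INTEGER,
            old_balance NUMERIC,
            new_balance NUMERIC,
            changed_at TEXT DEFAULT CURRENT_TIMESTAMP
        );

        -- Every balance change is recorded, no matter which client made it.
        CREATE TRIGGER accounts_audit_trg
        AFTER UPDATE OF balance ON accounts
        BEGIN
            INSERT INTO accounts_audit (account_id, old_balance, new_balance)
            VALUES (OLD.id, OLD.balance, NEW.balance);
        END;

        INSERT INTO accounts (id, balance) VALUES (1, 100);
        UPDATE accounts SET balance = 75 WHERE id = 1;
    """)

    print(conn.execute("SELECT * FROM accounts_audit").fetchall())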
We live in a client-server world.
Stored procedures are awful. There, I said it.
It's storing transform code in an inscrutable way, hiding pipeline components from downstream users. The rest of your alarms and DQ checks have to be extended to look into these...
Oh, and PL/SQL requires a huge context switch from whatever you were doing.
It’s not a controversial take. They are awful, and thus haven't been used for new development for at least a decade, going on two.
I generate my pl/pgsql using Lisp. Which is another technology that has been dying for decades. The money it generates seems to work just as well as the money I earned writing js.
I'm a bit surprised about the flak that stored procs get. Treat them just like any other, hopefully well documented, API, and they work perfectly fine. There are pros and cons, just like whether you decide to use React or htmx, Python or Go, Erlang or Lisp, a row-based database vs column-based vs KV, etc.
On the other hand, going back to the days of poorly written and documented Oracle PL/SQL stored procs, yes, I shudder too, but then again, the same can be said of a number of public APIs that have been hacked / had side effects exposed.
People do still use indexes and the basic FK/UQ constraints because they add value even when using ORM mappers, etc. I don't think the idea of swapping databases is that prevalent any more (apart from perhaps unit testing, but since DBs can often be dockerized that's also solved now).
People don't hate stored procedures (or other custom SQL constraints/triggers) per se, but more the amount of headache they bring when it comes to debugging and keeping in sync with version control.
.sql files go into version control just as well as .java files. And because your build will be blowing all your stored procedures away and rebuilding them as part of the build, you'll know at compile time if Junior Dev Jimmy dropped a null constraint or renamed something but didn't fix one of the procs.
That's way easier to debug than if Jimmy also had his query living as a string somewhere in your backend code.
Honestly, if this stuff is hard, it's because you've made it hard. I can only assume most people griping about stored procedures don't have a /sql folder living next to their /python or /java folder in source control.
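Concretely, the deploy step can be a few lines. A minimal sketch of one way to do it (assumes a PostgreSQL target reachable via psycopg2, a made-up sql/procs folder of idempotent CREATE OR REPLACE files, and a placeholder connection string):

    import pathlib

    import psycopg2

    conn = psycopg2.connect("dbname=app")  # placeholder; use your real DSN
    with conn, conn.cursor() as cur:
        for path in sorted(pathlib.Path("sql/procs").glob("*.sql")):
            # Each file uses CREATE OR REPLACE, so re-running on every build is safe.
            cur.execute(path.read_text())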
Or… you just keep all your business logic in /java and call it a day. It’s stored procs that are making things harder.
I understand if it's a prototype, but I have seen developers of even mature products follow this mentality that they want the DB to be portable. It's insane. The reality is that rarely happens, and when it does it's mostly because your business is growing. But not knowing (or caring to know) the features your DB offers you and not exploiting them is just laziness.
The approach in the article is for analytics / DW-type use cases, not OLTP. So a dig about kids these days not understanding constraints is not particularly relevant.
The reason a lot of more advanced database features aren't used is simply because a lot (most?) web developers don't know them and have a very light understanding of databases in general.
A related functionality is "SQL/MED", a SQL specification for federated databases that historically has some kind of relationship to medical data (I believe one of the use cases is allowing one site to query, in a limited way, data hosted at another site that may not be moved from it).
The MED stands for management of external data, nothing to do with medical data.
It’s actually a chapter in the ISO/ANSI SQL specification.
My understanding from reading the literature around SQL/MED and related technologies is that they were, indeed, inspired by data management problems in medicine. The use cases are like those I outlined -- we can't move the data from this physical location, but we want to allow other sites to query it.
As a result, a specification was added, as part of ISO/ANSI SQL specification process.
The MED term is an acronym and not a shortening of "medical".
Postgres calls these foreign data wrappers (FDW)
https://wiki.postgresql.org/wiki/Foreign_data_wrappers
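A minimal sketch with the built-in file_fdw, run from Python via psycopg2 (the CSV path, columns and connection string are made up):

    import psycopg2

    setup = """
    CREATE EXTENSION IF NOT EXISTS file_fdw;
    CREATE SERVER IF NOT EXISTS csv_files FOREIGN DATA WRAPPER file_fdw;
    CREATE FOREIGN TABLE IF NOT EXISTS sales_csv (
        sold_on  date,
        product  text,
        amount   numeric
    ) SERVER csv_files
      OPTIONS (filename '/data/sales.csv', format 'csv', header 'true');
    """

    conn = psycopg2.connect("dbname=app")  # placeholder connection string
    with conn, conn.cursor() as cur:
        cur.execute(setup)
        # The CSV file now queries like any other table.
        cur.execute("SELECT product, sum(amount) FROM sales_csv GROUP BY product")
        print(cur.fetchall())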
Steampipe demonstrates a rather impressive range of scenarios for using FDWs + SQL in place of more traditional ETL and API integrations: https://steampipe.io/
I liked this when I tried it out. The AWS API plugin was neat: query resources with SQL.
However, I like to limit my toolset to three things; these days that's DuckDB, Julia for the analysis side, and... OK, two things.
Oh yeah, Trino for our distributed compute.
Similar timeline: Oracle had a feature called external tables where you could create a "table" on top of an external flat file and run queries against it. Prior to that we were using awk/perl etc.; we found the external-table approach to be more performant, plus all the benefits that SQL brings.
SQLite is awesome, obviously, but the API isn't elegant and easy the way DuckDB's (or Postgres's) is. Handling dates in SQLite has me googling everything.