I like the idea of using select/put/delete (sql-style syntax) to query non-rdb data storage. It sort of raises the question of, could there be 1 universal language to query relational databases, text file storage (json, csv, etc), and anything else.
Or put another way, is there any data storage format that couldn’t be queried by SQL?
There are a lot of differences between storage formats. It would be incredibly difficult to create a universal query language. It would need to either a) change the storage formats so much that they're not really following their original standard, or b) create so many different versions of the query language that it's not really one standard.
Off the top of my head, SQL can't do lists as values and doesn't have simple key-value storage. JSON doesn't have tables or primary/foreign keys, and can have nested data.
SQL has both standard JSON and Array functions. What's the "list as value" feature you think is missing?
Yes, but with more awkward syntax than a dedicated tool, and a disparity between JSON and non-JSON types.
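To make the exchange above concrete, here is a minimal sketch of SQL's JSON handling using SQLite via Python's stdlib (it assumes a SQLite build with the JSON1 functions, the default in modern builds). The "list as value" is a JSON array stored in an ordinary TEXT column:

```python
import sqlite3

# A "list as value" stored as a JSON array in a TEXT column, queried
# with SQLite's JSON1 functions (present in modern builds by default).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (name TEXT, tags TEXT)")
con.execute("""INSERT INTO users VALUES ('alice', '["admin","dev"]')""")

# Pull one element out of the array...
(tag,) = con.execute(
    "SELECT json_extract(tags, '$[0]') FROM users"
).fetchone()
print(tag)  # admin

# ...or count its elements.
(n,) = con.execute(
    "SELECT json_array_length(tags) FROM users"
).fetchone()
print(n)  # 2
```

It works, but as the reply says: it's string paths and function calls rather than a native list type, which is the awkwardness being pointed at.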
Sure there could be -- any Turing-complete language (which SQL is) can query anything.
But the reason we have different programming languages* is because they have different affordances and make it easy to express certain things at the cost of being less convenient for other things. Thus APL/Prolog/Lisp/C/Python can all coexist.
SQL is great for relational databases, but it's like commuting to work in a tank when it comes to key-value stores.
* and of course because programmers love building tools, and a language is the ultimate tool.
sounds like a nightmare to do logistically. it would be cool though.
XML attributes come to mind
Perz1val, it's me, your grandchild from the distant future. Don't do this. XML goes rogue and destroys humanity.
Even if SQL (or another query language) is Turing-complete, that doesn't mean one universal language can perform all possible queries efficiently. In basic computer-science terms, your data structure is tied to the queries you run and the efficiency you want to achieve, so ad-hoc changes have to be made for specific problems.
SAS is good at reading pretty much anything.
Depends on how keen you are on pure SQL. For example, postgres and sqlite have JSON extensions, but they also enhance the syntax for it. Similar can be done for all other formats too, but that means you need to learn special syntax and be aware of the storage format for every query. This is far off from a real universal language.
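As one example of that format-specific syntax, SQLite's `json_each()` is a table-valued function -- a construct beyond standard SQL -- that explodes a JSON array into rows (a minimal sketch, assuming a SQLite build with JSON1):

```python
import sqlite3

# json_each() is a table-valued function, SQLite-specific syntax that
# turns a JSON array into a set of rows you can SELECT from.
con = sqlite3.connect(":memory:")
rows = con.execute(
    "SELECT value FROM json_each('[10, 20, 30]')"
).fetchall()
print(rows)  # [(10,), (20,), (30,)]
```

Postgres has its own, different operators (`->`, `->>`, `jsonb_*`) for the same job, which is exactly the "special syntax per storage format" problem.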
All data is a graph, so graph query languages work well for this. SPARQL is my tool of choice, but there's also Cypher and maybe GQL (though that's new).
Tree and graph structures can be queried using SQL (with more or less difficulty depending on how you have chosen to encode and index them), but it's not a particularly simple and straightforward language to use for such a task.
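One common encoding of the "more or less difficulty" point: store the tree as an adjacency list and walk it with a recursive CTE. A minimal sketch in SQLite:

```python
import sqlite3

# Tree as adjacency list; all descendants of node 1 via a recursive CTE.
# Workable, but noisier than a dedicated tree/graph query language.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, name TEXT);
    INSERT INTO node VALUES
        (1, NULL, 'root'),
        (2, 1,    'a'),
        (3, 1,    'b'),
        (4, 2,    'a1');
""")
descendants = [name for (name,) in con.execute("""
    WITH RECURSIVE sub(id, name) AS (
        SELECT id, name FROM node WHERE parent = 1
        UNION ALL
        SELECT n.id, n.name FROM node n JOIN sub ON n.parent = sub.id
    )
    SELECT name FROM sub
""")]
print(sorted(descendants))  # ['a', 'a1', 'b']
```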
We created PLDB.io (a Programming Language DataBase) and have studied nearly every language ever created and thought about this question a lot.
Yes, there could be 1 language to query everything, but there will always be a better DSL more relevant for particular kinds of data than others. It's sort of like how with a magnifying glass you can magnify anything, but if you want to look at bacteria you're going to want a microscope (and you wouldn't want a microscope to study an elephant).
Now it may turn out that there is 1 universal syntax that works best for everything (I'm sure people can guess what I would say), but I can't think of a case where you wouldn't want to have a DSL with semantics evolved to match a particular domain.
That's basically SQL. Many SQL systems have lots of built in connectivity to various data sources.
DuckDB is a good example of a (literally) serverless SQL-based tool for data processing. It is designed to be able to treat the common data serialization formats as though they are tables in a schema [1][2], and you can export to many of the same formats. With extensions, you can also connect to relational databases as foreign tables [3].
This connectivity is a big reason it has built a pretty avid following in the data science world.
[1] https://duckdb.org/docs/data/overview
[2] https://duckdb.org/docs/extensions/json#json-importexport
[3] https://duckdb.org/docs/extensions/postgres
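The "serialization format as a table" idea DuckDB implements natively can be sketched with nothing but the stdlib (to be clear, this is not DuckDB's API -- just the same idea done by hand): load CSV rows into an in-memory SQLite table, then query them with plain SQL:

```python
import csv
import io
import sqlite3

# Stdlib sketch of "query a CSV as a table" (NOT DuckDB's API):
# parse the CSV, load it into in-memory SQLite, query with SQL.
csv_text = "city,pop\nParis,2100000\nLyon,520000\n"
reader = csv.reader(io.StringIO(csv_text))
header = next(reader)  # ['city', 'pop']

con = sqlite3.connect(":memory:")
# INTEGER affinity makes SQLite store the pop strings as numbers.
con.execute(f"CREATE TABLE cities ({header[0]} TEXT, {header[1]} INTEGER)")
con.executemany("INSERT INTO cities VALUES (?, ?)", reader)

big = [c for (c,) in con.execute(
    "SELECT city FROM cities WHERE pop > 1000000"
)]
print(big)  # ['Paris']
```

DuckDB does this in one step (`SELECT * FROM 'file.csv'`), with type inference, which is a big part of the appeal.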
If entries can be relations themselves it is not possible, AFAIK -- for example, a table whose cells are themselves tables (non-first-normal-form data).
Now there are academic operators to convert to and from a purely relational system, but I don't think they are implemented or in the standard. I forget what they are called, however.

In general you don't want a universal query language. Depending on the shape of the data, you want different things to be easily expressible. You can, for example, express queries on tree-shaped data with SQL (see XPath-Accelerator), but it is quite cumbersome and its meaning is lost on the reader -- i.e., it's fine when computer-generated, but there is too much noise for a human to read or write themselves. I'd be glad to be proven wrong here, but as time has shown, there is no one-size-fits-all for programming languages. The requirements of different applications just vary too much.
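The XPath-Accelerator encoding mentioned above roughly works like this: number each node by pre-order and post-order rank, and "v is a descendant of u" becomes an interval check. A toy sketch (hand-assigned numbers for a four-node tree):

```python
import sqlite3

# XPath-Accelerator-style encoding: descendant-of becomes
# u.pre < v.pre AND v.post < u.post. Correct, but the intent
# ("all descendants of a") is buried in arithmetic.
#
# Tree:  root -> (a -> a1, b)
#   pre:  root=1, a=2, a1=3, b=4
#   post: a1=1, a=2, b=3, root=4
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE node (name TEXT, pre INTEGER, post INTEGER);
    INSERT INTO node VALUES
        ('root', 1, 4), ('a', 2, 2), ('a1', 3, 1), ('b', 4, 3);
""")
desc = [n for (n,) in con.execute("""
    SELECT v.name FROM node u, node v
    WHERE u.name = 'a' AND u.pre < v.pre AND v.post < u.post
    ORDER BY v.pre
""")]
print(desc)  # ['a1']
```

The query is efficient and indexable, but as the comment says: a human reading `u.pre < v.pre AND v.post < u.post` has no idea it means `//a/descendant::*`.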
Is your SQL Turing-complete? If yes, then it could query anything. Whether or not you'd like the experience is another thing.
Queries are programs. Querying data from a fixed schema is easy. Hell, you could make a "universal query language" by just concatenating together this dasel, with SQL and Cypher, so you'd use the relevant facet when querying a specific data source. The real problem starts when your query structure isn't fixed -- where what data you need depends on what the data says. When you're dealing with indirection. Once you start doing joins or conditionals or `foo[bar['baz']] if bar.hasProperty('baz') else 42` kind of indirection, you quickly land in the Turing tarpit[0] -- whatever your query language is, some shapes of data will be super painful for it to deal with. Painful, but still possible.
--
[0] - https://en.wikipedia.org/wiki/Turing_tarpit
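That `foo[bar['baz']]`-with-fallback indirection can be expressed in SQL's JSON functions -- here sketched with SQLite (assuming a build with JSON1): the path looked up in `foo` depends on a value read out of `bar`. Possible, but painful:

```python
import sqlite3

# Data-dependent indirection: the key used to index into "foo" is
# itself read from "bar", with 42 as the fallback if anything is
# missing -- the query-structure-depends-on-data case.
doc = '{"bar": {"baz": "k"}, "foo": {"k": 7}}'
con = sqlite3.connect(":memory:")
(val,) = con.execute(
    """
    SELECT coalesce(
        json_extract(:d, '$.foo.' || json_extract(:d, '$.bar.baz')),
        42)
    """,
    {"d": doc},
).fetchone()
print(val)  # 7
```

One level of indirection already means building path strings at query time; a few more levels and you're deep in the tarpit.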
From what I understand, SQL is (or at least can be made) Turing-complete, so in that sense you should be able to query any data store using it. However, that doesn't mean it will be efficient to do so.
I suspect for most data structures you could construct an index to make querying faster. But think about querying something like a linked list: it is not going to be too efficient without an index but you should still be able to write an engine that will do so.
If you have something like a collection of arbitrary JSON objects without a set structure you should still be able to express what you are trying to do with SQL because Turing completeness means it can examine the object structure as well as contents before deciding what to do with it. But your SQL would look more like procedural code than you might be used to.
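The "examine the structure before deciding" point can be sketched in SQLite's JSON functions (assuming a JSON1 build): branch on `json_type()` so that documents with different shapes are handled in one query. It does read more like procedural code than typical SQL:

```python
import sqlite3

# Schemaless documents: $.x is sometimes a scalar, sometimes an array.
# Inspect the structure with json_type() before deciding what to extract.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE docs (j TEXT);
    INSERT INTO docs VALUES ('{"x": 5}'), ('{"x": [8, 9]}');
""")
firsts = [v for (v,) in con.execute("""
    SELECT CASE json_type(j, '$.x')
               WHEN 'array' THEN json_extract(j, '$.x[0]')
               ELSE json_extract(j, '$.x')
           END
    FROM docs
""")]
print(firsts)  # [5, 8]
```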
Perhaps the limit is not around formats but around the type system. You may be able to dump data, but can you actually reliably use it for anything?