I like the idea of using select/put/delete (sql-style syntax) to query non-rdb data storage. It sort of raises the question of, could there be 1 universal language to query relational databases, text file storage (json, csv, etc), and anything else.
Or put another way, is there any data storage format that couldn’t be queried by SQL?
There are a lot of differences between storage formats. It would be incredibly difficult to create a universal query language. It would need to either a) change the storage formats so much that they're not really following their original standard, or b) create so many different versions of the query language that it's not really one standard.
Off the top of my head, SQL can't do lists as values and doesn't have simple key-value storage. JSON doesn't have tables or primary/foreign keys, and can have nested data.
SQL has both standard JSON and Array functions. What's the "list as value" feature you think is missing?
Yes, but with more awkward syntax than a dedicated tool, and a disparity between JSON and non-JSON types.
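To make the exchange above concrete, here is a minimal sketch of SQL's JSON handling using SQLite via Python's stdlib (it assumes a SQLite build with the JSON1 functions, the default in modern builds). The "list as value" is a JSON array stored in an ordinary TEXT column:

```python
import sqlite3

# A "list as value" stored as a JSON array in a TEXT column, queried
# with SQLite's JSON1 functions (present in modern builds by default).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (name TEXT, tags TEXT)")
con.execute("""INSERT INTO users VALUES ('alice', '["admin","dev"]')""")

# Pull one element out of the array...
(tag,) = con.execute(
    "SELECT json_extract(tags, '$[0]') FROM users"
).fetchone()
print(tag)  # admin

# ...or count its elements.
(n,) = con.execute(
    "SELECT json_array_length(tags) FROM users"
).fetchone()
print(n)  # 2
```

It works, but as the reply says: it's string paths and function calls rather than a native list type, which is the awkwardness being pointed at.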
Sure there could be -- any Turing-complete language (which SQL is) can query anything.
But the reason we have different programming languages* is because they have different affordances and make it easy to express certain things at the cost of being less convenient for other things. Thus APL/Prolog/Lisp/C/Python can all coexist.
SQL is great for relational databases, but it's like commuting to work in a tank when it comes to key-value stores.
* and of course because programmers love building tools, and a language is the ultimate tool.
sounds like a nightmare to do logistically. it would be cool though.
XML attributes come to mind
Perz1val, it's me, your grandchild from the distant future. Don't do this. XML goes rogue and destroys humanity.
Even if SQL (or another query language) is Turing-complete, that doesn't mean one universal language can perform all possible queries efficiently. In basic computer-science terms, your data structure is tied to the queries you run and the efficiency you want to achieve, so ad-hoc changes have to be made for specific problems.
SAS is good at reading pretty much anything.
Depends on how keen you are on pure SQL. For example, postgres and sqlite have JSON extensions, but they also enhance the syntax for it. Similar can be done for all other formats too, but that means you need to learn special syntax and be aware of the storage format for every query. This is far off from a real universal language.
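As one example of that format-specific syntax, SQLite's `json_each()` is a table-valued function -- a construct beyond standard SQL -- that explodes a JSON array into rows (a minimal sketch, assuming a SQLite build with JSON1):

```python
import sqlite3

# json_each() is a table-valued function, SQLite-specific syntax that
# turns a JSON array into a set of rows you can SELECT from.
con = sqlite3.connect(":memory:")
rows = con.execute(
    "SELECT value FROM json_each('[10, 20, 30]')"
).fetchall()
print(rows)  # [(10,), (20,), (30,)]
```

Postgres has its own, different operators (`->`, `->>`, `jsonb_*`) for the same job, which is exactly the "special syntax per storage format" problem.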
All data is a graph, so graph query languages work well for this. SPARQL is my tool of choice, but there's also Cypher and maybe GQL (though that's new).
Tree and graph structures can be queried using SQL (with more or less difficulty depending on how you have chosen to encode and index them), but it's not a particularly simple and straightforward language to use for such a task.
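One common encoding of the "more or less difficulty" point: store the tree as an adjacency list and walk it with a recursive CTE. A minimal sketch in SQLite:

```python
import sqlite3

# Tree as adjacency list; all descendants of node 1 via a recursive CTE.
# Workable, but noisier than a dedicated tree/graph query language.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, name TEXT);
    INSERT INTO node VALUES
        (1, NULL, 'root'),
        (2, 1,    'a'),
        (3, 1,    'b'),
        (4, 2,    'a1');
""")
descendants = [name for (name,) in con.execute("""
    WITH RECURSIVE sub(id, name) AS (
        SELECT id, name FROM node WHERE parent = 1
        UNION ALL
        SELECT n.id, n.name FROM node n JOIN sub ON n.parent = sub.id
    )
    SELECT name FROM sub
""")]
print(sorted(descendants))  # ['a', 'a1', 'b']
```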
We created PLDB.io (a Programming Language DataBase) and have studied nearly every language ever created and thought about this question a lot.
Yes, there could be 1 language to query everything, but there will always be a better DSL more relevant for particular kinds of data than others. It's sort of like how with a magnifying glass you can magnify anything, but if you want to look at bacteria you're going to want a microscope (and you wouldn't want a microscope to study an elephant).
Now it may turn out that there is 1 universal syntax that works best for everything (I'm sure people can guess what I would say), but I can't think of a case where you wouldn't want to have a DSL with semantics evolved to match a particular domain.
That's basically SQL. Many SQL systems have lots of built in connectivity to various data sources.
DuckDB is a good example of a (literally) serverless SQL-based tool for data processing. It is designed to be able to treat the common data serialization formats as though they are tables in a schema [1][2], and you can export to many of the same formats. With extensions, you can also connect to relational databases as foreign tables [3].
This connectivity is a big reason it has built a pretty avid following in the data science world.
[1] https://duckdb.org/docs/data/overview
[2] https://duckdb.org/docs/extensions/json#json-importexport
[3] https://duckdb.org/docs/extensions/postgres
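The "serialization format as a table" idea DuckDB implements natively can be sketched with nothing but the stdlib (to be clear, this is not DuckDB's API -- just the same idea done by hand): load CSV rows into an in-memory SQLite table, then query them with plain SQL:

```python
import csv
import io
import sqlite3

# Stdlib sketch of "query a CSV as a table" (NOT DuckDB's API):
# parse the CSV, load it into in-memory SQLite, query with SQL.
csv_text = "city,pop\nParis,2100000\nLyon,520000\n"
reader = csv.reader(io.StringIO(csv_text))
header = next(reader)  # ['city', 'pop']

con = sqlite3.connect(":memory:")
# INTEGER affinity makes SQLite store the pop strings as numbers.
con.execute(f"CREATE TABLE cities ({header[0]} TEXT, {header[1]} INTEGER)")
con.executemany("INSERT INTO cities VALUES (?, ?)", reader)

big = [c for (c,) in con.execute(
    "SELECT city FROM cities WHERE pop > 1000000"
)]
print(big)  # ['Paris']
```

DuckDB does this in one step (`SELECT * FROM 'file.csv'`), with type inference, which is a big part of the appeal.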
If entries can be relations themselves it is not possible, AFAIK -- for example, a table whose cells are themselves tables (non-first-normal-form data).
Now there are academic operators to convert to and from a purely relational system, but I don't think they are implemented or in the standard. I forget what they are called, however.

In general you don't want a universal query language. Depending on the shape of the data, you want different things to be easily expressible. You can, for example, express queries on tree-shaped data with SQL (see XPath-Accelerator), but it is quite cumbersome and its meaning is lost on the reader -- i.e., it's fine when computer-generated, but there is too much noise for a human to read or write themselves. I'd be glad to be proven wrong here, but as time has shown, there is no one-size-fits-all for programming languages. The requirements of different applications just vary too much.
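The XPath-Accelerator encoding mentioned above roughly works like this: number each node by pre-order and post-order rank, and "v is a descendant of u" becomes an interval check. A toy sketch (hand-assigned numbers for a four-node tree):

```python
import sqlite3

# XPath-Accelerator-style encoding: descendant-of becomes
# u.pre < v.pre AND v.post < u.post. Correct, but the intent
# ("all descendants of a") is buried in arithmetic.
#
# Tree:  root -> (a -> a1, b)
#   pre:  root=1, a=2, a1=3, b=4
#   post: a1=1, a=2, b=3, root=4
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE node (name TEXT, pre INTEGER, post INTEGER);
    INSERT INTO node VALUES
        ('root', 1, 4), ('a', 2, 2), ('a1', 3, 1), ('b', 4, 3);
""")
desc = [n for (n,) in con.execute("""
    SELECT v.name FROM node u, node v
    WHERE u.name = 'a' AND u.pre < v.pre AND v.post < u.post
    ORDER BY v.pre
""")]
print(desc)  # ['a1']
```

The query is efficient and indexable, but as the comment says: a human reading `u.pre < v.pre AND v.post < u.post` has no idea it means `//a/descendant::*`.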
Is your SQL Turing-complete? If yes, then it could query anything. Whether or not you'd like the experience is another thing.
Queries are programs. Querying data from a fixed schema is easy. Hell, you could make a "universal query language" by just concatenating together this dasel, with SQL and Cypher, so you'd use the relevant facet when querying a specific data source. The real problem starts when your query structure isn't fixed -- where what data you need depends on what the data says. When you're dealing with indirection. Once you start doing joins or conditionals or `foo[bar['baz']] if bar.hasProperty('baz') else 42` kind of indirection, you quickly land in the Turing tarpit[0] -- whatever your query language is, some shapes of data will be super painful for it to deal with. Painful, but still possible.
--
[0] - https://en.wikipedia.org/wiki/Turing_tarpit
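That `foo[bar['baz']]`-with-fallback indirection can be expressed in SQL's JSON functions -- here sketched with SQLite (assuming a build with JSON1): the path looked up in `foo` depends on a value read out of `bar`. Possible, but painful:

```python
import sqlite3

# Data-dependent indirection: the key used to index into "foo" is
# itself read from "bar", with 42 as the fallback if anything is
# missing -- the query-structure-depends-on-data case.
doc = '{"bar": {"baz": "k"}, "foo": {"k": 7}}'
con = sqlite3.connect(":memory:")
(val,) = con.execute(
    """
    SELECT coalesce(
        json_extract(:d, '$.foo.' || json_extract(:d, '$.bar.baz')),
        42)
    """,
    {"d": doc},
).fetchone()
print(val)  # 7
```

One level of indirection already means building path strings at query time; a few more levels and you're deep in the tarpit.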
From what I understand, SQL is (or at least can be made) Turing-complete, so in that sense you should be able to query any data store using it. However, that doesn't mean it will be efficient to do so.
I suspect for most data structures you could construct an index to make querying faster. But think about querying something like a linked list: it is not going to be too efficient without an index but you should still be able to write an engine that will do so.
If you have something like a collection of arbitrary JSON objects without a set structure you should still be able to express what you are trying to do with SQL because Turing completeness means it can examine the object structure as well as contents before deciding what to do with it. But your SQL would look more like procedural code than you might be used to.
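The "examine the structure before deciding" point can be sketched in SQLite's JSON functions (assuming a JSON1 build): branch on `json_type()` so that documents with different shapes are handled in one query. It does read more like procedural code than typical SQL:

```python
import sqlite3

# Schemaless documents: $.x is sometimes a scalar, sometimes an array.
# Inspect the structure with json_type() before deciding what to extract.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE docs (j TEXT);
    INSERT INTO docs VALUES ('{"x": 5}'), ('{"x": [8, 9]}');
""")
firsts = [v for (v,) in con.execute("""
    SELECT CASE json_type(j, '$.x')
               WHEN 'array' THEN json_extract(j, '$.x[0]')
               ELSE json_extract(j, '$.x')
           END
    FROM docs
""")]
print(firsts)  # [5, 8]
```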
Perhaps the limit is not around formats but around the type system. You may be able to dump data, but can you actually reliably use it for anything?