To me YAML seems like the CoffeeScript of JSON, and unlike CoffeeScript I don’t understand why people are still using it.
I guess XML and JSON are too verbose. But YAML is so far in the opposite direction, we get the same surprise conversions we’ve had in Excel (https://ruudvanasseldonk.com/2023/01/11/the-yaml-document-fr...). Why is “on” a boolean literal (of course so are “true”, “false”, as well as “yes”, “no”, “y”, “n”, “off”, and all capitalized and uppercase variants)? And people are actually using this in production software?
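For anyone who wants to check a value before pasting it into a config, here is a rough Python sketch of that literal list. The pattern follows the YAML 1.1 "bool" type definition as far as I know it; real parsers differ in how much of it they implement, and the helper name `resolves_to_bool` is made up for illustration.

```python
import re

# Unquoted scalars that the YAML 1.1 "bool" type treats as booleans.
# Individual parsers may implement only part of this list.
YAML11_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)


def resolves_to_bool(scalar: str) -> bool:
    """Return True if an unquoted scalar would be read as a boolean under YAML 1.1."""
    return YAML11_BOOL.fullmatch(scalar) is not None


# A list of country codes: the code for Norway is the one that gets converted.
country_codes = ["se", "no", "dk"]
surprises = [code for code in country_codes if resolves_to_bool(code)]
print(surprises)  # ['no']
```

Quoting the value (`"no"`) is the usual way to keep it a string.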
Then when you add templating it’s no longer readable and concise anyways. So, why? In JSON, you can add templating super easily by turning it into regular JavaScript: use global variables, functions and the like. I don’t understand how anyone could prefer YAML with an ugly templating DSL over that.
And if you really care about conciseness, there’s TOML. Are there any advantages of YAML over TOML?
Dunno, to me YAML is the python of markup languages.
YAML is decent at handling things like nesting and arrays, while TOML sucks at it.
I don't dislike YAML that much.
That being said, we have known since the dawn of C macros that templating languages which are not aware of syntax are AWFUL.
Likewise, writing Helm charts (the place I encountered YAML templating) is just horrible, but would be so much nicer if templates respected the YAML syntax tree and expanded at the right subnode, instead of being text-replace botch jobs.
The biggest issue I have with Yaml is that they forbid tabs.
Their argument is that tabs are shown differently in every editor which is actually something I like. When you're looking for something deeply nested you can reduce the tab distance a bit, when that's not needed you can increase it to improve visibility of nesting levels.
And forbidding it makes a one-keystroke action a two or four one.
I really don't understand the python/Yaml hate for tabs, and as a result I don't really use either.
You can’t be serious
Not everyone wants a bloated and buggy IDE to write their code for them.
Like vim, for example? Which supports replacing tab inputs with spaces...
I code practically exclusively with vim. The replacement is buggy and has many corner cases that come up constantly. As in all editors.
Tab indentation has no bugs or corner cases.
And I've been using vim exclusively for north of fifteen years with Tab replacement, never had a problem with the editor getting confused about what happens with spaces when I hit Tab.
Some detail about the corner cases you've run into would be great, if they're happening constantly I can see how it would be a bugbear.
For example with vim (debian) defaults, if you happen to have a 2-space indented Python (the first two spaces are for HN formatting, the first if should start at zero indent):
And continue to add another if block in that, the autoindent will give you four spaces:
And if you make a new line after the last row there and hit a backspace, it'll erase one space instead of four, giving an indentation of 3 (+2) spaces. And if you start a new line after that, you'll get an indentation of 8 spaces in total. Ending up with:
This is just one case, but things like this tend to happen quite often when editing code. Even if it's been originally PEP-8 indented. Usually it's not what the Tab does, but what the Backspace or Autoindent does. I'm not exactly sure what exact Tab/Backspace/Autoindent rules underlie the behavior, but I can imagine there having to be quite a bit of hackery to support soft-tabs.
For me this kind of Tab/Autoindent/Backspace confusion is frequent enough that I'd be very surprised if others don't find themselves having to manually fix the number of spaces every now and then. And when watching over the shoulder I see others too occasionally having to micromanage space-indents (or accidentally ending up with three space indented blocks etc), also with other editors than vim.
As with most things in vim, it is definitely manageable with settings such as ts=2 (tab stop) and sts=2 (soft tab stop). This is why a lot of older Python files, in particular, are littered with vim modelines with settings like these.
The nice modern twist is .editorconfig files and the plugins that support them including for vim. You can use those to set such standard language-specific config concerns in a general way for an entire "workspace" for every editor that supports or has a plugin that supports .editorconfig.
Of course you can override it, but is there any excuse for that default behavior? It sounds ridiculous.
The defaults are either 4-space or 8-space soft tab stops. 8 spaces is the oldest soft tab behavior. 4-space soft tabs have been common for C code among other languages for nearly as many decades. It is only relatively recently that Python and JS and several Lisp-family derivatives have made 2-space tab stops much more common of a style choice. Unfortunately there is no "perfect" default as these are as much aesthetic preferences as anything else.
(It is one of the arguments for using hard tabs instead of soft ones in the eternal tabs versus spaces debates because editors can show hard tabs as different space equivalents as a user "style choice" without affecting the underlying text format.)
Soft tabs at 4 would be fine, though worse than autodetect. But that is not the behavior described in the above post.
The behavior described above seems to me to be exactly soft tabs at 4 in a 2-space tab document with autoindent turned on (often the default).
Vim has no autodetect by default. (I'm sure there's a plugin somewhere.)
The part where the user is on a line indented by 2, hits return, and gets a line indented by 2+4=6 doesn't sound like soft tabs at 4 to me. And I wouldn't expect hitting backspace to then only remove 1 space (if it actually removed 2 that makes more sense, but is inconsistent with what it just added). At that point, hitting return and getting a line indented by 8 might make sense but is weird.
Another comment suggests it's using 2 and 4 for different settings and that's causing problems.
Well, yes. But that's one more small thing to configure and manage. Not a big deal in isolation, but such small things add up to significant yak shaving.
With tabs we wouldn't have yet another papercut to toil over.
This is because ftplugin/python.vim does:
So if you use "set sw=2" then it leaves tabstop and softtabstop at 4. You can set g:python_recommended_style to 0 to disable it.
Also sw=0 uses the tabstop value, and softtabstop=-1 uses the shiftwidth value.
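The two special values in that last sentence can be modelled in a few lines of Python. This is only a sketch of the rules as stated here, not Vim's actual implementation, and `effective_widths` is a made-up name:

```python
def effective_widths(tabstop: int, shiftwidth: int, softtabstop: int) -> dict:
    """Resolve the special values: sw=0 follows tabstop, negative sts follows shiftwidth."""
    sw = tabstop if shiftwidth == 0 else shiftwidth
    # softtabstop=0 is passed through unchanged (in Vim that turns the feature off).
    sts = sw if softtabstop < 0 else softtabstop
    return {"tabstop": tabstop, "shiftwidth": sw, "softtabstop": sts}


# The mismatch described above: sw changed to 2, the other two left at 4.
print(effective_widths(tabstop=4, shiftwidth=2, softtabstop=4))
# {'tabstop': 4, 'shiftwidth': 2, 'softtabstop': 4}

# Letting sts follow sw keeps the two in step.
print(effective_widths(tabstop=4, shiftwidth=2, softtabstop=-1))
# {'tabstop': 4, 'shiftwidth': 2, 'softtabstop': 2}
```

With sw=0 and sts=-1, changing tabstop alone moves all three values together.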
I agree Vim's behaviour there is a bit annoying and confusing, but it doesn't really have anything to do with tabs vs. spaces. I strongly prefer tabs myself as well by the way.
Even when you DO use tabs Vim will use spaces if sw/ts/sts differ by the way. Try sw=2 and using >>, or sts=2 with noexpandtab.
When looking at the code, tab-containing files are the most inconsistent ones, especially when viewed via general tools (less, diff, even web viewers).
Sure, if people would only ever use tabs for indentation and spaces for alignment, things could be good. But this almost never happens, instead:
... some lines start with spaces, some with tabs. This looks fine in someone's IDE but the moment you use "diff" or "grep" which adds a prefix, things break and lines become jagged.
... one contributor uses tabs mid-line while others use spaces. It may look fine in their editor with 6 character tabs, but all the tables are misaligned when looking in an app with a different tab size.
Given how many corner cases tabs have, I always try to avoid them. Spaces have no corner cases whatsoever and always look nice, no matter what you use to look at the code.
(the only exceptions are formatters which enforce size-8 tabs consistently everywhere. But I have not seen those outside of golang)
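The prefix problem is easy to reproduce. A minimal Python sketch, assuming 8-column tab stops and a one-character prefix like the `+` that diff adds (`column_of` is a hypothetical helper, not part of any tool):

```python
TAB_WIDTH = 8  # assumed display width of a tab stop


def column_of(text: str, line: str, tab_width: int = TAB_WIDTH) -> int:
    """Column at which `text` starts once the tabs in `line` are expanded."""
    return line.expandtabs(tab_width).index(text)


tab_line = "\tfoo()"            # indented with one tab
space_line = " " * 8 + "bar()"  # indented with eight spaces

# Without a prefix both lines start at the same column.
before = (column_of("foo()", tab_line), column_of("bar()", space_line))

# With a one-character prefix the tab absorbs it, the spaces do not.
after = (column_of("foo()", "+" + tab_line), column_of("bar()", "+" + space_line))

print(before)  # (8, 8)
print(after)   # (8, 9)
```

The two lines agree in the editor and disagree by one column in the diff, which is the jagged output described above.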
People using tabs for alignment can happen when you've got a tab-camp-person who hasn't yet realized how they're terrible for alignment.
But "some lines start with spaces, some with tabs" happens for precisely two reasons:
* you have a codebase with contributors from both camps
* people thought in-editor tooling was the solution (now you have two problems)
This is tooling and (as you realized) stop preference dependent.
Almost every text editor has support for tabs-as-spaces.
I haven't used an IDE in years.
I don't want that though. Because then when editing I still have to mess around with spaces.
And the double nature of the spaces makes it hard to see when you have an odd number of spaces when you reach deep indenting levels, which counts as the lesser number of double spaces in Python.
IMO it would be ideal if tabs would be displayed as a block, and you could resize the width of that block on the fly <3
Is there any editor in 2024 that can't replace tabs with spaces? Is it just Notepad?
I prefer to keep my json to one line without white spaces, saves on disk space.
I prefer to write to my disk manually with a magnetized needle in a clean room.
Clean room's for tryhards. Dust adds flavor.
I recommend you use smaller fonts as well.
Ouch. The only problem with the obvious sarcastic tone of that comment is that there are plenty of people that do say exactly the same thing and mean it.
JSON formatting is less important because most apps that deal with it come with good “beautify”, “sort”, “remove all formatting white space” functions in the editor
For code I'd agree. However for configuration files, I find that I often need to edit them in places or environments where I don't have anything but the most bare-bones editor.
A quick search shows that even nano can be configured to use whatever number of spaces you want when you hit tab:
https://askubuntu.com/questions/40732/how-do-i-get-spaces-in...
I consider Nano a fully fledged editor. I'm talking Notepad or a html text box.
Oh. Windows with no ability to install anything didn't even occur to me! I'm truly sorry.
When this happens, I copy four spaces and then use Ctrl+V for Tab.
Yes, it’s not exactly the same due to alignment, and yes you have to repeat it after using the clipboard for other purposes, but it’s good enough for that occasional use.
Tabs aren't a problem
Spaces aren't a problem.
What is a problem is not picking one or the other. There's arguments for both sides but it is critical to just take a side. I'm sorry your side lost but it makes everything better to just go along with the consensus.
At one of my internships in the 90's, a developer I worked with solved the problem by never indenting. Every single line of code started at column 1.
Why even use more than one line?
Why not just make everything whitespace? Give both tabs and spaces their rightful place! https://en.wikipedia.org/wiki/Whitespace_%28programming_lang...
True, that's how real coders work!
10 IF A=1 OR Z=2 GOTO 30
20 GOTO 50
30 PRINT "HELLO WORLD"
40 GOTO 10
50 GOTO 30
Sounds like you were an indent-ured labourer.
Why did you leave an empty column at the start? :]
It sounds like this is the origin story of Python. Whoever worked with this person made it their life’s mission to enforce proper indentation.
I thought the consensus was tabs for block indentation, spaces for alignment.
No, that's what the tabs hold-outs have morphed into. Which illustrates the problem with tabs: It's very difficult to get everyone on a team to care about tabs or not care about alignment.
No, significant whitespace is the problem.
So - you're saying that mixing tabs and spaces in the same file is entirely unproblematic outside of languages with significant whitespace?
Are you sure about that?
i don't take sides, I use tabs AND spaces
"Tabs for indentation, spaces for alignment" is something I wish had caught on.
https://lists.gnu.org/archive/html/emacs-devel/2016-12/msg01...
This e-mail enters to my "favorite quotes from internet" list directly from the top.
It is a funny quip, but I wish they'd consider the reformatting. I find using an autoformatter reduces cognitive load while reading and writing.
Yeah, OP is not wrong. I also like neatly formatted code, and it is way easier to read.
I always reformat all my code before all commits. It's just good hygiene.
The funny part is the fussing and the answer they get.
I'd just autoformat the area of my patch and send in the patch that way, maybe plus some autoformatted blocks here and there, slowly fixing the stuff as I go.
If something is too bothersome, first try doing something, and figure out the rest of the process as you go.
Edit: blocks became blogs without my knowledge. Maybe I should write a blog post about it. Don't know.
Us old folks remember the days when reformatting was a computationally expensive action that required a special program to “pretty print” the code. And heaven forbid your code used some language feature your pretty printer didn’t understand and mangled the output making your code uncompilable.
Well, I'm not that young myself. I was playing with computers (programming, in fact) in the early 90s, and I remember when it was expensive.
However, Eclipse has been formatting C++ code with a simple hotkey, understanding the language and without breaking the code, for the last 15 years as far as I can remember. It's instant, too.
Because of that I feel a bit surprised when younger people look at it like it's black magic. It's neither new, nor unsolved in my conscious experience.
Reformat-on-change is also a valid strategy!
I think I've even seen this employed on C++ codebases with clang-format. Conceptually, it's like `git diff | clang-format`, but there are more flags and scripts involved: https://clang.llvm.org/docs/ClangFormat.html#script-for-patc...
There's deep wisdom there.
The majority of editors can be configured to use tab to insert the appropriate number of spaces. Many will automatically detect the correct configuration.
The majority isn't all, and in my experience you always end up having to use one in some random situation that doesn't have that. tap tap tap tap
Literally 100% of editors support tabs.
The horror
The attempt on my life has left me scarred and deformed.
Which editor have you run into that doesn't? Even nano supports configuring it with both nanorc or a command line flag.
HTML textareas don’t support entering tabs.
There are rich-text editors that increase the margin on Tab rather than inserting a tab.
> forbidding it makes a one-keystroke action a two or four one.
Not if your editor can be configured to interpret a Tab keypress as the appropriate number of spaces. AFAIK all common text editors, at least in the Unix world, do this.
I don't want that though, because then I still have to mess around with spaces when editing.
I actually like tabs for indenting levels especially because I can configure how far they indent on the fly.
My editor can also change the width of blocks of spaces on the fly, as well as navigating them in arbitrary chunks.
I'm pretty sure most of the "spaces" people have their editor set up to convert the 'tab' key into multiple spaces.
Now excuse me while I duck under this table.
You're safe because you're right.
Every editor I know can enter n spaces when pressing tab. That might solve your concern.
It does not. You still have to mess around with a bunch of spaces when you're editing or copy/pasting, and not having exact even numbers makes for ambiguous situations.
I agree with you about YAML's treatment of tabs. I still use YAML because there's often no other choice.
Python is actually flexible in its acceptance of both spaces and tabs for indentation.
Maybe you were thinking of Nim or Zig? Nim apparently supports an unsightly "magic" line for this (`#? replace(sub = "\t", by = " ")`), and Zig now appears to tolerate tabs as long as you don't use `zig fmt`. I haven't used either yet because of the prejudice against tabs, but Zig is starting to look more palatable.
True, I'm using it too when I have no other choice.
True but it does give constant warnings then which is annoying. And I was worried about it dropping support in the future so I didn't want to waste time learning it.
Your problem, and I mean this sincerely and respectfully, is that you're not using your text editor / IDE correctly. Adding two or four spaces of indentation is done by pressing TAB! Once. Most editors know how to do this out of the box, but if yours doesn't you need to change it.
You still have to mess around with a bunch of spaces when you're editing or copy/pasting, and not having exact even numbers makes for ambiguous situations.
Especially if something is 5 levels deep, it's really hard to see if you have 12 or 11 spaces (so 5 levels + 1 space or 6 levels) indentation.
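If eyeballing 11 versus 12 spaces is the issue, a few lines of Python can do the counting. A rough sketch, assuming 2-space indentation and space-only indents; `misindented_lines` is a name invented for this example:

```python
def misindented_lines(source: str, width: int = 2) -> list[tuple[int, int]]:
    """Return (line_number, leading_spaces) for lines whose indent is not a multiple of `width`."""
    problems = []
    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.lstrip(" ")
        if not stripped:
            continue  # blank lines carry no indentation information
        spaces = len(line) - len(stripped)
        if spaces % width:
            problems.append((number, spaces))
    return problems


sample = "a:\n" + " " * 12 + "deep: 1\n" + " " * 11 + "oops: 2\n"
print(misindented_lines(sample))  # [(3, 11)]
```

It only flags odd counts; it cannot tell whether a line at 10 spaces was meant to sit at 12.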
Like with Python, any competent text editor will take care of this for you, I've never encountered this issue before.
This isn't a tabs or spaces issue. This is a "your editor is bad or configured wrong" issue.
The worst thing with Helm charts is not the YAML, or even the text replace botch-jobs, but that they seem to think that a Go stacktrace is reasonable error reporting. I don't think I've ever worked with a tool with such awfully useless error messages.
But I agree, it'd be better if the template expansion was actually structural and not just text. The huge amount of "| indent 8" etc. in Helm charts is such a stench that by about the second time people encountered that they ought to have made a better template expansion mechanism top priority.
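To make the difference concrete, here is a small Python sketch of the two approaches. It is not how Helm works internally; the template, the `ENV_BLOCK` fragment and the document layout are all invented for illustration, and JSON stands in for the serialized output so that only the standard library is needed.

```python
import json
import textwrap

TEMPLATE = """\
spec:
  containers:
    - name: app
      env:
{env}
"""

ENV_BLOCK = "- name: A\n  value: '1'\n- name: B\n  value: '2'"

# Text substitution: the fragment lands at column 0, no longer nested under "env".
naive = TEMPLATE.format(env=ENV_BLOCK)

# The workaround: the caller has to know the fragment sits 8 columns deep.
patched = TEMPLATE.format(env=textwrap.indent(ENV_BLOCK, " " * 8))

# Structural expansion: put the data at the right node and serialize once.
doc = {"spec": {"containers": [{"name": "app", "env": []}]}}
doc["spec"]["containers"][0]["env"] = [
    {"name": "A", "value": "1"},
    {"name": "B", "value": "2"},
]
structural = json.dumps(doc, indent=2)

print(naive)
print(patched)
print(structural)
```

In the structural version nobody has to count columns, because the serializer decides the indentation.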
You have an error on line one. Good luck
Ah, this had me laughing.
Unlikely it will ever get better. First to market with a prototype tool, gains market share and momentum. Eventually the enthusiasm fades off and people start hating it, for good and sometimes bad reasons. Yet users are stuck because change is expensive and risky. The team is stuck because any change risks becoming the straw that broke the camel's back, possibly cascading through the user population. Story of our young industry.
And then the second layer of hell, a DSL inside the YAML for something like an Azure DevOps pipeline. Truly awful.
My personal favorite was when my company switched to configuring Jenkins in YAML, with some of the config being in YAML proper and other config being in Groovy embedded inside of multiline strings. Since it's Jenkins, the Groovy itself embeds multiline strings for scripts that need to run, so the languages end up nested three levels deep!
The only thing that saves me is IntelliJ's inject-language-in-string feature.
TOML has the inline table syntax with curlies, like JSON, and inline array syntax with brackets, also like JSON. It could support nesting pretty well.
Sadly, it doesn't support line breaks in the inline table syntax, so using inline tables for nesting is a PITA; inline tables are pretty much unusable for anything which doesn't fit within like 80-100 characters. Inline arrays can contain newlines however, so deeply nested arrays work well.
Newlines in inline tables will be coming in TOML 1.1, which will make TOML much better for deeply nested structures. Unfortunately, there will probably be many years until 1.1 is both actually released and well supported across the ecosystem.
And of course, inline tables can't be at the top level of the document, so TOML might still not be the best way to represent a single deeply nested structure.
Yeah, that's why I prefer ytt over helm syntax. It isn't great syntax, but at least it is aware of what it is doing.
Having said that, yaml has some pretty obvious mistakes. It should have been a lot more prescriptive about data types. Not doing that creates a lot of unneeded confusion and weird bugs.
People balk at XML, but its verbosity plus DTD allows it to pull tricks which you can't do on other things.
Well, everything has its place, but XML is I think very well suited where you need to serialize complex things to a readable file, and verify it while it's being written and read back.
Indeed. I get a lot of value out of my strongly typed XML documents. I generally have code that validates them during writing and after reading. Those who don’t understand XML end up learning why it is verbose when they eventually add all of the features they need to whatever half-baked format they are using.
An XML document without a schema is strictly worse than JSON without a schema. JSON with a schema is strictly better than XML with a schema. XML structure does not map neatly into the data types you actually want to use. You do not want to use a tree of things with string attributes, all over your code. If you do have a schema, the first thing you will want to do is turn your data into native language data types. After that point, the serialization method does not matter anymore, and XML would have just been slower. Designing a schema for XML is also more tedious than for JSON.
I enjoy JSON for internal stuff and where it does not matter that JSON is not very expressive. JSON Schema is a poor substitute for a proper schema. For anything where I am interfacing with another person or team, I send them a DTD or XSD, which documents the attributes and does not have nonsense like confusing integers and floating point values.
For quick and dirty, I agree about JSON. For serious data interchange, I use XML.
I am baffled by this assertion. XML Schema (XSD) is much more expressive than JSON Schema.
Considering I have mapped 3D objects to (a lot of) C++ objects containing thousands of facets under 12ms incl. parsing, sanity checking, object creation, initialization and cross linking of said objects on last decade's hardware, I disagree with that sentiment.
Regarding your first point, even without a schema, an XML document shows its structure and what it expects. So JSON feels like it's hacked together when compared to XML in terms of structure and expressiveness.
It's fine for serializing dark data where people won't see, but if eyes need to inspect it XML is way way more expressive by nature.
Heck, you even need to hack JSON for comments. C'mon :)
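The comment point is easy to demonstrate with Python's standard `json` module, which follows the strict grammar. The `_comment` key below is one common workaround, not anything the format defines, and `parses` is a helper written for this example:

```python
import json


def parses(text: str) -> bool:
    """Return True if `text` is accepted by the strict JSON parser."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


with_comment = '{\n  // retry count\n  "retries": 3\n}'
workaround = '{"_comment": "retry count", "retries": 3}'

print(parses(with_comment))  # False
print(parses(workaround))    # True
```

The workaround parses, but the "comment" is now data that every consumer has to know to ignore.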
The 'XML is verbose' argument is exactly analogous to the 'static typing is verbose' argument. JSON is decent, but it quickly breaks down if you want to have any sort of static sanitisation on input data, and the weird `"$schema"` attribute is quite strange. YAML makes no sense whatsoever to me.
XML is by far the most bulletproof human-readable serialisation-deserialisation language there is.
It’s two things: the static typing analog is definitely there but I’d extend the comparison to something like the J2EE framework fetish & user-hostile tools, too. There were so many cases where understanding an XML document required understanding a dozen semi-documented “standards” and since few of the tools actually had competent implementations you were often forced to write long-form namespace references in things like selectors or repeat the same code.
I worked with multiple people who were pretty gung ho about static typing everything but the constant friction of that self-inflicted toil wore over time. I sometimes wonder whether something more in the Rust spirit where the tools are smart enough not to waste your time might be more successful.
I agree. Here in 2024, I hope everyone agrees that types are great.
Static types aren't just verbose, they're clunky. They only work in a perfect world - dynamic types provide the functionality to actually thrive.
That could help, the problem being XML. You mention the J2EE framework and semi-documented "standards" - the world is rife with bad xml implementations, buggy xml implementations, and bad programmers reading 1 GB xml documents into memory (or programs needing to be re-worked to support a SAX parser).
There's too much baggage at the feet of XML, and the tools that maybe could have helped were always difficult to use/locked behind (absurdly expensive) proprietary paywalls.
JSON started to achieve popularity because as a format, it was relatively un-encumbered. Its biggest tie was to Javascript - if certain tools hadn't been brain-dead about rejecting JSON that wasn't strictly just JSON, it might have achieved the same level of type safety as schema-validated XML, without much of the cruft. But that's not what the tools did, and so JSON became a (sort-of) human-readable data-interchange format, with no validation.
So in 2024 we have no good data-x-change formats, just random tools in little niches that make life better in your chosen poison format. We await a rust - a good format with speed, reliability, interoperability, extensibility, and easy-to-use tools/libraries built in.
Agreed. XML is clunky, no doubt, but it's partly that the tools were just clunky.
Having said that, I do like that you can flip between YAML and JSON. If we could do that with XML (attributes vs sub-elements a problem here) it would be much more useful I think.
I think PDML hits a sweet spot. The author didn't set out to recreate XML in a less verbose, more human readable syntax, but pretty much ended up doing so. I'd like to see it mature and gain more widespread adoption.
XML + DTD + XMLSchema had things we're still figuring out how to do with YAML and JSON
You could easily generate a UI based on just the DTD and Schema that could be used to fill a perfectly valid XML file.
Validating incoming XML was a breeze, just give it to the validator class along with the DTD and Schema and boom, done.
See the boom? It's boomer tech. We can't have old, boomer tech in 2024.
Jokes aside, I wish people spent the time to understand the technologies before disliking them and blindly implementing a different, inferior one.
XML is more popular today than it's ever been. It's just called JSX now.
Besides being aesthetically similar to SGML, because it maps to HTML, JSX has nothing to do with XML. It is Javascript.
It's literally shorthand for "Javascript XML" and its templating syntax is the same as XML. It has a lot to do with XML.
It just looks like JavaScript version of JSP to be honest.
All of that is doable with JSON Schema, though, so it's not so much something that we’re still figuring out how to do.
absolutely, 100%.
When I first encountered XSLT I seriously thought it was the most ridiculous thing I had ever seen. A frickin' programming language whose syntax was XML.
But then I learned it and I don't think I've ever seen another language that could do what XSLT could do in such a small amount of code. The trick was to treat it like a functional language (I got this advice from someone else and they were absolutely correct). Where most people got into trouble was thinking of it as an imperative language.
Pattern matching expressions is the kool kid on the block, but XSLT had that to the nth degree 20 years ago.
Indeed, XML is a decent document language because of the quality of tools available and its power/flexibility. I hate when people use it for config files and other things that are usually human edited where readability is paramount though.
It's a good comparator, there are indeed a lot of similarities, but I never understood why anyone ever used Coffeescript whereas I do think I have a solid understanding of why people use YAML.
It's more like Python than Coffeescript really: it's not just about simplicity & brevity, it's about terminators.
Whitespace-dependent languages are often a pain to format / parse / read in many ways - Python has survived this by the skin of its teeth by being extremely strict about indentation, both in terms of the parser & also community convention. YAML hasn't had this - it remains a mess.
However, both have that very attractive property of not requiring terminators, which can't really be overstated.
TOML's got some good properties but its handling of structures with a depth > 1 is far from concise, and pretty terrible if I'm honest.
When Coffeescript was invented, it was an advancement on top of the awful Javascript standards at the time. It never went anywhere because Javascript caught up, but Coffeescript had a good reason for existing.
Today, Coffeescript is a remnant of old frontends that nobody has bothered transpiling into Javascript yet, but back in the day it was a promising new development.
That was certainly the selling point. I never saw any advancements in it - the features were aesthetic syntactic sugar.
Coffeescript came with spreads and destructuring, and added string interpolation, just to name a few things. It also added classes and inheritance, and the ?. operator.
I suppose you could argue those are just syntactic sugar because they compiled down to ES5, in the same way you can argue that any programming language is syntactic sugar over raw machine code.
I may disagree (_heavily_) with the Pythonesque syntax Coffeescript chose, but it took a while for ES6 to be widely available, and Coffeescript made ES6 features work on most browsers without any additional effort. It's easy to take today's Javascript for granted, but the web was very different back in 2009.
In addition to this: ruby-like classes and "sane"/expected handling of this using fat arrow functions. I've worked with a few developers at the time that considered themselves pure backend/rails developers and didn't (bother to) grok the details around the way this worked in JS.
I distinctly remember lots of var that = this; in JS code back then, which wasn't required anymore when using CoffeeScript.
Class sanity was the major reason I chose it for a project in the early 2010s. I was interacting with the classes in OpenLayers and being able to do so without all those footguns was very welcome.
javascript was never designed to be used like a classic OOP language, that's why jquery won, it was functional which meant it didn't fight you the way the other libraries did.
javascript is first and foremost functional no matter how hard MS and others have tried to hammer it into a more typical OOP language.
I'm not sure what you mean. You can put functions into objects, you have "this" when you call the functions, you even have prototypes. It seems to me like the language is designed to let you do OOP just fine, and the only thing that was awkward was organizing the code where you define all those functions and the constructor. So they added a sugar keyword for it.
right, it's awkward, so don't do that, be functional instead.
jquery vs mootools/scriptaculous/etc.
jquery won for a reason, it's just flat out a better experience in terms of code specifically because it uses a functional approach in its api rather than an OOP approach.
I would argue that fat arrow functions really are nothing more than syntactic sugar. I don't know of any place where (x,y) => {} couldn't be replaced by function(x,y){}. I prefer arrow functions myself, but it's a very minor addition.
Fixing _this_ is a good point, though.
When you didn't know how this worked, CoffeeScript's fat arrow functions became a life saver when attaching callbacks from inside some object you were writing that probably had an init() method to set up the handlers:
vs.
You only needed a .bind(this) in the plain JS version, but it felt like surprisingly few people knew this back then.
Interestingly enough, the current version of CoffeeScript compiles this code into an ES6 arrow function itself, but I think back then they used bind() in the transpiled JS.
Fat arrow functions were adopted from coffee script.
IIRC it also had some different scoping rules so you didn’t need to sprinkle `bind` all over.
This is why I created StrictYAML. A lot of the pain of changing YAML goes away if you strictly type it with a schema but you keep the readability.
Counterintuitively that also includes most indentation errors - it's much easier to zero in on the problem if the error was "expecting status code or content on line 334, got response", for instance.
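The idea can be sketched without any YAML library at all. This is NOT StrictYAML's API, just a hypothetical Python illustration of the principle: every value arrives as a string, the schema decides its type, and an unexpected key produces an error that names what was expected.

```python
def validate(raw: dict[str, str], schema: dict[str, type]) -> dict:
    """Convert string values to the types named in `schema`; reject unknown keys."""
    typed = {}
    for key, value in raw.items():
        if key not in schema:
            expected = " or ".join(sorted(schema))
            raise ValueError(f"expecting {expected}, got {key!r}")
        typed[key] = schema[key](value)
    return typed


schema = {"status code": int, "content": str}

# "on" stays a string because the schema says so; "200" becomes an int.
print(validate({"status code": "200", "content": "on"}, schema))
# {'status code': 200, 'content': 'on'}

try:
    validate({"response": "x"}, schema)
except ValueError as error:
    print(error)  # expecting content or status code, got 'response'
```

A real implementation would also report the line number, which this sketch leaves out.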
That makes a lot of sense, though I'd guess that a lot of yaml-ops types wouldn't want to have to write schemas.
TOML's sections remind me of the directory part of a filename, and its keys of the files.
For the content that belongs in a typical configuration file this or the INI style roots are probably the most human approachable formats. For anything more complex maybe a database (such as SQLite?) is preferable past application bootstrap?
Reading YAML has the enjoyment of reading a love letter, whereas JSON has the detrimental feeling of a solicitor's email. For writing, YAML is like putting out a draft: you focus only on the meaning and don't care about anything else, including the form. JSON is like finishing up your thesis with a rigidly defined structure.
CoffeeScript was the front runner for 'Compile to JavaScript' technology. It was the first time we could write some sane frontend code.
Of course things like TypeScript came along and now we cannot unsee what we have already seen.
Probably not, but you forget YAML came out in 2001, whereas TOML came out in 2013. Neither are spring chickens but inertia is a hell of a thing. For example, Symfony supports YAML, XML and PHP definitions -- but not TOML. Symfony v2 simply predates TOML and they never got around to ditching YAML for TOML because it's not worth the bother.
TOML is just an .ini file plus some syntactic and computing sugar. I can argue that TOML is actually way older than it is.
1. I am unaware of a standardized .ini format
2. The native types in TOML are useful.
Shall we bet on what would happen if we asked 10 random people of any IT stripe to write a small sample INI file?
Come on.
The problem isn't with the small configuration files, those are just argv put into a file.
Here's an experiment actually worth doing: ask ten people to write a ini file for configuring between 3 and 6 servers where some properties are the same for several servers.
It'd generate the same set of problems in INI, YAML, TOML, XML, JSON, BICF (bayindirh's imaginary configuration format).
Because these are not related to how you write the file, but how your software operates in your mind.
How the software operates is of course dependent on the expressiveness of the configuration format, so it is clearly false in most practical senses to claim that the flat key-value format of INI and BICF will generate the same set of problems as formats that allow for lists and nesting.
If we accept the assertion that the complexity of a configuration file for the stated scenario is constant across all configuration formats, we will next be asserting that there's no difference in complexity between solutions in x86 assembly and LISP.
We're approaching from different sides.
You stated a problem: Configure ~6 servers where they share variables.
I can implement it in a plethora of ways. The most sensible one for me is to have a general or globals or defaults area, where every server overrides some part of these defaults. The file format has nothing to do with the sectional organization of a configuration file, because none of the formats forces you into a distinct section organization.
e.g.: Nesting is just a tool, I don't care about its availability. I don't guarantee that I'll be using it even if it's available.
I can write an equally backwards and esoteric configuration file in any syntax. Their ultimate expressiveness doesn't change at the end of the day.
It can be
or or I don't care. All can do whatever I want and need. Only changes how you parse and map. It's hashmaps, parsing and string matching at the end of the day. If you know both languages equally well, LISP becomes as complex as x86 assembly and x86 assembly becomes as easy as LISP. Depends on your perspective and priorities.
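A minimal sketch of the "defaults area plus per-server overrides" layout described above, using the INI flavor understood by Python's standard `configparser`. The server names, keys, and addresses are invented for illustration; other INI dialects may not have an equivalent of the `[DEFAULT]` section.

```python
import configparser

# Hypothetical INI text: server names, keys, and values are made up.
INI_TEXT = """
[DEFAULT]
port = 22
user = deploy
region = eu-west

[web1]
host = 10.0.0.1

[web2]
host = 10.0.0.2

[db1]
host = 10.0.0.10
port = 5432
user = postgres
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# sections() leaves out DEFAULT; each section falls back to DEFAULT
# for any key it does not override. All values come back as strings.
servers = {name: dict(parser[name]) for name in parser.sections()}

for name, settings in servers.items():
    print(name, settings)
```

Here `web1` and `web2` inherit `port`, `user`, and `region`, while `db1` overrides two of them. The same shape could be expressed in any of the other formats; only the parsing and merging code would change.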
If you don't know how to use the tool you have at hand, even though it's the simplest possible, you blow your foot off.
However they want to.
One may write a single value containing a CSV, another may use a convention of namespaced keys, whatever. One may base64, one may urlencode, whatever.
The differences don't change the fact that they will all have the same things in common.
Even without a formal spec, we all know what we are free to change and not free to change, and free to assume and not free to assume. The unwritten spec specifies very little, so what? That means maybe it isn't a good choice for some particular task that wants more structure, but that was not what you said and not what I'm ridiculing.
Or was that all you meant in the first place? That without some more to it to define standardized ways to do things, it's not good for these kinds of jobs? I confess I am focusing on the literal text of the comment as though you were trying to say that the term is not meaningful because it is not defined in a recognized and ratified paper.
My point is indeed that it is not meaningful to speak of the INI culture as something directly comparable to a standardised format.
I think this is the first time I've seen this sort of neo-romantic argument, where the representation of information is claimed to be irrelevant because, for some unspecified reason, we all know in our hearts what is being said.
Is this a mystical theory you've built on extensively, or something that came to you from the aether just now?
We should ask them instead to modify the existing INI file. I bet most would do just fine.
This is an .ini:
I don't argue. I use TOML too, but it doesn't change that it's an ini++. You can treat an .ini file as a TOML file (well, maybe comments need some changing, but eh), they're not different things. I don't think, even though TOML has some official spec, all parsers are up to it, and may have disagreements between them. It's the same for INI.
You can have "native types" in .ini as well. The difference is you'll be handling them explicitly yourself, and you should do that in defensive programming anyway. A config file is a stream of input to your code, and if you don't guard it yourself, you accept what that entails.
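As a rough illustration of handling types explicitly, here is a sketch using the typed getters in Python's standard `configparser`. The section name, keys, and the `get_int_or_default` helper are hypothetical; the point is only that conversion and validation live in the application rather than in the file format.

```python
import configparser

# Hypothetical config; "workers = many" is deliberately invalid.
INI_TEXT = """
[server]
port = 8080
debug = yes
timeout = 2.5
workers = many
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)
section = parser["server"]

# Every raw value is a string; the application asks for the type it expects.
port = section.getint("port")
debug = section.getboolean("debug")  # accepts yes/no, on/off, true/false, 1/0
timeout = section.getfloat("timeout")


def get_int_or_default(section, key, default):
    """Return an int for key, or default if it is missing or not an integer."""
    try:
        return section.getint(key, fallback=default)
    except ValueError:
        return default


workers = get_int_or_default(section, "workers", 4)
print(port, debug, timeout, workers)
```

Note that `getboolean` only applies its yes/no/on/off rules when the code explicitly asks for a boolean, so a country code like `NO` stays a string unless the application requests otherwise.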
Read it on https://toml.io/ (Full spec on upper-right… with its evolutions up to the final 1.0.0 version).
Oh sorry, I missed a comma. It should read: "I don't think, even though TOML has some official spec, ..."
Fixed the comment too.
I know TOML has an official spec.
Zomg how did you magically read my brain to produce a perfect example of what I was thinking even though there is no IEEE spec? It's unpossible!
I used Windows 3.1 and 3.11.
That's all I'll say.
Overall it's not that bad, see e.g. https://arp242.github.io/toml-test-matrix/
If you look at the failure details then most of them are either minor issues about where things like escape characters are/aren't allowed, or about overriding existing tables (previously the spec was ambiguous on that, and I expect that will clear up over time). Note that overview is not entirely fair because it uses the latest (unreleased) version of toml-test where I added quite a few tests.
These kinds of imperfections in implementations are of course true for any language, see e.g. YAML: https://matrix.yaml.info – I have no reason to believe it's worse in TOML vs. YAML, XML, JSON, or anywhere else. If anything, it's probably a bit better because it's fairly simple and has a pretty decent test suite.
The problem goes deeper. I can't remember who coined the term, but all "implerative" (imperative declarative) languages share the same issue. I don't care if it's JSON, XML, TOML, or YAML, we shouldn't be interpreting markup/data languages. GitHub actions are a good example of everything wrong with implerative languages.
Use a real programming language, you can always read in JSON/YAML/whatever as configuration. Google zx is a good example of this done right, as is Pulumi.
Kris Nóva said it best: "All config drifts towards Turing completion."
Oh man, i have a similar issue with NixLang. Though i know it's not "implerative". Many days i just want to write Nix in my preferred language. I wish Nix had made a simple JSON based IO for configuration, because then i could see what the output of something is - and generate the input state from some other language.
Really frustrating. Nix works.. but i just don't see the value, personally. And this is after living on NixOS for ~3 years now, with 4 active Nix deploys in my house.. i just don't like the language.
I'm currently building this (plus more) - the happy path of what you're talking about is almost complete. There are fundamental issues preventing what you're talking about being used as a complete replacement for NixLang: you'd need every possible language installed/available on the builder machine in order to build packages, and lazy evaluation would completely break (merely evaluating all of nixpkgs takes hours). So you do ultimately need a primary language. That being said, for devops-like stuff there is no reason to have that limitation.
Would you be able to link the repo? I'm curious on your impl
Nix can read JSON, there's a deserializer as one of the builtins you can call. So you can make a bridge where Nix reads your JSON and does something with it, and you can generate the JSON externally like you want. It's how things like poetry2nix work.
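A small sketch of the "generate the JSON externally" half of that bridge. The host data and the key names (`sshPort`, `roles`) are invented for illustration; on the Nix side, something like `builtins.fromJSON (builtins.readFile ./hosts.json)` would be one way to read the result, assuming the output was written to a file of that name.

```python
import json

# Hypothetical deployment data; hostnames and keys are invented.
hosts = [
    {"name": "alpha", "ssh_port": 22, "roles": ["web"]},
    {"name": "beta", "ssh_port": 2222, "roles": ["web", "db"]},
]


def render_config(hosts):
    """Build a plain data structure and serialize it as deterministic JSON."""
    config = {
        "hosts": {
            h["name"]: {"sshPort": h["ssh_port"], "roles": h["roles"]}
            for h in hosts
        }
    }
    # sort_keys keeps the output stable, which helps when diffing generated files.
    return json.dumps(config, indent=2, sort_keys=True)


rendered = render_config(hosts)
print(rendered)
```

Because the generator emits only data, the output can be inspected before anything consumes it.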
"Implerative" - thank you for this, this is the term I've been searching for to describe the weird blending of the two things.. I immediately Googled it and saw that it has previous uses as well, I would love to know who originated the concept. I see so many times, confusion and arguing about what is imperative and declarative, to the point where I question the value of the terms any longer.
FWIW, I have flirted with my own DSL implementations in a few cases. Certainly, language design is much more complex, but I also felt that once you understand enough of EBNF/parser generators (and some of the simpler alternatives), this is a very powerful option as well.
I'm also pretty against DSLs, although they do occasionally have use cases. For an example of why DSLs can be bad, look at Dockerfiles contrasted with Buildah. The former makes tons of assumptions, especially when to perform layer checkpoints. The latter is just a script in Bash or whatever your language of choice.
For the curious, this might be it: "I've cracked our marketing code, y'all! Pulumi: Implerative Appfrastructure" [1] @funcOfJoe, Joe Duffy: CEO of Pulumi
[1] https://twitter.com/funcOfJoe/status/1319667607214067712
Also an interesting post referencing the term in a previous comment on HN: https://news.ycombinator.com/item?id=31182790
I've always wondered why we seem to have implemented a whole programming language in yaml or json for so many CI/CD systems rather than just writing quick python scripts to describe the logic of a particular build step, then MAYBE using a JSON or XML file to enumerate the build steps and their order, like:
Sure, that's orchestration, though. The problem with GHA is the sheer amount of expressive power that it has. If you need to do dynamic stuff then that should be in a "pre-workflow" step, written however/in whatever you please, that emits the actual workflow.
Why shouldn't the python script be the discrete workflow step? It could be mounted on some file system which has checked out the git at a particular commit with a particular tag, then runs whatever tasks are required to validate or deploy the project
I agree with all of this.
If we take it one step further though and think about portability of configuration, I think that is one of the reasons we end up with operators.
For tools that allow configuration in either JSON or Javascript (like eslint), I prefer the JS version. The syntax is similar but has much more flexibility, like being able to use environment variables or add comments.
Pulumi was also a good tool when I was doing kubernetes deployments.
Norway is also "False".
Or more precisely, its country code 'NO' is false. I don't think there are any YAML parsers that parse the literal string 'Norway' as false.
Be the change you wish to see in the world.
I would support a move for YAML to standardize on both "NO" and "Norway" evaluating to false. It seems an obvious win for consistency.
Surely it should accept either "Norway", "Norge" or "Noreg" depending on the locale setting.
Hmmmmmm. In that case the "nodding head" emoji should evaluate to false when the locale is set to Bulgarian...
It’s very obvious that’s what he means.
shrug it wasn't obvious to me. I'm glad someone explained.
It wasn’t obvious to me. I read it as the literal string “Norway” being parsed as false, which didn’t sound believable but I didn’t make the connection to NO at all.
The YAML 1.2 spec removed “no” as a synonym for false. That arguably just made that entire problem worse, and even though it’s been almost 15 years YAML 1.1 is still the commonly used variant.
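To make the version difference concrete, here is a toy resolver for unquoted scalars. It is not a YAML parser: it only models the boolean forms listed for the YAML 1.1 bool type and for the YAML 1.2 core schema, and leaves everything else as a string. The function name `resolve_plain_scalar` is made up, and real parsers differ in which of these forms they honor.

```python
import re

# Boolean forms from the YAML 1.1 bool type.
YAML11_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)
# Boolean forms from the YAML 1.2 core schema.
YAML12_CORE_BOOL = re.compile(r"true|True|TRUE|false|False|FALSE")

TRUTHY = {"y", "yes", "true", "on"}


def resolve_plain_scalar(text, version="1.1"):
    """Return a bool if the unquoted scalar is a boolean form, else the string."""
    pattern = YAML11_BOOL if version == "1.1" else YAML12_CORE_BOOL
    if pattern.fullmatch(text):
        return text.lower() in TRUTHY
    return text


for value in ["NO", "no", "on", "Norway", "true"]:
    print(value, resolve_plain_scalar(value, "1.1"), resolve_plain_scalar(value, "1.2"))
```

Under the 1.1 rules the country code `NO` becomes `False`, while under the 1.2 rules it stays the string `NO`; `Norway` is a string in both.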
Ah, that explains why I couldn’t find any online YAML->JSON converters that would demonstrate this flaw when it came up a few weeks ago.
So now we have the same language that parses the same document subtly differently depending on what version you use. Hooray?
Maybe one written by a Geordie?
And yet it doesn't recognize that the UK is false[0].
[0] https://en.wikipedia.org/wiki/Perfidious_Albion
In general I am always confused that it lets you use strings unquoted, which is what allows for all these issues with ambiguity of the interpreted data type, Norway problem and all that.
It also just looks odd to me, I don't see why it's necessary to allow this.
It’s great for end users who don’t understand what a string is, or who don’t want to play the game of finding the hanging single quote when they write the file by hand in a textarea.
On the opposite end of UX, there’s hand written JSON which is just too meticulous in some scenarios when people are writing config without editor support.
That’s probably a good thing for end users but if it’s running on something that affects the live service I’d rather not have people edit the config who don’t know what a string is
Dealing with inline quotes is annoying, but if you care about users writing things by hand, and especially in a textarea, you should not be using a format that depends on indentation.
It’s because YAML is designed first for readability.
YAML is an amazing config language for simple to mildly complex configs. It's easier to read and write than JSON, and it only really breaks apart when you're heavily deviating from nested lists/dictionaries with string values. People use it everywhere because by the time it becomes painful you're already so invested it's not really worth the hassle of switching.
I, on the other hand, find it much harder to read and write even in very simple configs. I never know what the indent is supposed to be, I just press my spacebar until my editor stops complaining. I find it really hard to tell if a line is a new entry or a subset of the parent entry.
I'm sure if I used it more it'd become easier, but my whole team doesn't understand it either. Luckily we only need it for GitHub configurations.
YAML is (vaguely) a superset of JSON, so you can just use JSON (without tabs) and get your life back.
I don’t need a config language with no fewer than 6 subtly different ways of decoding a string to remember, and certainly not one with a spec longer than C’s. Compare to JSON’s, which (famously) fits on a postcard.
https://yaml.org/spec/1.2.2/
https://yaml-multiline.info/
https://www.json.org/json-en.html
Until you find a snippet of config you want to copy into your `application.yml` in Spring or Quarkus (Java frameworks). If it doesn't paste in cleanly (and it rarely ever does) you'll need to go research the schema and find out where to put things. Meanwhile, if you're using a normal `application.properties` file, after you've finished pasting, you can go on with your life.
It’s aesthetically pleasing for simple configs. I’m so used to writing JSON by hand by now I don’t find it much easier. At least I never have to think about how a value is going to be interpreted from a JSON since it has a decent subset of types and I can visually tell what it is
Does anyone know which format Git uses? Is it YAML? Or TOML? Or something in between?
Git uses its own ini-style conf format which diverges from TOML.
TOML wasn't even invented/specified when git came into being.
But it looks like YAML was.
I wonder if we would even be using YAML or TOML to the degree we are now if JSON had support for trailing commas and comments.
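For reference, a strict JSON parser rejects both. A quick check with Python's standard `json` module; the sample documents and the `parses` helper are made up for illustration.

```python
import json


def parses(text):
    """Return True if text is accepted by the strict stdlib JSON parser."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


plain = '{"name": "demo", "ports": [80, 443]}'
trailing_comma = '{"name": "demo", "ports": [80, 443,]}'
with_comment = '{\n  // listen ports\n  "ports": [80, 443]\n}'

print(parses(plain))           # accepted
print(parses(trailing_comma))  # rejected: trailing comma in the array
print(parses(with_comment))    # rejected: comments are not part of JSON
```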
JSON5 has both of these
I can't find any JSON5 parser that isn't for JavaScript. I've started writing one in C that can then bind to other language, but it takes time to write!
Having on and no both be Boolean literals, but of opposite values, sounds like a horrible decision: a typo doesn't result in a syntax error, but in a valid configuration with the opposite meaning.
1 vs 0 is another pair of Boolean values with opposite meanings that are one typo apart, in quite a few languages.
Worse, the ability to typo 9 in some languages for 0 and flip your boolean when it should be a type error just seems like a misdesign.
Personally I prefer INI over nearly all configuration formats.
https://github.com/madmurphy/libconfini/wiki/An-INI-critique...
I have seen this post on HN before and it wasn't received very well AFAIR.
But I can't help agreeing with its main point: so much complexity to support a few basic data types that are not sufficient for anything complex anyway.
If you haven't checked it out, NestedText is a great format that offers no handling of types beyond string/list/dict, leaving all that to the application reading in the values.
No character needs escaping.
Strongly agree. I came to that conclusion before k8s even existed, because I myself thought to use it as a configuration file format, and the second I started realizing some of the ambiguity in its syntax I walked away from it.
The only thing I disagree with is that Coffeescript is still useful. I had the same reaction to Coffeescript that I had with yaml, Coffeescript _never_ had any real point outside of a segment of people preferring to write javascript in Ruby syntax. The biggest issue Coffeescript had is that debugging meant reading through the javascript anyway so you never really got away from javascript.
I'm a fan of either using a full-blown programming language or ini files, and yes I realize that seems insane to many people but at the end of the day ini files are stupidly easy to edit and if you can get away with not needing a full-blown turing complete language then convention based ini files are vastly easier on the human than yaml or json.
I'm either a greybeard that never got with the times or I'm a rebel, probably depends on who you talk to.
How do you persist complex multi-object state? Think nested lists of objects with references to one another.
If your answer is still "ini files", I'm sure it can be done, but only with a lot of custom-rolled code...xml/json(even yaml) for all their issues provided a code-free way of persisting this all - either through use of marshalling (xml) or json/yaml.load().
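A small sketch of the kind of round trip meant here, using Python's standard `json` module. The service names are invented. One assumption worth stating: JSON has no native notion of references, so this sketch models "references to one another" as ids that the application resolves after loading.

```python
import json

# Hypothetical nested state: objects refer to each other by id.
state = {
    "services": [
        {"id": "api", "depends_on": ["db", "cache"]},
        {"id": "db", "depends_on": []},
        {"id": "cache", "depends_on": []},
    ]
}

# Persist and restore without any custom serialization code.
text = json.dumps(state, indent=2)
restored = json.loads(text)

# Resolving the id-based references is still up to the application.
by_id = {service["id"]: service for service in restored["services"]}
api_deps = [by_id[dep] for dep in by_id["api"]["depends_on"]]

print(restored == state)
print([dep["id"] for dep in api_deps])
```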
you cut off the part of my statement that answers your question
My claim isn't that ini files solve for every use case, it's that if your needs are simple enough ini files are superior to json/yaml, but that full-blown turing complete languages are superior to everything else.
Also, if you're saving complex object state you don't have a configuration format but a serialization format and definitely ini isn't good for that.
HJson https://hjson.github.io seems a nice 'in-between' between YAML and JSON without the indentation-based syntax, so closer to the JSON side but with comments and fewer quotes.
What I don't really get is why the cloud providers / tooling implementors have never drafted up a "YAML-light" that just throws out the rarely-used headache-inducing syntax elements.
Hjson is pretty nice.
Two YAML-light style projects are StrictYAML (a Python library), and NestedText (an alternative spec with only string, list, and dict).
StrictYAML solves most if not all of these problems https://hitchdev.com/strictyaml/features-removed/
StrictYAML is great (and the author is in these comments!), but ultimately it's one specific library, not a format spec, so to depend on it for a project you need every person/tool doing the writing/parsing to commit to use that library (and the programming language it was written for).
Again, it's a great project, but I wanted something similar that is a language-agnostic format specification, so moved on to using NestedText wherever I can.
”on”, ”off”, ”yes”, ”no”, “y”, and ”n”, and case variants thereof, are not boolean literals in YAML since YAML 1.2 (2009).
As far as I know, not even libyaml supports 1.2. What YAML parsing libraries support 1.2?
I guess the real mystery is why so many tech types speak like an infant having a tantrum, about some esoteric trivia, and then have hordes of their kind come and vigorously head-nod it, and all involved think virtue is being done.
People started using things like YAML, obviously, because it reads closer to natural language. It's like a nested bullet list, which everyone can easily read. Readability is important to people. It's why we don't all still write C and Perl.
So it's one thing to say "I think people should be careful about prioritizing readability over precision especially for production systems". It's another to do this narcissistic dramatic faux-incomprehension implying the markup language gained the popularity it did because everyone's stupider than you.
Ha, great line. And you caught me mid-tantrum and mid-head nod. :)
We're still using the CoffeeScript of JSON because YAML's UX improvements haven't been brought into the upstream JSON spec like CoffeeScript's UX improvements were brought into JavaScript.
Right, I also don't understand why it's considered a feature of many of these languages to introduce so many ways of doing the same thing. Like the boolean example, but also having three different ways to express a list or dictionary? It's the classic Robustness principle which makes it less robust, making reading and parsing more complicated. How about just allowing one syntax and error if it's not according to spec.
YAML is fine if you don't do weird stuff with it. (And some stupidity like the Norway problem) A good example is OpenAPI schemas, which are quite legible in the YAML.
TOML has some nasty edge cases like top level arrays, arrays of objects under a key, etc.
Obligatory: https://github.com/edn-format/edn
I've had few to no issues when using YAML for docker-compose.yml files. This isn't to say that use of YAML can't be problematic, but I don't believe it's necessarily bad at all for configuration.
That's a valid use case when the target user is the software developer themself, but access to the language runtime is not something that should be accessible to a technical but non-maintainer user. Granted, it's plausible that a "template" JSON can be defined, which would be spread over a JSON-formatted configuration, but what YAML allows the user to do is define "templates" within the configuration itself and control over where those template structures are extended.
When the user is a developer maintaining a software project, they should probably just use JavaScript for configuration, and not JSON files, except when there's a possibility that the configuration can be intercepted.
This is all because people refuse to use JSON parsers that allow comments
The answer is simple: JSON doesn’t have comments, XML isn’t human writable, and TOML isn’t well-supported by common tooling.
Oh toml is atrocious and it’s a nightmare trying to understand nesting with all those repeated keys and double brackets.
CoffeeScript is the worst thing that ever happened to the software industry.
CoffeeScript fooled developers into thinking that transpilation was free and had absolutely no downsides whatsoever. The advantages of CoffeeScript over JavaScript were so incredibly marginal. I've never heard a single good argument about why it was worth adding a transpilation step and all the complexity that came with it.
I think even TypeScript isn't worth the transpilation step and bundling complexity these days, especially not when modern browsers allow you to efficiently preload scripts as modules and bypass bundling entirely.
About YAML. It's also not worth it though it's not quite as infuriating as CoffeeScript. The advantage of JSON is that it's equally as human-friendly as it is software-friendly. YAML leans more towards human-friendliness and sacrifices software friendliness. For instance, you can't cleanly express YAML on a single line to pass to a bash command as you can with JSON. It's just one additional format to learn and think about which doesn't add much value. Its utility does not justify its existence.
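The single-line JSON case can be sketched with Python's standard library as follows. The `mytool --config` command is hypothetical; only `json.dumps` and `shlex.quote` are real calls.

```python
import json
import shlex

payload = {"name": "demo", "replicas": 3, "tags": ["a", "b"]}

# Compact separators remove all optional whitespace, so the document is one line.
one_line = json.dumps(payload, separators=(",", ":"))

# shlex.quote makes the string safe to paste into a POSIX shell command line.
command = "mytool --config " + shlex.quote(one_line)

print(one_line)
print(command)
```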
If support for JSON with comments was more widely available / in use, we'd use that. But it's not, so we don't.
I'm still using CoffeeScript whenever I can. It has one of the nicest syntaxes out there, a lot of code fits to one screenful, the logic of the code is easier to see without the clutter of unnecessary syntax and it's a joy to write too.
YAML is probably used for similar reasons.
I don't understand why people want redundant verbose syntax that makes reading and writing code harder. And sadly I no longer expect anyone to really explain it based on anything tangible.
YAML, or TOML or Json, I think the more problematic issue is using templating at all instead of generating them.
YAML is a far better format in terms of being human readable and editable, and supports features such as node labels and repeated nodes that turn into killer features when onboarding YAML parsers into applications.
This is optional. Besides using a better parser that follows the spec that long ago fixed a lot of the issues listed in the article, another way to avoid the issue is adding more verbosity (that would still not match XML or JSON).
You don't have this option in XML/JSON, you can't remove all that useless markup (and leave it only when it's useful)
Because that's what humans use to denote booleans
I wish json5.org were adopted more widely. It's JSON, safe because it cannot be executed, but with comments and trailing commas!
Note: YAML is a superset of JSON, which means that any YAML reader can read JSON.
XML also has some other issues (no typing, too many ways to have maps but none seems to be the correct way, etc.)
JSON just isn't meant to be written by humans (no comments).
But YAML is just horrible, like the whole accidental mistyping issues (NO => false) are just horrible and not acceptable IMHO. That it's a pretty complex thing doesn't help either.
I honestly don't understand why we (e.g. GitHub Actions) still use YAML for new things even knowing all the issues, especially when there are many other well-suited, decent, but less widespread alternatives.
YAML is older and more well supported. I'll explain to you why I ended up choosing YAML for the config files for a CLI utility written in Python that I maintain.
I initially chose TOML for many of the reasons mentioned here but before my first release I ended up switching to YAML. Python added support for reading TOML to the standard library in version 3.11, however it still requires you use an external library for writing. Do I use the built in library for reading and an external library for writing? A chunk of my users are on versions of Python older than 3.11 (generally Windows users who installed Python manually at some point), do I import a separate library for THEM to read the files but use the standard library if ver >= 3.11?
Now that I look at the state of things today I probably would add the tomlkit library to my setup file, but that wasn't very mature at the time, so I just used pyyaml. Changing it now would break compatibility with my older versions that use yaml config files, unless I maintained both paths... which I could do but it's just another source of complexity to worry about. These are relatively simple config files the user has to interact with manually so yaml works fine and I don't see any reason to change at this point.
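For what it's worth, the version-conditional read path described above is often written roughly like this. It is a sketch, not this project's code: `tomli` is a third-party backport that older interpreters would need installed, the `load_config` helper and the sample keys are invented, and writing TOML would still need a separate library on any Python version.

```python
import sys

# Use the standard library reader where it exists, else the backport.
if sys.version_info >= (3, 11):
    import tomllib
else:
    import tomli as tomllib  # third-party; assumed installed on older Pythons


def load_config(text):
    """Parse TOML text into plain dicts, lists, and scalars."""
    return tomllib.loads(text)


# Hypothetical config for a CLI utility.
SAMPLE = """
[output]
directory = "build"
overwrite = false
"""

config = load_config(SAMPLE)
print(config["output"]["directory"], config["output"]["overwrite"])
```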
Gasp! Does this mean you know why people are still using CoffeeScript?
Well, you can replace YAML with JSON and JS templating without changing the parser. So I guess that’s an advantage over TOML?
I think YAML is for code what Markdown is for text: it is easy to read and _can_ produce the same output as stricter and more extensive languages. Easy readability makes this tradeoff acceptable for most.
For everyone who hates YAML, we extend YAML to another use.
It’s like without rules none of you show any common sense. Who cares what the spec says? Obviously you shouldn’t use “oN” as boolean true.