I think this might be a step in the right direction, but my main problem with Kubernetes package management today might not be fixable by a package manager, sadly. The biggest issue I have in my daily life is handling the multiple levels of nested YAML and the unpredictability of the results.
Think of an ArgoCD ApplicationSet that generates a bunch of Applications. Those Applications render a bunch of Helm charts, and inside those charts there are CRDs used by some random operator like Strimzi, Grafana or Vector.
Given YAML's lack of structure and the absence of any sort of standard for rendering templates, it's practically impossible to know what YAML is actually being sent to the Kubernetes API when you make a top-level change. It's trial and error, expensive blue-green deployments and hundreds of minutes of debugging all the way, every month.
The wide adoption of YAML for DevOps-adjacent tooling was a mistake.
I think proper programming language support is the way to go.
Ideally a statically typed language that isn't Turing complete and is guaranteed to terminate. So something like Starlark with types.
I have been developing my own package manager, and my core idea is that proper programming languages are the right level for describing packages.
Programs take inputs and can output arbitrary data such as resources, and they can do so with type safety and everything else a programming ecosystem offers.
For asset distribution it uses wasm, and that's it!
If you want to check it out, it's here: GitHub (https://github.com/davidmdm/yoke), docs (https://davidmdm.github.io/yoke-website)
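A minimal Python sketch of the "packages as programs" idea, assuming a package is just a typed function from inputs to a list of Kubernetes resources. `AppInputs` and `render` are hypothetical names for illustration; this is not Yoke's API, which per the comment above distributes programs as wasm.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class AppInputs:
    # Hypothetical package inputs. With type hints, a checker such as mypy
    # flags mistakes like replicas="3" before anything reaches the cluster.
    name: str
    image: str
    replicas: int = 1
    labels: dict[str, str] = field(default_factory=dict)


def render(inputs: AppInputs) -> list[dict[str, Any]]:
    """Turn typed inputs into plain Kubernetes resource objects."""
    if inputs.replicas < 0:
        raise ValueError("replicas must be non-negative")
    labels = {"app": inputs.name, **inputs.labels}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": inputs.name, "labels": labels},
        "spec": {
            "replicas": inputs.replicas,
            "selector": {"matchLabels": {"app": inputs.name}},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": inputs.name, "image": inputs.image}]},
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": inputs.name, "labels": labels},
        "spec": {"selector": {"app": inputs.name}, "ports": [{"port": 80}]},
    }
    return [deployment, service]


if __name__ == "__main__":
    # JSON is essentially a subset of YAML, so YAML tooling can read this output.
    print(json.dumps(render(AppInputs(name="web", image="nginx:1.27", replicas=3)), indent=2))
```

Because `render` is ordinary code, it can be unit-tested and type-checked like any other function.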
I like that you said: > I think proper programming language support is the way to go.
I think we need to stop writing new ways of generating yaml since we already have the perfect way of doing so. Typed languages!
This means general-purpose languages do not qualify and, more generally, nothing with general recursion does.
Why limit yourself to those types of tools?
To protect against somebody writing a non-terminating program?
General-purpose languages come with a lot of benefits from their ecosystems, like package managers (npm, Cargo, Go modules, etc.).
They have test runners, and control flow.
Lots of them already have type definitions for Kubernetes, and if you are working in Go you have access to almost the entire Kubernetes ecosystem.
Maybe we are throwing the baby out with the bathwater when we disqualify general-purpose languages?
Because people write code that is hard to understand. Configuration doesn't need all that. What it needs is to be provably correct and to let someone make predictable changes under high pressure (when prod is down). Guaranteed termination is one of the features of a Turing-incomplete language, not the goal. You don't want inheritance either, because it becomes hard to know where and when a value gets set (which is effectively what Helm's layering of multiple -f values files does).
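To make the "where did this value get set" problem concrete, here is a rough Python approximation of layering several values files, with later files winning on each leaf. It is similar in spirit to repeated -f flags but is not Helm's actual merge code, and the file names are hypothetical. The merged values alone carry no trace of their origin, so the sketch records provenance separately.

```python
from typing import Any


def layer(files: list[tuple[str, dict[str, Any]]]) -> tuple[dict[str, Any], dict[str, str]]:
    """Deep-merge values files left to right, later files winning.

    Returns the merged values and a map from dotted key path to the file
    that last set that leaf.
    """
    merged: dict[str, Any] = {}
    origin: dict[str, str] = {}

    def merge(dst: dict[str, Any], src: dict[str, Any], source: str, prefix: str) -> None:
        for key, value in src.items():
            path = f"{prefix}{key}"
            if isinstance(value, dict):
                if not isinstance(dst.get(key), dict):
                    # A mapping replaces a scalar (or nothing): forget the old origin.
                    origin.pop(path, None)
                    dst[key] = {}
                merge(dst[key], value, source, path + ".")
            else:
                # A scalar or list silently replaces whatever was there,
                # including a whole mapping, so drop origins of its old children.
                for stale in [p for p in origin if p.startswith(path + ".")]:
                    del origin[stale]
                dst[key] = value
                origin[path] = source

    for source, values in files:
        merge(merged, values, source, "")
    return merged, origin
```

Layering a base file, a prod override of `replicas`, and a regional override of `image.tag` yields a merged result where `replicas` is simply 3; only the separate provenance map says which file set it.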
You speak as if Turing-incomplete languages cannot have the control structures, tooling, and ecosystems we enjoy elsewhere, which is the wrong assessment. I recommend you take a look at CUE to see how this can be true.
The OpenAPI specs are probably better than the Go language types for k8s. They have more of the validation information and you can get at the CRDs / versions actually running in the cluster.
I am not saying that Turing incomplete languages don’t or can’t be a good fit for this task.
However there’s no reason we should rule out general purpose languages.
We have a lot of configuration-based IaC tooling à la Jsonnet and CUE, and yet these are riddled with their own problems and DX issues.
Anyway, we don’t need to see eye to eye on this, but I respect your position.
We’ve learned the hard way general purpose languages are poor for configuration at scale. I know firsthand, having worked on some of the larger prod infrastructures out there.
At scale, the best SREs out there still have trouble reasoning about the system and end up pushing bad config that takes down prod.
Languages like CUE really are different and better. CUE in particular hits the right balance for configuration of millions of lines of k8s yaml.
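For contrast with last-writer-wins layering, here is a toy Python model of unification, the idea CUE is built on: combining two configs either agrees or fails, regardless of order, and a type acts as a constraint rather than a default. This is a simplified illustration under those assumptions, not CUE's semantics (which also include disjunctions, defaults, and more).

```python
from typing import Any


class Conflict(Exception):
    """Raised when two configs disagree; there is no 'last one wins'."""


def unify(a: Any, b: Any, path: str = "") -> Any:
    """Combine two partial configs without letting either side override the other.

    Values are nested dicts, concrete scalars, or Python types used as
    constraints (e.g. int means "any int").
    """
    where = path or "."
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, value in b.items():
            out[key] = unify(a[key], value, f"{path}.{key}") if key in a else value
        return out
    if isinstance(a, type) and isinstance(b, type):
        if a is b:
            return a
        raise Conflict(f"{where}: {a.__name__} vs {b.__name__}")
    if isinstance(b, type):
        a, b = b, a  # normalize so that `a` is the constraint
    if isinstance(a, type):
        if isinstance(b, a):
            return b
        raise Conflict(f"{where}: {b!r} is not a {a.__name__}")
    if a == b:
        return a
    raise Conflict(f"{where}: {a!r} conflicts with {b!r}")
```

Unifying a schema `{"replicas": int}` with `{"replicas": 3}` gives `{"replicas": 3}` in either order, while `{"replicas": 3}` and `{"replicas": 5}` raise `Conflict` instead of silently picking one.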
I actually really like CUE. I use it fairly extensively as a YAML replacement where I can, and at my work we've done our best to integrate CUE with our charts to validate the values used to invoke our charts and to unify the value space.
However there's something about a full blown general purpose language that is so much more flexible.
I don't think that the fact that people can and do write bad programs disqualifies general purpose languages from being great tools to build packages.
I am sure there is equally bad CUE, Jsonnet, Pkl, etc. out there.
Other than cdk8s, I don't know of other tools in this space that have tried to use general-purpose languages to define their packages, and I think cdk8s users are generally happy. Much more so than Helm users, at least.
I am not sure I can agree with this statement > We’ve learned the hard way general purpose languages are poor for configuration at scale
I think we've just assumed this, or seen a Pulumi project we didn't like working in.
I believe and hope there will be plenty of room to experiment and innovate in this space!
The difference is in shared understanding. With tools like CUE or Starlark, you can learn one system and everyone can reason across each other's work. With imperative languages, every instance is a snowflake and creates significant mental overhead. It's the flexibility that is actually the problem for configuration. I get there is/was a trend towards general purpose languages in DevOps, but I think we are post peak on the adventure.
CUE was created by the same person who wrote the Borg precursor and also worked on BCL & GCL.
In addition it is actually hard to not make a template language accidentally Turing complete. Here is an entertaining list of accidentally Turing complete things: https://beza1e1.tuxen.de/articles/accidentally_turing_comple...
The idea behind declarative config is that, empirically, programmatic config was bad at scale.
If your config is the source of truth of what your infra should be, then you can use source control tools to roll back to a known good state, or to binary search for when a problem was introduced.
If you use programmatic config, then you can't find out the intended state of your system without executing a program. You can't grep through program executions in a meaningful way, especially at scale. So you can't do even simple things like search for a string.
Guaranteeing termination is helpful, but it doesn't solve the main problem that programmatic config puts a large complexity barrier between you and the ability to understand your infrastructure.
Tools like Helm give up a fair amount of this declarative benefit. And IMO that's one of the reasons why it's almost always a worse experience to use a helm chart than to just render the chart once and for all and forget Helm ever existed.
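The binary search described above works because each revision of declarative config is plain data that can be checked with something as simple as a substring test. A minimal Python sketch, assuming a hypothetical list of fully rendered config snapshots ordered oldest to newest; in practice `git bisect` does this over commits.

```python
from collections.abc import Callable, Sequence


def first_bad(snapshots: Sequence[str], is_bad: Callable[[str], bool]) -> int:
    """Return the index of the first snapshot for which is_bad is true.

    Assumes snapshots[0] is known good, snapshots[-1] is known bad, and that
    once a snapshot is bad every later one is too (the same assumption
    `git bisect` makes).
    """
    lo, hi = 0, len(snapshots) - 1  # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(snapshots[mid]):
            hi = mid
        else:
            lo = mid
    return hi
```

With programmatic config, each `is_bad` check would first require executing the program for that revision, which is the barrier the comment describes.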
If you restrict your language to pure functions only, then it is quite possible to have a system be both declarative and reproducible while having more expressivity than yaml.
And indeed, this is the approach that config-centric languages like Nickel[0] take.
[0]: https://nickel-lang.org/
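A small Python sketch of what "pure functions only" buys: if rendering depends only on its arguments and the output is serialized canonically, the same inputs always produce byte-identical text and the same hash, so the result can be committed, grepped, and compared. The function names and fields are illustrative assumptions, not Nickel code.

```python
import hashlib
import json
from typing import Any


def render(env: str, replicas: dict[str, int]) -> dict[str, Any]:
    # Pure: the output depends only on the arguments (no clock, env vars, or network).
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"web-{env}"},
        "spec": {"replicas": replicas.get(env, 1)},
    }


def canonical(doc: dict[str, Any]) -> str:
    # Sorted keys and fixed separators make the serialization byte-stable.
    return json.dumps(doc, sort_keys=True, separators=(",", ":"))


def fingerprint(doc: dict[str, Any]) -> str:
    return hashlib.sha256(canonical(doc).encode()).hexdigest()
```

Rendering "prod" twice, even with the input dict built in a different order, gives the same canonical text and fingerprint, so a stored fingerprint can tell you whether anything actually changed.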
The priority field in Nickel seems a lot like CSS specificity, though more explicit. I suspect it will cause headaches at scale.
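A toy Python model of priority-based merging, loosely based on how Nickel describes merge priorities (a low `default`, normal values, a high `force`); the exact rules here are assumptions for illustration, not Nickel's implementation. The higher priority wins and equal priorities with different values conflict, which is where the CSS comparison comes from: the winning value can depend on an annotation in a different file.

```python
from dataclasses import dataclass
from typing import Any

DEFAULT = float("-inf")  # assumption: stands in for Nickel's `default` (lowest)
NORMAL = 0
FORCE = float("inf")     # assumption: stands in for Nickel's `force` (highest)


@dataclass(frozen=True)
class Field:
    value: Any
    priority: float = NORMAL


def merge(a: dict[str, Field], b: dict[str, Field]) -> dict[str, Field]:
    """Merge two flat records field by field using priorities."""
    out = dict(a)
    for key, fb in b.items():
        fa = out.get(key)
        if fa is None or fb.priority > fa.priority:
            out[key] = fb  # higher priority wins
        elif fb.priority == fa.priority and fb.value != fa.value:
            raise ValueError(f"{key}: conflicting values at priority {fa.priority}")
        # otherwise the existing, higher-priority (or equal) value is kept
    return out
```

A base `replicas` marked as a default is overridden by a team's normal value, which is in turn overridden by an ops value marked as forced, whichever order the three records are merged in.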
Languages can be declarative or imperative. For instance, Pulumi and CDK are declarative.
I don’t understand — nothing stops a language from having an intermediate compilation step that shows the intended state and is searchable. Beyond that, programmatic config means you can add in hooks to make plans or runs more interrogatable.
It seems like this is untrue — having seen templated IaC that is hundreds of thousands of lines and cdk that defers that complexity to an abstraction that I have to understand once, I’d always take the latter.
Agreed that helm use is a giant mistake and liability.
I think codegen/compilation is a middle ground here. A higher level language like starlark can be compiled down to a set of instructions that provide the described guarantees.
This is how Pants (build system) works. You have declarative Starlark which supports basic programming semantics and this generates a state the engine reads and tries to produce.
I've been meaning to dive into Jsonnet for a while, but it'd be good to have a higher-level representation that doesn't rely on sophisticated templating and substitution engines like current k8s tooling does.
Compare k8s to Terraform where you have modules, composability, variables. These can be achieved in k8s but you need to layer more tooling on (kustomize, helm, etc). There could be a richer config system than "shove it in YAML"
Things like explicit ordering and dependencies are hard to represent in pure YAML, since they're "just text fields" without additional tools.
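Once dependencies are explicit data, ordering is just a topological sort. A minimal Python sketch using the standard library's graphlib; the dependency map and resource names are hypothetical, since plain manifests have no standard field for declaring this.

```python
from graphlib import CycleError, TopologicalSorter


def apply_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Order resources so every dependency comes before its dependents.

    depends_on maps each resource to the set of resources it needs first.
    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical example: the CRD and namespace must exist before the operator,
# and the operator before the custom resource it reconciles.
deps = {
    "Namespace/kafka": set(),
    "CustomResourceDefinition/kafkas.kafka.strimzi.io": set(),
    "Deployment/strimzi-operator": {
        "Namespace/kafka",
        "CustomResourceDefinition/kafkas.kafka.strimzi.io",
    },
    "Kafka/my-cluster": {"Deployment/strimzi-operator", "Namespace/kafka"},
}

if __name__ == "__main__":
    print(apply_order(deps))
```

The order among independent resources is not specified, only that dependencies come first; a cycle is reported as an error instead of producing an arbitrary order.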
Have you looked at CUE? (https://cuelang.org/docs/concept/the-logic-of-cue/)
CUE is also pragmatic in that it has integrations with YAML, JSON, JSON Schema, OpenAPI, and Protobuf.
I've tried out Pkl, which is similar in spirit, and I think it's a real solution for k8s manifests. The only thing holding it back is industry adoption, IMO. It's leagues better than Helm, and mostly better than Kustomize.
See also: KCL, which is very similar and might _actually_ be "the winner". Time will tell.
I don't expect a winner personally; rather, I expect there will always be dozens of alternatives. Like build systems, deployments are quite bespoke to organizations, and legacy has a way of sticking around for a long time.
Having used CUE, mainly outside of Kubernetes, I cannot see myself switching to KCL. I really like having a configuration language that isn't so tied to a specific system and which I can use with the Go SDK
So what's your take on https://github.com/stripe/skycfg? Do you also have experience with it?
No, I went with CUE instead of Starlark
My take on this is that the issue is not declarative infrastructure resources, but a tendency to over-complicate the infrastructure.
For example: You have a problem that is suitable for some message queue -> Apache Kafka. Now you have 7 new problems and the complexity warrants perhaps 3 other services, and on, and on.
Complexity always needs to be introduced carefully. It makes things harder if you introduce it too early, but everything will break in a big bang if you introduce it too late.
Nowadays you can also start with a lightweight MQ like RabbitMQ and decouple your service into just a handful of components. This will set you up for scalability without introducing massive overhead.
In the end it is also always a knowledge game: how experienced are you, and how much time are you willing to invest in learning and understanding a technology?
This exists for k8s[0]. There have been other users of the same library[1], and I heard Reddit did something similar internally.
[0] - https://github.com/cruise-automation/isopod [1] - https://github.com/stripe/skycfg
If I was in charge of our infra automation I would have done this. We opted for jsonnet instead which is an absolute nightmare, or at least the way we've set it up is.
Yelling At My Laptop
cdk8s + TypeScript is my favorite option.
Here's how I use it: https://github.com/shepherdjerred/homelab/tree/main/cdk8s
https://github.com/cdk8s-team/cdk8s
Personally, I would prefer a SQLite database. Ok I'll show myself out.
I think you can use Pulumi for helm? Or maybe just straight up kube.
Have you tried the rendered manifests pattern? https://akuity.io/blog/the-rendered-manifests-pattern/
While I agree generally with the pattern (dynamically generating manifests, and using pipelines to coordinate changes), I could never quite figure out the value of using branches instead of folders (with CODEOWNERS restrictions) or repositories (to enforce other types of rules if needed).
I can't quite put my finger on it, but having multiple, orphaned commit histories inside a single repository sounds off, even if technically feasible.
I believe the idea is that it makes it very explicit to track provenance of code between environments, eg merge staging->master is a branch merge operation. And all the changes are explicitly tracked in CI as a diff.
With directories you need to resort to diffing to spot any changes between files in folders.
That said there are some merge conflict scenarios that make it a little annoying to do in practice. The author doesn’t seem to mention this one, but if you have a workflow where hotfixes can get promoted from older versions (eg prod runs 1.0.0, staging is running 1.1.0, and you need to cut 1.0.1) then you can hit merge conflicts and the dream of a simple “click to release” workflow evaporates.
That isn't quite my understanding - but I am happy to be corrected.
There wouldn't be a staging->main flow. Rather, CI would be pushing main->dev|staging|prod, as disconnected branches.
My understanding of the problem being solved is how to see what is actually changing when moving between module versions, by explicitly outputting the dynamic manifest results. I.e. instead of the commit diff showing 4.3 -> 5.0, it shows the actual Ingress / Service / etc. being updated.
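A minimal Python sketch of reviewing the rendered result rather than the version bump, using the standard library's difflib. The two manifests are made-up stand-ins for rendering chart 4.3 and 5.0 with the same values, and the file path is hypothetical.

```python
import difflib


def rendered_diff(old: str, new: str, path: str) -> str:
    """Unified diff between two rendered versions of the same manifest file."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )


# Hypothetical rendered output before and after a chart upgrade.
before = """kind: Ingress
metadata:
  name: web
spec:
  ingressClassName: nginx
"""
after = """kind: Ingress
metadata:
  name: web
spec:
  ingressClassName: traefik
"""

if __name__ == "__main__":
    print(rendered_diff(before, after, "prod/ingress-web.yaml"))
```

The reviewer sees the Ingress class change directly, which a diff of the chart version number alone would hide.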
Couldn't you just review the Commit that instigated that change to that file? If the CI is authoring the change, the commit would still be atomic and contain all the other changes.
Yeah 100%.
In either case, I'm not saying it's wrong by any stretch.
It just feels 'weird' to use branches to represent codebases which will never interact or be merged into each other.
Sorry, typo, you’re quite right, I meant to say staging->prod is a merge. So your promotion history (including theoretically which staging releases don’t get promoted) can be observed from the ‘git log’. (I don’t think you want to push main->prod directly, as then your workflow doesn’t guarantee that you ran staging tests.)
When I played with this we had auto-push to dev, then click-button to merge to staging, then trigger some soak tests and optionally promote to prod if it looks good. The dream is you can just click CI actions to promote (asserting tests passed).
In general though a release will have tens or hundreds of commits; you also want a way to say “show me all the commits included in this release” and “show me the full diff of all commits in this release for this file(s)”.
Yeah, I like some conceptual aspects of this but ultimately couldn’t get the tooling and workflow to fit together when I last tried this (probably 5 years ago at this point to be fair).
I might be misunderstanding what you mean by staging in this case. If so, my bad!
I don't think staging ever actually gets merged into prod via git history, but is rather maintained as separate commit trees.
The way that I visualised the steps in this flow was something like:
In that model, there isn't ever actually a merge conflict that can occur between staging and prod, because you're not dealing with merging at all. The way you then deal with a delta (like version 1.0.1 in your earlier example) is to create a PR directly against the prod branch, and then the next time you do a full release, it just carries out the usual process, 'ignoring' what was there previously.
It's basically re-invented the terraform delta flow, but instead of the changes being shown via Terraform by comparing state and template, it's comparing template and template in git.
I genuinely feel like this is the bane of most tooling in this space. Getting stuff from 'I can run this job on my desktop' to 'this process can scale across multiple teams, integrated across many toolchains and deployment environments, with sane defaults' still feels like a mess today.
edit: HN Formatting
Glad I am not the only one feeling "weird" about the separate branches thing :D
Probably just a matter of taste, but I think having the files for different environments "side by side" makes it actually easier to compare them if needed, and you still have the full commit history for tracking changes to each environment.
This pattern is powerful since you can pick arbitrary tooling and easily make modifications with your own tooling. For instance substituting variables/placeholders or applying static analysis.
I read this article a while ago and it seems like the most sane way of dealing with this. Which tool you use to render the manifests doesn't even matter anymore.
Interesting, we have a system (different context, though it does use yaml) that allows nested configurations, and arrived at a similar solution, where nested configs (implicit/human interface) are compiled to fully qualified specifications (explicit/machine interface). It works quite well for managing e.g. batch configurations with plenty of customization.
I was unaware there was a name for this pattern, thank you.
This is actually a problem we want to focus on with Glasskube Cloud (https://glasskube.cloud/) where our glasskube[bot] will comment on your GitOps Pull request with an exact diff of resources that will get changed across all connected clusters. This diff will be performed by controller running inside your cluster.
Think of it as codecov analysis, but just for resource changes.
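A minimal Python sketch of a resource-level diff: resources are keyed by kind, namespace, and name, so reordering files produces no noise and the report reads as added, removed, and changed objects. This is an illustration, not Glasskube's implementation; a real tool would compare against live cluster state and ignore server-populated fields.

```python
from typing import Any

Key = tuple[str, str, str]  # (kind, namespace, name)


def key(resource: dict[str, Any]) -> Key:
    meta = resource.get("metadata", {})
    # Empty namespace stands for cluster-scoped or unset (a simplification).
    return (resource["kind"], meta.get("namespace", ""), meta["name"])


def resource_diff(old: list[dict[str, Any]], new: list[dict[str, Any]]) -> dict[str, list[Key]]:
    """Compare two sets of rendered resources by identity, not by line."""
    before = {key(r): r for r in old}
    after = {key(r): r for r in new}
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
    }
```

Bumping a Deployment's replicas, dropping a Service, and adding a ConfigMap shows up as one entry in each of the three lists.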
This sounds like terraform. Is this TF for k8s?
That's an interesting analogy, but I think it's a stretch.
IMO the pull request should be the diff.
https://akuity.io/blog/the-rendered-manifests-pattern/
The solution is to not use Kubernetes.