The OCI Distribution Spec is not great; it does not read like a specification that was carefully designed.
According to the specification, a layer push must happen sequentially: even if you upload the layer in chunks, each chunk needs to finish uploading before you can move on to the next one.
As far as I've tested with Docker Hub and GHCR, chunked upload is broken anyway, and clients upload each blob/layer as a whole. The spec also promotes `Content-Range` value formats that do not match the RFC 7233 format.
(That said, there is parallelism at the level of blobs, just not within a single blob.)
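Here is what that flow looks like, sketched in Python with requests (registry, repo, and token are placeholders; error handling elided). Each PATCH must target the Location returned by the previous response, which is why nothing within a single blob can be parallelized:

    import hashlib
    from urllib.parse import urljoin

    import requests

    def push_blob(registry, repo, token, blob, chunk_size=5 * 1024 * 1024):
        base = f"https://{registry}/"
        auth = {"Authorization": f"Bearer {token}"}

        # 1. Open an upload session; the registry answers 202 + Location.
        r = requests.post(urljoin(base, f"/v2/{repo}/blobs/uploads/"), headers=auth)
        r.raise_for_status()
        location = urljoin(base, r.headers["Location"])

        # 2. PATCH the chunks strictly one after another: each response's
        #    Location is the only valid target for the next chunk. Note the
        #    bare "<start>-<end>" Content-Range the spec uses, not RFC 7233's
        #    "bytes <start>-<end>/<total>".
        for start in range(0, len(blob), chunk_size):
            chunk = blob[start:start + chunk_size]
            r = requests.patch(location, data=chunk, headers={
                **auth,
                "Content-Type": "application/octet-stream",
                "Content-Range": f"{start}-{start + len(chunk) - 1}",
            })
            r.raise_for_status()
            location = urljoin(base, r.headers["Location"])

        # 3. Close the session, handing over the digest for verification.
        digest = "sha256:" + hashlib.sha256(blob).hexdigest()
        sep = "&" if "?" in location else "?"
        requests.put(f"{location}{sep}digest={digest}", headers=auth).raise_for_status()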
Another gripe of mine is that they missed the opportunity to standardize pagination for listing tags, because they accidentally deleted some text from the standard [1]. Now different registries roll their own.
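For what it's worth, the de-facto pattern is `n`/`last` query parameters plus an RFC 5988 Link header; a sketch with requests (token handling simplified), with no guarantee that any given registry honors it, which is exactly the problem:

    import requests

    def list_tags(registry, repo, token, page_size=100):
        url = f"https://{registry}/v2/{repo}/tags/list?n={page_size}"
        auth = {"Authorization": f"Bearer {token}"}
        tags = []
        while url:
            r = requests.get(url, headers=auth)
            r.raise_for_status()
            tags += r.json().get("tags") or []
            nxt = r.links.get("next")  # requests parses the Link header for us
            if nxt is None:
                url = None
            elif nxt["url"].startswith("/"):  # relative next-page URL
                url = f"https://{registry}{nxt['url']}"
            else:
                url = nxt["url"]
        return tags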
[1] https://github.com/opencontainers/distribution-spec/issues/4...
That’s par for the course with everything around Docker and containers. As a user experience Docker is amazing, but as technology it is hot garbage. That’s not as much of a dig at it as it might sound: it really was revolutionary; it really did make using Linux namespaces radically easier than they had ever been; it really did change the world for the better. But it has always prioritised experience over technology. That’s not even really a bad thing! Just as there are tons of boring companies solving expensive problems with Perl or with CSVs being FTPed around, there is a lot of value in delivering boring or even bad tech in a good package.
It’s just that sometimes it gets sad thinking about how much better things could be.
I don’t know about that (hyperbole aside). I’ve been in IT for more than 25 years now, and I can’t see that Docker containers actually delivered any tangible benefits in terms of end-product reliability or velocity of development, to be honest. This might not necessarily be Docker’s fault, though; maybe it’s just that all the potential benefits get eaten up by things like web development frameworks and Kubernetes.
But at the end of the day, today’s Docker-based web app development delivers less than fat-client desktop app development delivered 20 years ago, as sad as that is.
If you haven’t seen the benefits, you’re not in the business of deploying a variety of applications to servers.
The fact that I don’t have to install dependencies on a server, or set up third-party applications like PHP, Apache, Redis, and the myriad of other packages anymore, or manage config files in /etc, or handle upgrades of libc gracefully, or worry about rolling restarts and maintenance downtime… all of this was solvable before, but has become radically easier with containers.
Packaging an application and its dependencies into a single, distributable artifact that can be passed around and used on all kinds of machines was a glorious success.
I’m aware of all of that; I’m just saying that this has not translated into more reliable and better software in the end, interestingly enough. As said, I’m not blaming Docker, at least not directly. It’s more that the whole “ecosystem” around it seems to have so many disadvantages that they, in the end, outweigh the advantages of Docker.
It has translated to reliable legacy software. You can snapshot a piece of software, together with its runtime environment, at the point when it's still possible to build it; and then you can continue to run that built OCI image, with low overhead, on modern hardware, even when building the image from scratch has long since become impossible because, e.g., all the package archives it pulled from have gone offline.
(And this enables some increasingly wondrous acts of software archaeology, due to people building OCI images not for preservation, but just for "use at the time" — and then just never purging them from whatever repository they've pushed them to. People are preserving historical software builds in a runnable state, completely by accident!)
Before Docker, the nearest thing you could do to this was to package software as a VM image — and there was no standard for what "a VM image" was, so this wasn't a particularly portable/long-term solution. Often VM-image formats became unsupported faster than the software held in them did!
But now, with OCI images, we're nearly to the point where we've e.g. convinced academic science to publish a paper's computational apparatus as an OCI image, so that it can be pulled 10 years later when attempting to replicate the paper.
I think you’re onto part of the problem here. The thing is that you have to snapshot a lot of today’s software together with its runtime environment.
I mean, I can still run Windows software (for example) that is 10 years old or more without that requirement.
The price for that kind of backwards compatibility is a literal army of engineers working for a global megacorporation. Free software could not manage that, so having a pragmatic way to keep software running in isolated containers seems like a great solution to me.
There’s an army of developers working on Linux as well, employed by companies like IBM and Oracle. I don’t see a huge difference to Microsoft here to be honest.
What are you even talking about? Being able to run 10-year-old software (on any OS) is orthogonal to being able to build a piece of software whose dependencies are completely missing. Don't pretend this doesn't happen on Windows.
I’m not sure I would agree here: from my personal experience, increasing containerisation has definitely nudged lots of large software projects to behave better; they don’t spew so many artifacts all over the filesystem anymore, for example, and increasingly adopt environment variables for configuration.
Additionally, I think lots of projects became able to adopt better tooling faster, since the barrier to use container-based tools is lower. Just think of GitHub Actions, which suddenly enabled everyone and their mother to adopt CI pipelines. That simply wasn’t possible before, and has led to more software adopting static analysis and automated testing, I think.
This might all be true, but has this actually resulted in better software for end users? More stability, faster delivery of useful features? That is my concern.
Circa 2005 I was working at places where I was responsible for 80 and 300 web sites respectively using a large range of technologies. On my own account I had about 30 domain names.
I had scripts that would automatically generate the Apache configuration to deploy a new site in less than 30 seconds.
At that time I found that most web sites have just a few things to configure: often a database connection, the path to where files are, and maybe a cryptographic secret. If you are systematic about where you put your files and how you do your configuration, running servers with a lot of sites is about as easy as falling off a log, not to mention running development, test, staging, prod and any other sites you need.
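The kind of generator script I mean, sketched in Python (paths and the vhost template are invented for illustration):

    import pathlib
    import secrets
    import sys

    TEMPLATE = """<VirtualHost *:80>
        ServerName {domain}
        DocumentRoot /srv/www/{domain}/public
        ErrorLog /var/log/apache2/{domain}.error.log
    </VirtualHost>
    """

    def deploy(domain):
        root = pathlib.Path(f"/srv/www/{domain}")
        (root / "public").mkdir(parents=True, exist_ok=True)
        # one secret per site, always in the same place
        (root / "secret.key").write_text(secrets.token_hex(32))
        conf = pathlib.Path(f"/etc/apache2/sites-available/{domain}.conf")
        conf.write_text(TEMPLATE.format(domain=domain))

    if __name__ == "__main__":
        deploy(sys.argv[1])  # e.g. python gen_vhost.py example.com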
I have a Python system now with gunicorn servers and celery workers that exists in three instances on my PC. Because I am disciplined and everything is documented, I could bring it up on another machine manually pretty quickly, probably more quickly than I could download 3 GB worth of Docker images over my ADSL connection. With a script it would be no contest.
There also was a time when I was building AMIs and even selling them on the AWS Marketplace, and the formula was: write a Java program that writes a shell script that an EC2 instance runs on boot; when it is done, it sends a message through SQS to tell the Java program to shut down and image the new machine.
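In today's terms the same formula fits in a handful of boto3 calls; a sketch in Python rather than the original Java (the AMI ID, names, and boot script are made up, and the instance needs an instance profile that allows the SQS call):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    sqs = boto3.client("sqs", region_name="us-east-1")
    queue_url = sqs.create_queue(QueueName="ami-build-done")["QueueUrl"]

    # The boot script does the installs, then reports back through SQS.
    user_data = f"""#!/bin/sh
    yum install -y nginx   # ...whatever the image needs
    aws sqs send-message --queue-url {queue_url} --message-body done
    """

    instance_id = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # base image, made up
        InstanceType="t3.micro",
        MinCount=1, MaxCount=1,
        UserData=user_data,
    )["Instances"][0]["InstanceId"]

    # Block until the boot script says it is finished...
    while not sqs.receive_message(QueueUrl=queue_url, WaitTimeSeconds=20).get("Messages"):
        pass

    # ...then shut the machine down and image it.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.create_image(InstanceId=instance_id, Name="my-product-ami")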
If Docker is anything, it is a system that turns 1 MB worth of I/O into 1 GB of I/O. I found Docker was slowing me down when I was using a gigabit connection; I found it basically impossible to do anything with it (like boot up an image) on a 2 MB/s ADSL connection; and with my current pair of 20 MB/s connections it is still horrifyingly slow.
I like that the OP is concerned about I/O speed and brings it up, and I think it could be improved with a better caching system (e.g. Docker might even work on slow ADSL if it properly recovered from failed downloads).
However, I think Docker has a conflict between “dev” (where I’d say your build is slow if you ever perceive yourself to be waiting) and “ops” (where a 20-minute build is “internet time”).
I think ops is often happy with Docker, and some devs really seem to like it, but for some of us it is a way to turn a 20-second task into a 20-minute task.
Being able to create a portable artifact with only the userspace components in it, one that can be shipped and run anywhere with minimal fuss, is something that didn't really exist before containers.
Java?
25 years ago I could tell you what version of every CPAN library was in use at my company (because I installed them). What version of what libraries are the devs I support using now? I couldn't begin to tell you. This makes devs happy but I think has harmed the industry in aggregate.
Because of containers, my company can now roll out deployments using well-defined CI/CD scripts, where we can control installations to force usage of pass-through caches (GCP Artifact Registry). So it actually has that data you're talking about, but instead of living in one person's head it's stored in a database and accessible to everyone via an API.
You mean, aside from not having to handle installation of your software on your users' machines?
Also, I'm not sure this is related to Docker at all.
Was it really so amazing? Here is half a Docker implementation, in about 100 lines of Bash...
https://github.com/p8952/bocker
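To be fair, the namespace end of it really is tiny. A toy sketch of just that part in Python (Linux and root required; everything else a real runtime does is missing):

    import ctypes
    import os

    CLONE_NEWUTS = 0x04000000  # new hostname namespace
    CLONE_NEWNS = 0x00020000   # new mount namespace

    libc = ctypes.CDLL("libc.so.6", use_errno=True)
    if libc.unshare(CLONE_NEWUTS | CLONE_NEWNS) != 0:
        raise OSError(ctypes.get_errno(), "unshare failed (are you root?)")

    hostname = b"toybox"
    libc.sethostname(hostname, len(hostname))  # visible only inside the namespace
    os.execvp("/bin/sh", ["/bin/sh"])          # a shell "inside" the new namespaces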
The other half is the other 90%.
Looking at it now, it won't even run under the latest systemd, which now refuses to boot with cgroups v1. Good luck even accessing /dev/null under cgroups v2 with systemd.
Lines of code are irrelevant.
Docker is important because:
1) It made for a convenient process to build a “system” image of sorts, upload it, download it, and run it.
2) (The important bit!) Enough people adopted this process for it to become basically a standard; the whole loop is sketched below.
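A sketch of that loop with the Docker SDK for Python (docker-py; the registry and image names are placeholders):

    import docker

    client = docker.from_env()

    # Build and upload...
    client.images.build(path=".", tag="registry.example.com/myapp:1.0")
    client.images.push("registry.example.com/myapp", tag="1.0")

    # ...and on any other machine with a Docker daemon: download and run.
    client.images.pull("registry.example.com/myapp", tag="1.0")
    client.containers.run("registry.example.com/myapp:1.0", detach=True)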
Before Docker, it wasn’t uncommon to ship some complicated apps in VMs. Packaging those was downright awful, with all of the bespoke scripting needed for the various steps of distribution. And then you get a new job? Time to learn a brand-new process.
And as the famous Hacker News comment goes, Dropbox is trivial by just using FTP, curlftpfs and SVN. Docker might have many faults, but anybody who dealt with the problems it aimed to solve knows that it was revolutionary in simplifying things.
And for people who disagree: please write a library like TestContainers using cobbled-together bash scripts that can download, cleanly execute, and then clean up almost any common backend dependency.
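For contrast, here is roughly all it takes with testcontainers-python (a sketch; assumes a local Docker daemon and the stock postgres image):

    from testcontainers.postgres import PostgresContainer

    # Pulls the image if needed, starts the container, waits until it
    # accepts connections, and removes it on exit, pass or fail.
    with PostgresContainer("postgres:16") as pg:
        print(pg.get_connection_url())  # ready-to-use connection URL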
Yeah, but the Open Container Initiative is supposed to be the responsible adults in the room, taking the “fail fast” corporate Docker Inc stuff and taking the time to apply good engineering principles to it.
It's somewhat surprising that the results of that process are looking to be nearly as fly-by-the-seat-of-your-pants as Docker itself is.
I mean, Podman exists, as do lots of custom build tools and other useful options. Personally, I mostly just stick with vanilla Docker (and Compose/Swarm), because it's pretty coherent and everything just fits together, even if it isn't always perfect.
Either way, agreed about the concepts behind the technology making things better for a lot of folks out there, myself included (haven't had prod issues with mismatched packages or inconsistent environments in years at this point, most of my personal stuff also runs on containers).
On top of that, either the OCI spec is broken or it's just AWS being nuts, but unlike GitLab and Nexus, AWS ECR doesn't support automatically creating folders (e.g. "<acctid>.dkr.ecr.<region>.amazonaws.com/foo/bar/baz:tag") [1]; it can only do flat storage, leaving you with either seriously long image names or long tags.
Yes, you can theoretically create a repository object in ECR with Terraform to mimic that behavior, but it sucks in pipelines where the resulting image path is dynamic: you need to give more privileges to the IAM role of the CI pipeline than I'm comfortable with, not to mention that I don't like any AWS resources being managed outside of the central Terraform repository.
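The usual pipeline workaround is a create-if-missing call right before the push, which needs exactly the extra IAM privilege in question; a sketch with boto3 (region and repository path are examples):

    import boto3

    ecr = boto3.client("ecr", region_name="eu-west-1")  # region is an example
    repo = "foo/bar/baz"  # the "folder" path ECR will not create on push
    try:
        ecr.create_repository(repositoryName=repo)  # needs ecr:CreateRepository
    except ecr.exceptions.RepositoryAlreadyExistsException:
        pass  # fine, someone pushed here before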
[1] https://stackoverflow.com/questions/64232268/storing-images-...
That seems to be standard AWS practice: implement a new feature so you can check the box, but in practice it's a huge pain to actually use.
It’s not as bad as Azure is (was?) with IPv6: https://news.ycombinator.com/item?id=29327773
Azure's IPv6 implementation is still flawed and still broken. That has not changed.
IIRC it's not in the spec because administration of resources is out of scope. For example, perhaps you offer a public repository and you want folks to sign up for an account before they can push? Or you want to have an approval process before new repositories are created?
Regardless it's a huge pain that ECR doesn't support this. Everybody I know of who has used ECR has run into this.
There's a long-standing issue open which I've been subscribed to for years now: https://github.com/aws/containers-roadmap/issues/853