Using S3 as a Container Registry

stabbles
28 replies
9h35m

The OCI Distribution Spec is not great; it does not read like a specification that was carefully designed.

According to the specification, a layer push must happen sequentially: even if you upload the layer in chunks, each chunk needs to finish uploading before you can move on to the next one.

As far as I've tested with Docker Hub and GHCR, chunked upload is broken anyway, and clients upload each blob/layer as a whole. The spec also promotes `Content-Range` value formats that do not match the RFC 7233 format.

(That said, there's parallelism on the level of blobs, just not per blob)
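
For the curious, the chunked flow in the spec looks roughly like this (host, chunk sizes, and the $LOCATION variables are made up for illustration; each Location value comes from the previous response):

    # Open an upload session; the 202 response carries a Location header.
    curl -i -X POST https://registry.example.com/v2/myimage/blobs/uploads/

    # Upload the first chunk. Note the Content-Range format ("<start>-<end>",
    # not the RFC 7233 "bytes <start>-<end>/<total>" form).
    curl -i -X PATCH "$LOCATION" \
      -H "Content-Type: application/octet-stream" \
      -H "Content-Range: 0-10485759" \
      --data-binary @chunk0

    # Only after that response comes back (with fresh Location and Range
    # headers) can the next chunk go up -- hence the sequential upload.
    curl -i -X PATCH "$NEXT_LOCATION" \
      -H "Content-Type: application/octet-stream" \
      -H "Content-Range: 10485760-20971519" \
      --data-binary @chunk1

    # Close the session by supplying the digest of the whole blob.
    curl -i -X PUT "$FINAL_LOCATION?digest=sha256:<blob-digest>"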

Another gripe of mine is that they missed the opportunity to standardize pagination of listing tags, because they accidentally deleted some text from the standard [1]. Now different registries roll their own.

[1] https://github.com/opencontainers/distribution-spec/issues/4...

eadmund
22 replies
7h39m

"The OCI Distribution Spec is not great; it does not read like a specification that was carefully designed."

That’s par for everything around Docker and containers. As a user experience Docker is amazing, but as technology it is hot garbage. That’s not as much of a dig on it as it might sound: it really was revolutionary; it really did make using Linux namespaces radically easier than they had ever been; it really did change the world for the better. But it has always prioritised experience over technology. That’s not even really a bad thing! Just as there are tons of boring companies solving expensive problems with Perl or with CSVs being FTPed around, there is a lot of value in delivering boring or even bad tech in a good package.

It’s just sometimes it gets sad thinking how much better things could be.

steve1977
15 replies
2h44m

"...it really did change the world for the better."

I don’t know about that (hyperbole aside). I’ve been in IT for more than 25 years now. I can’t see that Docker containers actually delivered any tangible benefits in terms of end-product reliability or velocity of development, to be honest. This might not necessarily be Docker’s fault though; maybe it’s just that all the potential benefits get eaten up by things like web development frameworks and Kubernetes.

But at the end of the day, today’s Docker-based web app development delivers less than fat-client desktop app development delivered 20 years ago, as sad as that is.

9dev
9 replies
1h41m

If you haven’t seen the benefits, you’re not in the business of deploying a variety of applications to servers.

The fact that I don’t have to install dependencies on a server, or set up third-party applications like PHP, Apache, Redis, and the myriad of other packages anymore, or manage config files in /etc, or handle upgrades of libc gracefully, or worry about rolling restarts and maintenance downtime… all of this was solvable before, but has become radically easier with containers.

Packaging an application and its dependencies into a single, distributable artifact that can be passed around and used on all kinds of machines was a glorious success.

steve1977
7 replies
1h36m

I’m aware of all of that; I’m just saying that this has not translated into more reliable and better software in the end, interestingly enough. As I said, I’m not blaming Docker, at least not directly. It’s more that the whole “ecosystem” around it seems to have so many disadvantages that in the end they outweigh the advantages of Docker.

derefr
4 replies
1h32m

It has translated to reliable legacy software. You can snapshot a piece of software, together with its runtime environment, at the point when it's still possible to build it; and then you can continue to run that built OCI image, with low overhead, on modern hardware — even when building the image from scratch has long become impossible due to e.g. all the package archives that the image fetched from going offline.

(And this enables some increasingly wondrous acts of software archaeology, due to people building OCI images not for preservation, but just for "use at the time" — and then just never purging them from whatever repository they've pushed them to. People are preserving historical software builds in a runnable state, completely by accident!)

Before Docker, the nearest thing you could do to this was to package software as a VM image — and there was no standard for what "a VM image" was, so this wasn't a particularly portable/long-term solution. Often VM-image formats became unsupported faster than the software held in them did!

But now, with OCI images, we're nearly to the point where we've e.g. convinced academic science to publish a paper's computational apparatus as an OCI image, so that it can be pulled 10 years later when attempting to replicate the paper.

steve1977
3 replies
1h7m

"You can snapshot a piece of software, together with its runtime environment, at the point when it's still possible to build it"

I think you’re onto part of the problem here. The thing is that you have to snapshot a lot of today’s software together with its runtime environment.

I mean, I can still run Windows software (for example) that is 10 years or older without that requirement.

9dev
1 replies
1h5m

The price for that kind of backwards compatibility is a literal army of engineers working for a global megacorporation. Free software could not manage that, so having a pragmatic way to keep software running in isolated containers seems like a great solution to me.

steve1977
0 replies
54m

There’s an army of developers working on Linux as well, employed by companies like IBM and Oracle. I don’t see a huge difference to Microsoft here to be honest.

ahnick
0 replies
7m

What are you even talking about? Being able to run 10-year-old software (on any OS) is orthogonal to being able to build a piece of software whose dependencies are completely missing. Don't pretend like this doesn't happen on Windows.

9dev
1 replies
59m

I’m not sure I would agree here: from my personal experience, the increasing containerisation has definitely nudged lots of large software projects to behave better; they don’t spew so many artifacts all over the filesystem anymore, for example, and increasingly adopt environment variables for configuration.

Additionally, I think lots of projects became able to adopt better tooling faster, since the barrier to use container-based tools is lower. Just think of GitHub Actions, which suddenly enabled everyone and their mother to adopt CI pipelines. That simply wasn’t possible before, and has led to more software adopting static analysis and automated testing, I think.

steve1977
0 replies
51m

This might all be true, but has this actually resulted in better software for end users? More stability, faster delivery of useful features? That is my concern.

PaulHoule
0 replies
14m

Circa 2005 I was working at places where I was responsible for 80 and 300 web sites respectively using a large range of technologies. On my own account I had about 30 domain names.

I had scripts that would automatically generate the Apache configuration to deploy a new site in less than 30 seconds.

At that time I found that most web sites have just a few things to configure: often a database connection, the path to where files are, and maybe a cryptographic secret. If you are systematic about where you put your files and how you do your configuration, running servers with a lot of sites is about as easy as falling off a log, not to mention running development, test, staging, prod and any other sites you need.

I have a Python system now with gunicorn servers and celery workers that exists in three instances on my PC, because I am disciplined and everything is documented I could bring it up on another machine manually pretty quickly, probably more quickly than I could download 3GB worth of docker images over my ADSL connection. With a script it would be no contest.

There also was a time when I was building AMIs and even selling them on the AMZN marketplace, and the formula was: write a Java program that writes a shell script that an EC2 instance runs on boot; when it is done, it sends a message through SQS to tell the Java program to shut down and image the new machine.

If Docker is anything it is a system that turns 1 MB worth of I/O into 1 GB of I/O. I found Docker was slowing me down when I was using a gigabit connection, I found it basically impossible to do anything with it (like boot up an image) on a 2MB/sec ADSL connection, with my current pair of 20MB/s connections it is still horrifyingly slow.

I like that the OP is concerned about I/O speed and is bringing it up, and I think it could be improved if there was a better cache system (e.g. Docker might even work on slow ADSL if it properly recovered from failed downloads)

However I think Docker has a conflict between “dev” (where I’d say your build is slow if you ever perceive yourself to be waiting) and “ops” (where a 20 minute build is “internet time”)

I think ops is often happy with Docker, some devs really seem to like it, but for some of us it is a way to make a 20 sec task a 20 minute task.

supriyo-biswas
1 replies
1h47m

Being able to create a portable artifact with only the userspace components in it, and that can be shipped and run anywhere with minimal fuss is something that didn't really exist before containers.

docandrew
0 replies
1h39m

Java?

bandrami
1 replies
1h16m

25 years ago I could tell you what version of every CPAN library was in use at my company (because I installed them). What version of what libraries are the devs I support using now? I couldn't begin to tell you. This makes devs happy but I think has harmed the industry in aggregate.

twelfthnight
0 replies
21m

Because of containers, my company now can roll out deployments using well-defined CI/CD scripts, where we can control installations to force usage of pass-through caches (GCP Artifact Registry). So it actually has that data you're talking about, but instead of living in one person's head, it's stored in a database and accessible to everyone via an API.

pyrale
0 replies
1h59m

"But at the end of the day, today's Docker-based web app development delivers less than fat-client desktop app development delivered 20 years ago, as sad as that is."

You mean, aside from not having to handle installation of your software on your users' machines?

Also I'm not sure this is related to docker at all.

belter
3 replies
7h34m

Was it really so amazing? Here is half a Docker implementation, in about 100 lines of Bash...

https://github.com/p8952/bocker

samlinnfer
0 replies
7h13m

The other half is the other 90%.

Looking at it now, it won't even run in the latest systemd, which now refuses to boot with cgroups v1. Good luck even accessing /dev/null under cgroups v2 with systemd.

redserk
0 replies
5h57m

Lines of code is irrelevant.

Docker is important because:

1) it made a convenient process to build a “system” image of sorts, upload it, download it, and run it.

2) (the important bit!) Enough people adopted this process for it to become basically a standard

Before Docker, it wasn't uncommon to ship some complicated apps in VMs. Packaging those was downright awful, with all of the bespoke scripting needed for the various steps of distribution. And then you get a new job? Time to learn a brand new process.

greiskul
0 replies
1h47m

And like the famous Hacker News comment goes, Dropbox is trivial by just using FTP, curlftpfs and SVN. Docker might have many faults, but anybody who dealt with the problems it aimed to solve knows that it was revolutionary in simplifying things.

And for people who disagree, please write a library like TestContainers using cobbled-together bash scripts that can download, cleanly execute, and then clean up almost any commonly used backend dependency.

derefr
0 replies
1h33m

Yeah, but the Open Container Initiative is supposed to be the responsible adults in the room taking the "fail fast" corporate Docker Inc stuff, and taking time to apply good engineering principles to it.

It's somewhat surprising that the results of that process are looking to be nearly as fly-by-the-seat-of-your-pants as Docker itself is.

KronisLV
0 replies
5h30m

"As a user experience Docker is amazing, but as technology it is hot garbage."

I mean, Podman exists, as do lots of custom build tools and other useful options. Personally, I mostly just stick with vanilla Docker (and Compose/Swarm), because it's pretty coherent and everything just fits together, even if it isn't always perfect.

Either way, agreed about the concepts behind the technology making things better for a lot of folks out there, myself included (haven't had prod issues with mismatched packages or inconsistent environments in years at this point, most of my personal stuff also runs on containers).

mschuster91
4 replies
9h0m

On top of that, it's either the OCI spec that's broken or it's just AWS being nuts, but unlike GitLab and Nexus, AWS ECR doesn't support automatically creating folders (e.g. "<acctid>.dkr.ecr.<region>.amazonaws.com/foo/bar/baz:tag"); it can only do flat storage, so you end up with either seriously long image names or tags.

Yes, you can theoretically create a repository object in ECR in Terraform to mimic that behavior, but it sucks in pipelines where the resulting image path is dynamic - you need to give more privileges to the IAM role of the CI pipeline than I'm comfortable with, not to mention that I don't like any AWS resources being managed outside of the central Terraform repository.

[1] https://stackoverflow.com/questions/64232268/storing-images-...
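
Concretely, the workaround boils down to letting the CI job run something like this before the push (repository name taken from the example above), which is exactly the extra ecr:CreateRepository privilege I'd rather not hand out:

    # Create the nested repository on the fly if it doesn't exist yet...
    aws ecr create-repository --repository-name foo/bar/baz || true
    # ...and only then push the dynamically named image.
    docker push <acctid>.dkr.ecr.<region>.amazonaws.com/foo/bar/baz:tag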

hanikesn
2 replies
3h47m

That seems to be standard AWS practice: implement a new feature so you can check the box, but in practice it's a huge pain to actually use.

X-Istence
0 replies
53m

Azure's IPv6 implementation is still flawed and still broken. That has not changed.

xyzzy_plugh
0 replies
12m

IIRC it's not in the spec because administration of resources is out of scope. For example, perhaps you offer a public repository and you want folks to sign up for an account before they can push? Or you want to have an approval process before new repositories are created?

Regardless it's a huge pain that ECR doesn't support this. Everybody I know of who has used ECR has run into this.

There's a long standing issue open which I've been subscribed to for years now: https://github.com/aws/containers-roadmap/issues/853

wofo
17 replies
10h42m

Hi HN, author here. If anyone knows why layer pushes need to be sequential in the OCI specification, please tell! Is it merely a historical accident, or is there some hidden rationale behind it?

Edit: to clarify, I'm talking about sequentially pushing a _single_ layer's contents. You can, of course, push multiple layers in parallel.

IanCal
12 replies
9h50m

I can't think of an obvious one, maybe load based?

~~I added parallel pushes to docker I think, unless I'm mixing up pulls & pushes, it was a while ago.~~ My stuff was around parallelising the checks not the final pushes.

Edit - does a layer say which layer it goes "on top" of? If so perhaps that's the reason, so the IDs of what's being pointed to exist.

wofo
11 replies
9h44m

Layers are fully independent of each other in the OCI spec (which makes them reusable). They are wired together through a separate manifest file that lists the layers of a specific image.
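
For reference, a manifest looks roughly like this (digests and sizes are placeholder values); note that the layer entries don't point at each other at all:

    {
      "schemaVersion": 2,
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "config": {
        "mediaType": "application/vnd.oci.image.config.v1+json",
        "digest": "sha256:<config-digest>",
        "size": 7023
      },
      "layers": [
        {
          "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
          "digest": "sha256:<layer-1-digest>",
          "size": 32654
        },
        {
          "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
          "digest": "sha256:<layer-2-digest>",
          "size": 16724
        }
      ]
    }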

It's a mystery... Here are the bits of the OCI spec about multipart pushes (https://github.com/opencontainers/distribution-spec/blob/58d...). In short, you can only upload the next chunk after the previous one finishes, because you need to use information from the response's headers.

IanCal
10 replies
9h41m

Ah thanks.

That's chunks of a single layer though, not multiple layers right?

wofo
9 replies
9h39m

Indeed, you are free to push multiple layers in parallel. But when you have a 1 GiB layer full of AI/ML stuff you can feel the pain!

(I just updated my original comment to make clear I'm talking about single-layer pushes here)

killingtime74
8 replies
8h38m

Split the layer up?

thangngoc89
7 replies
7h40m

You can’t. Installing pytorch and supporting dependencies takes 2.2GB on debian-slim.

password4321
4 replies
7h18m

It should be possible to split into multiple layers as long as each file is wholly within its layer. This is the exact opposite of the commonly recommended practice of combining commands to keep everything in one layer, which I think is ultimately done for runtime performance reasons.

ramses0
3 replies
4h26m

I've dug fairly deep into docker layering; it would be wonderful if there was a sort of `LAYER ...` barrier instead of layers being defined implicitly via `RUN ...` lines.

Theoretically there's nothing stopping you from building the docker image and "re-layering it", as they're "just" bundles of tar files at the end of the day.

eg: `RUN ... ; LAYER /usr ; LAYER /var ; LAYER /etc ; LAYER [discard|remainder]`

yjftsjthsd-h
2 replies
2h55m

I've wished for a long time that Dockerfiles had an explicit way to define layers ripped off from (postgre)sql:

    BEGIN
    RUN foo
    RUN bar
    COMMIT

mdaniel
0 replies
2h20m

At the very real risk of talking out of my ass, the new versioned Dockerfile mechanism on top of BuildKit should enable you to do that: https://github.com/moby/buildkit/blob/v0.15.0/frontend/docke...

In true "when all you have is a hammer" fashion, as best I can tell that syntax= directive points to a separate docker image whose job it is to read the file and translate it into BuildKit API calls, e.g. https://github.com/moby/buildkit/blob/v0.15.0/frontend/docke...

But, again for clarity: I've never tried such a stunt; that's just the impression I get from having done mortal kombat with BuildKit's other silly parts

fweimer
0 replies
7h22m

Surely you can have one layer per directory or something like that? Splitting along those lines works as long as everything isn't in one big file.

I think it was a mistake to make layers, as a storage model, visible to the end user. This should just have been an internal implementation detail, perhaps similar to how Git handles delta compression and makes it independent of branching structure. We should also have delta pushes and pulls, using global caches (for public content), and the ability to start containers while their image is still in transfer.

electroly
0 replies
3h24m

If you've got plenty of time for the build, you can. Make a two-stage build where the first stage installs Python and pytorch, and the second stage does ten COPYs which each grab 1/10th of the files from the first stage. Now you've got ten evenly sized layers. I've done this for very large images (lots of Python/R/ML crap) and it takes significant extra time during the build but speeds up pulls because layers can be pulled in parallel.

abofh
1 replies
1h49m

It makes clean-up simpler - if you never got to the "last" one, it's obvious you didn't finish after N+Timeout and thus you can expunge it. It simplifies an implementation detail (how do you deal with partial uploads? make them easy to spot). Otherwise you basically have to trigger at the end of every chunk, see if all the other chunks are there and then do the 'completion'.

But that's an implementation detail, and I suspect isn't one that's meaningful or intentional. Your S3 approach should work fine btw - I've done it before in a prior life when I was at a company shipping huge images and $0.10/GB/month _really_ added up.

You lose the 'bells and whistles' of ECR, but those are pretty limited (imho)

orf
0 replies
4m

In the case of a docker registry, isn’t the “final bit” just uploading the final manifest that actually references the layers you’re uploading?

At this point you’d validate that the layers exist and have been uploaded, otherwise you’d just bail out?

And those missing chunks would be handled by the normal registry GC, which evicts unreferenced layers?

rcarmo
0 replies
10h30m

Never dealt with pushes, but it’s nice to see this — back when Docker was getting started I dumped an image behind nginx and pulled from that because there was no usable private registry container, so I enjoyed reading your article.

codethief
0 replies
1h54m

Hi, thanks for the blog post!

"For the last four months I’ve been developing a custom container image builder, collaborating with Outerbounds"

I know you said this was something for another blog post but could you already provide some details? Maybe a link to a GitHub repo?

Background: I'm looking for (or might implement myself) a way to programmatically build OCI images from within $PROGRAMMING_LANGUAGE. Think Buildah, but as an API for an actual programming language instead of a command line interface. I could of course just invoke Buildah as a subprocess but that seems a bit unwieldy (and I would have to worry about interacting with & cleaning up Buildah's internal state), plus Buildah currently doesn't support Mac.

donatj
10 replies
7h14m

I don't do a ton with Docker outside dev tooling, but I have never understood why private container registries even exist? It just smells like rent seeking. What real advantage does it provide over, say, just generating some sort of image file you manage yourself, as you please?

danmur
2 replies
6h47m

Companies send young engineers (and older engineers who should know more but don't) to AWS and Microsoft for "cloud certification". They learn how to operate cloud services because that's what benefits AWS and MS, so that's what their solutions use.

It's a difficult uphill battle to get people interested in how things work under the hood, which is what you need in order to know you can do things like easily host your own package repositories.

figmert
1 replies
6h19m

This is an odd assessment. I agree certifications aren't all that, but having people learn them isn't about that. It's more that people don't feel like reinventing the wheel at every company, so they can focus on the real work, like shipping the application they've written. So companies like AWS, Docker etc. write things and abstract things away, so someone else doesn't have to redo the whole thing.

Yes I can host my packages and write tooling around it to make it easy. But JFrog already has all the tooling around it, and it integrates with current tooling. Why would I write the whole thing again?

danmur
0 replies
4h44m

I am responding to this part of the parent comment:

"I don't do a ton with Docker outside dev tooling, but I have never understood why private container registries even exist?"

You know the options and have made a conscious choice:

"Yes I can host my packages and write tooling around it to make it easy. But JFrog already has all the tooling around it, and it integrates with current tooling. Why would I write the whole thing again?"

So presumably you are not the kind of people I was talking about.

EDIT: I'm also assuming by the rent seeking part that the parent is referring to paid hosted services like ECR etc.

JackSlateur
1 replies
6h54m

You will probably have images that you will not share with the world. Said images will probably be made available to your infrastructure (k8s clusters, CI/CD runners etc). So you have to either build your own registry or pay someone to do it for you.

Of course, if you use images for dev only, all of that is worthless and you just store your images on your dev machine.

regularfry
0 replies
2h13m

Also if your infrastructure is within AWS, you want your images to also be within AWS when the infrastructure wants them. That doesn't necessarily imply a private registry, but it's a lot less work that way.

vel0city
0 replies
1h26m

Why have a code repository instead of just emailing files around?

Because you want a central store someplace with all the previous versions that is easily accessible to lots of consumers.

I don't want to build my app and then have to push it to every single place that might run it. Instead, I'll build it and push it to a central repo and have everything reference that repo.
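
The whole workflow is just (registry host and tag are hypothetical):

    # Build once and push to the central registry...
    docker build -t registry.example.com/myapp:1.2.3 .
    docker push registry.example.com/myapp:1.2.3
    # ...and every environment that needs it pulls from the same place.
    docker pull registry.example.com/myapp:1.2.3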

"It just smells like rent seeking."

You don't need to pay someone to host a private repo for you. There are lots of tools out there so you can self-host.

mcraiha
0 replies
6h35m

Private (cloud) registries are very useful when there are mandatory AuthN/AuthZ things in the project related to the docker images. You can terraform/bicep/pulumi everything per environment.

figmert
0 replies
6h54m

You don't have to use it. You can use docker save and docker load:

    docker save alpine:3.19 > alpine.tar
    docker load < alpine.tar

But now I have to manage that tar file, have all my systems be aware of where it is, how to access it, etc. Or, I could just not re-invent the wheel and use what docker already has provided.

arccy
0 replies
4h34m

And how do you manage them? You use the same tooling that exists for all public images, by running a container registry.

alemanek
0 replies
1h58m

Integration with vulnerability scanning utilities and centralized permissions for orgs are nice benefits.

jaimehrubiks
8 replies
10h11m

I experience every day the slowness of pushing big images (AI-related ones tend to be big) to ECR in our CI/CD.

wofo
7 replies
10h8m

I wonder whether the folks at Cloudflare could take the ideas from the blog post and create a high-performance serverless container registry based on R2. They could call it scrubs, for "serverless container registry using blob storage" :P

slig
1 replies
7h50m

Hopefully someone gets inspired by this and implements a thin wrapper using CF workers.

wofo
0 replies
7h19m

I didn't expect that! It's a pity they don't expose an API for parallel uploads, for those of us who need to maximize throughput and don't mind using something non-standard.

wofo
0 replies
5h38m

Ah, that's great! I'll have to look into it :)

sshine
0 replies
9h37m

https://www.youtube.com/watch?v=3QbOssRq0Gs

  [Chorus: Chilli & T-Boz]
  No, I don't want no scrub
  A scrub is a guy that can't get no love from me
  Hangin' out the passenger side of his best friend's ride
  Trying to holla at me
  I don't want no scrub
  A scrub is a guy that can't get no love from me
  Hangin' out the passenger side of his best friend's ride
  Trying to holla at me

cpa
7 replies
10h25m

Is there a good reason for not allowing parallel uploads in the spec?

benterix
5 replies
10h19m

I believe that even if there was one back then, it's probably no longer valid, and now it's just a performance limitation.

wofo
4 replies
10h9m

Other than backwards-compatibility, I can imagine simplicity being a reason. For instance, sequential pushing makes it easier to calculate the sha256 hash of the layer as it's being uploaded, without having to do it after-the-fact when the uploaded chunks are assembled.

jtmarmon
1 replies
10h0m

I’m no expert on docker but I thought the hashes for each layer would already be computed if your image is built

wofo
0 replies
9h58m

That's true, but I'd assume the server would like to double-check that the hashes are valid (for robustness / consistency)... That's something my little experiment doesn't do, obviously.

catlifeonmars
0 replies
10h4m

That does not make any sense, as the network usually is a much bigger bottleneck than compute, even with disk reads. You’re paying quite a lot for “simplicity” if that were the case.

amluto
0 replies
8h46m

The fact that layers are hashed with SHA256 is IMO a mistake. Layers are large, and using SHA256 means that you can’t incrementally verify the layer as you download it, which means that extreme care would be needed to start unpacking a layer while downloading it. And SHA256 is fast but not that fast, whereas if you really feel like downloading in parallel, a hash tree can be verified in parallel.

A hash tree would have been nicer, and parallel uploads would have been an extra bonus.

wofo
0 replies
10h20m

No idea... I asked the same question here (https://news.ycombinator.com/item?id=40943480) and am hoping we'll have a classic HN moment where someone who was involved in the design of the spec will chime in.

filleokus
3 replies
6h52m

I've started to grow annoyed with container registry cloud products. Always surprisingly cumbersome to auto-delete old tags, deal with ACLs, or limit the networking.

It would be nice if a Kubernetes distro took a page out of the "serverless" playbook and just embedded a registry. Or maybe I should just use GHCR

vbezhenar
0 replies
5h12m

Kubernetes is extremely bare-bones; there's no way they'll embed a registry. Kubernetes doesn't touch images at all, AFAIK; it delegates that to the container runtime, e.g. containerd.

If you want some lightweight registry, use the "official" docker registry. I'm running it inside Kubernetes, and Kubernetes consumes images from it just fine.
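
A minimal sketch of running it with S3 as the backing store (bucket, region and credentials are placeholders; the env-var override names should be double-checked against the registry's configuration docs):

    docker run -d -p 5000:5000 \
      -e REGISTRY_STORAGE=s3 \
      -e REGISTRY_STORAGE_S3_REGION=us-east-1 \
      -e REGISTRY_STORAGE_S3_BUCKET=my-registry-bucket \
      -e REGISTRY_STORAGE_S3_ACCESSKEY=<access-key> \
      -e REGISTRY_STORAGE_S3_SECRETKEY=<secret-key> \
      registry:2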

mdaniel
0 replies
2h12m

"Always surprisingly cumbersome to auto-delete old tags, ..."

Does this not do what you want? https://docs.aws.amazon.com/AmazonECR/latest/userguide/lifec...

I can't speak to the other "registry cloud products" except for GitLab, which is its own special UX nonsense, but they also support expiry after enough whisky consumption
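
For reference, setting one up is a one-off per repository, roughly like this (repository name and the keep-count are made up):

    aws ecr put-lifecycle-policy \
      --repository-name foo/bar \
      --lifecycle-policy-text '{
        "rules": [{
          "rulePriority": 1,
          "description": "Keep only the 50 most recent images",
          "selection": {
            "tagStatus": "any",
            "countType": "imageCountMoreThan",
            "countNumber": 50
          },
          "action": { "type": "expire" }
        }]
      }'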

breatheoften
0 replies
4h4m

I'm using Google's Artifact Registry -- aside from upload speed, another thing that kills me is freakin download speed... Why in the world should it take 2 minutes to download a 2.6 GB layer to a Cloud Build instance sitting in the same region as the Artifact Registry? Stupidly slow networking really harms the stateless CI machine + Docker registry cache approach, which actually would be quite cool if it was fast enough...

In my case it's still faster than doing the builds would be -- but I'm definitely gonna have to get machines with persistent local cache in the mix at some point so that these operations will finish within a few seconds instead of a few minutes ...

_flux
1 replies
8h6m

I hadn't seen that before, and it indeed does support S3, but does it also serve client downloads directly from S3, or does it merely use it as its own storage backend (so it basically works as a proxy when pulling)?

vbezhenar
0 replies
5h14m

It redirects client requests to the S3 endpoint, so yes, in the end all the heavy traffic comes from S3.

ericpauley
2 replies
8h5m

Where's the source code?

wofo
0 replies
7h25m

The source code is proprietary, but it shouldn't take much work to replicate, fortunately (you just need to upload files at the right paths).
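
As a rough sketch (bucket and image names are placeholders; you'd also want to serve sensible Content-Type values), the object keys just mirror the paths of the pull API:

    # Blobs live under /v2/<name>/blobs/<digest>...
    aws s3 cp layer.tar.gz \
      s3://my-registry-bucket/v2/myimage/blobs/sha256:<layer-digest>
    aws s3 cp config.json \
      s3://my-registry-bucket/v2/myimage/blobs/sha256:<config-digest>
    # ...and the manifest under /v2/<name>/manifests/<tag>.
    aws s3 cp manifest.json \
      s3://my-registry-bucket/v2/myimage/manifests/latest \
      --content-type application/vnd.oci.image.manifest.v1+json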

seungwoolee518
0 replies
5h12m

Like treating the path as an object key, and putting the value as JSON or a blob?

ericfrederich
2 replies
5h55m

R2 is only "free" until it isn't. Cloudflare hasn't got a lot of good press recently. Not something I'd wanna build my business around.

jgrahamc
0 replies
5h23m

R2 egress is free.

TheMrZZ
0 replies
4h52m

Aside from the casino story (high value target that likely faces tons of attacks, therefore an expensive customer for CF), did something happen with them? I'm not aware of bad press around them in general

victorbjorklund
1 replies
8h34m

But this only works for public repos, right? I assume docker pull won't use an S3 API key.

wofo
0 replies
7h21m

That's true, unfortunately. I'm thinking about ways to somehow support private repos without introducing a proxy in between... Not sure if it will be possible.

michaelmior
1 replies
4h49m

"Why can’t ECR support this kind of parallel uploads? The “problem” is that it implements the OCI Distribution Spec…"

I don't see any reason why ECR couldn't support parallel uploads as an optimization. Provide an alternative to `docker push` for those who care about speed that doesn't conform to the spec.

wofo
0 replies
4h42m

Indeed, they could support it through a non-standard API... I wish they did!

lofties
1 replies
8h2m

This sounds very, very expensive, and I would've loved to see cost mentioned in the article too (for both S3 and R2).

remram
0 replies
4h38m

The cost is the S3 cost though. It depends on region and storage tier, but the storage cost per GB, the GET/PUT cost, and the bandwidth cost can be found on the AWS website: https://aws.amazon.com/s3/pricing/

justin_oaks
0 replies
3h26m

Looks cool. Thanks for linking it.

It does mention that it's limited to 500MB per layer.

For some people's use case that limitation might not be a big deal, but for others that's a dealbreaker.

8organicbits
1 replies
7h17m

"...that S3 is up to 8x faster than ECR"

Awesome. Developer experience is so much better when CI doesn't take ages. Every little bit counts.

barbazoo
0 replies
2h20m

    ECR  24 MiB/s (8.2 s)
    S3  115 MiB/s (1.7 s)

It's great that it's faster but absolutely, it's only an improvement of 6.5s observed, as you said, on the CI server. And it means using something for a purpose that it's not intended for. I'd hate to have to spend time debugging this if it breaks for whatever reason.

tealpod
0 replies
6h55m

This is such a wonderful idea, congrats.

There is a real use case for this in some high-security sectors. I can't put complete info here for security reasons; let me know if you are interested.

phillebaba
0 replies
7h24m

Interesting idea to use the file path layout as a way to control the endpoints.

I do wonder, though, how you would deal with the Docker-Content-Digest header. While not required, it is suggested that responses include it, as many clients expect it and will reject layers without the header.

Another thing to consider is that you will miss out on some features from the OCI 1.1 spec, like the referrers API, as that would be a bit tricky to implement.

lazy_moderator1
0 replies
4h25m

That's neat! On that note I've been using S3 as a private registry for years now via Gitlab and couldn't be happier!

kevin_nisbet
0 replies
4m

It's cool to see it. I was interested in trying something similar a couple of years ago, but priorities changed.

My interest was mainly from a hardening standpoint. The basic idea was that the release system, through IAM permissions, would be the only system with any write access to the underlying S3 bucket. All the public / internet-facing components could then be limited to read-only access as part of the hardening.

This would of course be in addition to signing the images, but I don't think many of the customers at the time knew anything about or configured any of the signature verification mechanisms.

KronisLV
0 replies
5h33m

That's a pretty cool use case!

Personally, I just use Nexus because it works well enough (and supports everything from OCI images to apt packages and stuff like a custom Maven, NuGet, npm repo etc.), however the configuration and resource usage both are a bit annoying, especially when it comes to cleanup policies: https://www.sonatype.com/products/sonatype-nexus-repository

That said:

"More specifically, I logged the requests issued by docker pull and saw that they are “just” a bunch of HEAD and GET requests."

This is immensely nice, and I wish more tech out there made common-sense decisions like this: just using what has worked for a long time and not overcomplicating things.
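
Roughly, pulling a single-layer image boils down to something like this (host, image name and digests are placeholders):

    # Resolve the tag and fetch the manifest...
    curl -I https://registry.example.com/v2/myimage/manifests/latest
    curl -H "Accept: application/vnd.oci.image.manifest.v1+json" \
         https://registry.example.com/v2/myimage/manifests/latest
    # ...then fetch the config and each layer blob listed in the manifest.
    curl -o config.json \
         https://registry.example.com/v2/myimage/blobs/sha256:<config-digest>
    curl -o layer.tar.gz \
         https://registry.example.com/v2/myimage/blobs/sha256:<layer-digest>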

I am a bit surprised that there aren't more simple container repositories out there (especially with auth and cleanup support), since Nexus and Harbor are both a bit complex in practice.