Dive: A tool for exploring a Docker image, layer contents and more

miquong
16 replies
1d

For image and layer manipulation, crane is awesome - as is the underlying go-containerregistry library.

It lets you add new layers, or edit any metadata (env vars, labels, entrypoint, etc) in existing images. You can also "flatten" an image with multiple layers into a single layer. Additionally you can "rebase" an image (re-apply your changes onto a new/updated base image). It does all this directly in the registry, so no docker needed (though it's still useful for creating the original image).

https://github.com/google/go-containerregistry/blob/main/cmd...

(updated: better link)
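
One way to picture the "rebase" operation described above is as a plain list manipulation: the image's own layers are kept and the base layers beneath them are swapped. This is only a conceptual Python sketch, not crane's actual code; `rebase_layers` and the digests are made-up names for illustration.

```python
def rebase_layers(image, old_base, new_base):
    """Swap the base layers under an image's own layers.

    Each argument is a list of layer digests ordered bottom to top.
    """
    if image[:len(old_base)] != old_base:
        raise ValueError("image is not built on top of old_base")
    own_layers = image[len(old_base):]  # layers added on top of the old base
    return new_base + own_layers


# Hypothetical digests, shortened for readability.
old_base = ["sha256:base-v1-a", "sha256:base-v1-b"]
new_base = ["sha256:base-v2-a", "sha256:base-v2-b", "sha256:base-v2-c"]
image = old_base + ["sha256:app-deps", "sha256:app-code"]

print(rebase_layers(image, old_base, new_base))
```

Since nothing in this model touches layer contents, it suggests why such an operation can happen in the registry without a local Docker daemon.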

pbowyer
14 replies
22h43m

Is there any performance benefit to having fewer layers? My understanding is that there's no gain by merging layers as the size of the image remains constant.

yrro
4 replies
17h26m

If you've got a 50 layer image then each time you open a file, I believe the kernel has to look for that file in all 50 layers before it can fail with ENOENT.
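
A rough mental model of that lookup, as a Python sketch over ordinary directories. This is an illustration of the idea only: real overlayfs does the work inside the kernel and caches results, and the `lookup` helper and directory names here are hypothetical.

```python
import os
import tempfile


def lookup(path, layers):
    """Probe layer directories top-down; return (match or None, probes made)."""
    probes = 0
    for layer in layers:
        probes += 1
        candidate = os.path.join(layer, path)
        if os.path.lexists(candidate):
            return candidate, probes
    return None, probes


with tempfile.TemporaryDirectory() as root:
    # Three stand-in layer directories, top-most first (hypothetical names).
    layers = [os.path.join(root, name) for name in ("layer3", "layer2", "layer1")]
    for layer in layers:
        os.mkdir(layer)
    open(os.path.join(layers[0], "file3"), "w").close()   # only in the top layer
    open(os.path.join(layers[-1], "file1"), "w").close()  # only in the bottom layer

    _, top_probes = lookup("file3", layers)
    _, bottom_probes = lookup("file1", layers)
    missing, miss_probes = lookup("no-such-file", layers)

print(top_probes, bottom_probes, miss_probes)
```

In this model a miss costs as many probes as there are layers, which is the worst case the comment is worried about.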

naikrovek
1 replies
13h57m

That seems ripe for optimization, if true, especially since those layers are all immutable.

yrro
0 replies
7h43m

A container runtime could optimize for speed by unpacking all those layers one by one into a single lower directory for the container to use; but at the cost of using lots of disk space, since those layers would no longer be shared between different containers.

cmckn
1 replies
12h56m

It depends on your OCI engine, but this isn’t the case with containerd. Each layer is successively “unpacked” onto a “snapshot”, from which containers are created.

yrro
0 replies
7h44m

Interesting. Certainly with podman I observe that the layers are combined by overlayfs.

I build this Dockerfile into an image with 'podman build -t test .':

    FROM registry.access.redhat.com/ubi9/ubi-minimal
    RUN touch /file1
    RUN touch /file2
    RUN touch /file3
Then run it with 'podman run --rm -it test'.

Then I can inspect the container's storage configuration:

    $ podman inspect -l | jq '.[].GraphDriver'
    {
      "Name": "overlay",
      "Data": {
        "LowerDir": "/home/sam/.local/share/containers/storage/overlay/e0055211bf4b6a839f91759d9bbbb5b93c61f4e2969b25ec9d732428557c230f/diff:/home/sam/.local/share/containers/storage/overlay/efd666ad5dc6b7a92933043ed42a41341b4e25c80e981f6d01ab41927d8f8a89/diff:/home/sam/.local/share/containers/storage/overlay/1c108fb13c6708a93dcf237bea2637344cf65d3c9272693817aa0741b158fd7b/diff:/home/sam/.local/share/containers/storage/overlay/80c0d7946d02825018d47fbf34df054bec5dc433ae20f71ffabd8d3725448837/diff",
        "MergedDir": "/home/sam/.local/share/containers/storage/overlay/f14a1506ed692e3c40b06d5eb1ba9c011bad159c82df6ddee632b49ba02a5bfc/merged",
        "UpperDir": "/home/sam/.local/share/containers/storage/overlay/f14a1506ed692e3c40b06d5eb1ba9c011bad159c82df6ddee632b49ba02a5bfc/diff",
        "WorkDir": "/home/sam/.local/share/containers/storage/overlay/f14a1506ed692e3c40b06d5eb1ba9c011bad159c82df6ddee632b49ba02a5bfc/work"
      }
    }
Note that there are four directories in the LowerDir, and examining their contents reveals that there's one per layer in my 'test' image:

    $ podman inspect -l | jq '.[].GraphDriver.Data.LowerDir' -r | tr : '\n' | xargs -rt -n1 ls
    ls /home/sam/.local/share/containers/storage/overlay/e0055211bf4b6a839f91759d9bbbb5b93c61f4e2969b25ec9d732428557c230f/diff
    file3  run/
    ls /home/sam/.local/share/containers/storage/overlay/efd666ad5dc6b7a92933043ed42a41341b4e25c80e981f6d01ab41927d8f8a89/diff
    file2  run/
    ls /home/sam/.local/share/containers/storage/overlay/1c108fb13c6708a93dcf237bea2637344cf65d3c9272693817aa0741b158fd7b/diff
    file1  run/
    ls /home/sam/.local/share/containers/storage/overlay/80c0d7946d02825018d47fbf34df054bec5dc433ae20f71ffabd8d3725448837/diff
    afs/  bin@  boot/ dev/  etc/  home/ lib@  lib64@  lost+found/ media/  mnt/  opt/  proc/  root/  run/  sbin/ srv/  sys/  tmp/  usr/  var/
Same for a system (non-rootless) container:

    # podman inspect systemd-oxidized | jq '.[].GraphDriver.Data.LowerDir' -r | tr ':' '\n' | xargs -rt -n 1 ls -F
    ls -F /var/lib/containers/storage/overlay/062693a4d8fe24055645ee1c11bbc542507ba3fe101ed4e5b76b6457752d3ac5/diff 
    home/
    ls -F /var/lib/containers/storage/overlay/9d9c6a906aff40160f924c4fec96692a6fc59f27316b8df5ee2616091008a2fe/diff 
    home/
    ls -F /var/lib/containers/storage/overlay/e35e4cf6ce1057979baee991e609f82afb85360560dc8bf36ab069d8e8288f5f/diff 
    home/
    ls -F /var/lib/containers/storage/overlay/da5189fd0e740204d7a30e4b59b012251e8d36a1db488bf47ede61247a948fff/diff 
    home/
    ls -F /var/lib/containers/storage/overlay/702d48c0e8afce7cf5ce049939d8fdb596755e709a3735bbbf4900ec65c45529/diff 
    etc/  home/  var/
    ls -F /var/lib/containers/storage/overlay/877031909d6cf6038e0e0295c34f14932b71903835b3eb13ae6bf23396d65daf/diff 
    etc/  usr/  var/
    ls -F /var/lib/containers/storage/overlay/b90fc2ebe1d7c19e5ee43033b8e8be95210b4e3a0012bb54db327f6d52e89d12/diff 
    tmp/
    ls -F /var/lib/containers/storage/overlay/aa2e131948d26d04f48f7c01c57aa98e2a308e4fc0d707c9532d30fa4ab89f5d/diff 
    root/  usr/  var/
    ls -F /var/lib/containers/storage/overlay/6af6a203c83f697aa35b5007b5678b8c2b052ab482d9d7f7e5b322cd74682441/diff 
    root/  tmp/  usr/  var/
    ls -F /var/lib/containers/storage/overlay/b237f20a22db5d282838c5a7c9d78f7d6e648d9b223a328fa8b8078e9c6b5bbe/diff 
    ls -F /var/lib/containers/storage/overlay/7d1bc5d46561795a4f7058e4098faa345e119617292761933e2ad7f185dbd45b/diff 
    ls -F /var/lib/containers/storage/overlay/88d52756a706920fbcb38cbbb6eaf43ea09be0045361ffe6920185ca7179a1d9/diff 
    tmp/
    ls -F /var/lib/containers/storage/overlay/0d0dedc3e4f8d7f39b9a0c78d1cf380ad1161023a49bd74ab8b51626f6d21f19/diff 
    root/  usr/  var/
    ls -F /var/lib/containers/storage/overlay/d74968407a57d753c2ae232ea26e0e2d7fed30683a67a993b814b71684817eaa/diff 
    root/  usr/  var/
    ls -F /var/lib/containers/storage/overlay/885b215dc2a8304a58a6d35a5668c74d3d80f27dacf56b97a3b8b13c1a423b33/diff 
    root/  usr/  var/
    ls -F /var/lib/containers/storage/overlay/a4edd3819e3655d0ca90ac898a4a2b91ec50c83cbf61862186e10ae1b44cbcc0/diff 
    etc/  usr/  var/
    ls -F /var/lib/containers/storage/overlay/954dcca00292635597e4c5ee9ebc35bf5fcc15a50dfd8f176df7b4c15e695e10/diff 
    bd_build/  etc/  root/ run/  usr/  var/
    ls -F /var/lib/containers/storage/overlay/fb670aa62e0f4f71382ef61a742e8ab1ad5d345b95267b0adc18c10e6c87f13b/diff 
    bd_build/
    ls -F /var/lib/containers/storage/overlay/7f5cbd8cc787c8d628630756bcc7240e6c96b876c2882e6fc980a8b60cdfa274/diff 
    bin@  boot/  dev/  etc/  home/ lib@  lib32@  lib64@  libx32@  media/  mnt/  opt/  proc/  root/  run/  sbin@  srv/  sys/  tmp/ usr/  var/
('podman system info' confirms that .store.graphDriverName == "overlay")

I'd be interested to see what this looks like with docker?
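
For comparing engines, the same check can be scripted instead of eyeballing the JSON. The sketch below parses inspect-style output and lists the lower directories; the embedded JSON is a trimmed stand-in with placeholder paths, and it assumes the engine reports a `GraphDriver.Data.LowerDir` field in the same colon-separated form shown above.

```python
import json

# Trimmed stand-in for inspect output; the paths are shortened placeholders.
inspect_output = """
[
  {
    "GraphDriver": {
      "Name": "overlay",
      "Data": {
        "LowerDir": "/storage/overlay/ccc/diff:/storage/overlay/bbb/diff:/storage/overlay/aaa/diff",
        "UpperDir": "/storage/overlay/ddd/diff"
      }
    }
  }
]
"""


def lower_dirs(inspect_json):
    """Return the read-only lower directories, top-most first."""
    container = json.loads(inspect_json)[0]
    driver = container["GraphDriver"]
    # Assumption: overlay-based drivers have names starting with "overlay".
    if not driver["Name"].startswith("overlay"):
        raise ValueError("unexpected graph driver: " + driver["Name"])
    return driver["Data"]["LowerDir"].split(":")


dirs = lower_dirs(inspect_output)
print(len(dirs), "lower directories")
for d in dirs:
    print(d)
```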

natebc
2 replies
22h38m

Some startup performance savings from fewer HTTP requests to fetch the image. Small for sure, but it's something?

whirlwin
0 replies
20h6m

Depends. If you have to re-fetch a big layer often because of updates, that's not good. But if what changes frequently is in a smaller layer, it's more favorable.

electroly
0 replies
22h28m

In practice I've found the performance savings often go the other way--for large (multi-GB) images it's faster to split them up into more layers that can be downloaded in parallel from the registry. It won't parallelize the download of a single layer, and in EC2+ECR you won't get particularly good throughput with a single layer.
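
A back-of-the-envelope version of that argument in Python. The numbers are invented for illustration (a 50 MB/s cap per download stream and 4 parallel downloads are assumptions, not measured ECR figures), and the model ignores any shared bandwidth ceiling and extraction time.

```python
def download_seconds(layer_sizes_mb, per_stream_mb_per_s, max_parallel):
    """Estimate pull time when each layer is one stream with a throughput cap.

    Layers are assigned greedily, largest first, to the least-loaded of
    max_parallel download slots; the slowest slot determines the total.
    """
    slots = [0.0] * max_parallel
    for size in sorted(layer_sizes_mb, reverse=True):
        i = slots.index(min(slots))
        slots[i] += size / per_stream_mb_per_s
    return max(slots)


# Hypothetical 4 GB image: one big layer vs. eight 500 MB layers.
single = download_seconds([4000], per_stream_mb_per_s=50, max_parallel=4)
split = download_seconds([500] * 8, per_stream_mb_per_s=50, max_parallel=4)
print(single, split)
```

Under these assumptions the single layer is limited by one stream, while the split image keeps all four slots busy.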

miquong
1 replies
21h47m

Eventually, once zstd is fully supported and gzip's tiny compression window is no longer a limitation, compressing one full layer would almost certainly give a better ratio than compressing several smaller layers.

https://github.com/opencontainers/image-spec/issues/803
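
The window-size point can be tried with the Python standard library. In this sketch `zlib` provides the DEFLATE format that gzip uses, and `lzma` is used purely as a stand-in for a large-window compressor such as zstd (an assumption made so the example needs no third-party package). The "layers" are synthetic: the same 256 KiB of pseudo-random bytes shipped twice.

```python
import lzma
import random
import zlib

rng = random.Random(0)
BLOCK = 256 * 1024
# Incompressible pseudo-random data, standing in for a file shipped in two layers.
shared = rng.getrandbits(8 * BLOCK).to_bytes(BLOCK, "little")
layer_a = shared
layer_b = shared  # the same content again, e.g. a rebuilt copy of the file


def sizes(compress):
    """Return (one combined stream, two separate streams) in bytes."""
    combined = len(compress(layer_a + layer_b))
    separate = len(compress(layer_a)) + len(compress(layer_b))
    return combined, separate


# DEFLATE can only look back 32 KiB, so it cannot see the repeat 256 KiB earlier.
zlib_combined, zlib_separate = sizes(zlib.compress)
# A large-window compressor can reference the first copy when the data is one stream.
lzma_combined, lzma_separate = sizes(lzma.compress)

print("zlib:", zlib_combined, "vs", zlib_separate)
print("lzma:", lzma_combined, "vs", lzma_separate)
```

With the small window, merging the two layers into one stream gains essentially nothing; with the large window, the combined stream is roughly half the size of the two separate ones.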

pbowyer
0 replies
9h52m

Is it coming? That ticket doesn't fill me with hope, given its age and the disagreements over backwards compatibility.

momothereal
0 replies
18h34m

I'm working on a tool that does the opposite: to split layers into smaller, deterministic deltas.

mcpherrinm
0 replies
22h20m

If files are overwritten or removed in a lower layer, there can be size savings from that.

fishpen0
0 replies
21h8m

It's less about performance and more about security. Lots of amateur images use a secret file or inadvertently store a secret in a layer without realizing that an rm or other process in a later layer doesn't actually eliminate it. If the final step of your build squashes the filesystem flat again, you can remove a lot of potentially exposed metadata and secrets stored in intermediate layers.

apt-get
0 replies
22h9m

There are some useful cases — for example, if you're taking a rather bloated image as a base and trimming it down with `rm` commands, those will be saved as differential layers, which will not reduce the size of the final image in the slightest. Only merging will actually "register" these deletions.
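
The effect is easy to reproduce with in-memory tarballs. The Python sketch below builds a base layer, a second layer that only records a deletion as an OCI-style `.wh.` whiteout file, and then a flattened layer. It is a simplified model: it handles plain file whiteouts only (no opaque directories, no directory removal), and the file names are invented.

```python
import io
import tarfile


def make_layer(files):
    """Build an uncompressed tar layer in memory from {path: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path, data in files.items():
            info = tarfile.TarInfo(path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def flatten(layers):
    """Apply layers bottom to top, honouring `.wh.` whiteout files."""
    merged = {}
    for layer in layers:
        with tarfile.open(fileobj=io.BytesIO(layer)) as tar:
            for member in tar.getmembers():
                directory, _, name = member.name.rpartition("/")
                if name.startswith(".wh."):
                    # A whiteout hides the named entry from lower layers.
                    target = name[len(".wh."):]
                    hidden = f"{directory}/{target}" if directory else target
                    merged.pop(hidden, None)
                elif member.isfile():
                    merged[member.name] = tar.extractfile(member).read()
    return make_layer(merged)


base = make_layer({"opt/bloat.bin": b"x" * 1_000_000, "bin/app": b"app"})
# Roughly what a later `RUN rm /opt/bloat.bin` step records: a marker, not a removal.
cleanup = make_layer({"opt/.wh.bloat.bin": b""})

layered_size = len(base) + len(cleanup)
flat_size = len(flatten([base, cleanup]))
print(layered_size, flat_size)
```

The layered total still carries the full megabyte from the base layer; only the flattened result drops it.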

lyxell
0 replies
14h52m

This is a great recommendation. It is worth noting that unlike Docker, crane is root- and daemonless which makes it work great in Nix (it's called 'crane' in the Nix repository). This allows for Nix to be used to manage dependencies for both building (e.g. Go) as well as packaging and deploying (e.g. gnu tar, crane).

maxloh
13 replies
22h46m

A dumb question: Why are most of the container/infrastructure tools written in GoLang?

Examples that come to my mind include Docker, Podman, nerdctl, Terraform and Kubernetes.

Is there any obvious advantage that GoLang offers, making it so popular for building these tools?

fishpen0
4 replies
21h5m

Kubernetes specifically is in go because google invented go and also invented Kubernetes. Their internal teams have a lot of go engineers due to the whole inventing it thing

quickthrower2
1 replies
16h13m

I think k8s came from Borg though which is pre-Go

jacurtis
0 replies
13h45m

K8s was a generational iteration of Borg, but it was a full re-write with an emphasis on making it more universally usable and pluggable.

"We've incorporated the best ideas from Borg in Kubernetes, and have tried to address some pain points that users identified with Borg over the years."

Borg was written in C++, but only contained container scheduling, resource allocation and some service discovery. Many other features of what is now Kubernetes were built later and essentially "shimmed" onto Borg.

Kubernetes was a re-write of Borg to rebuild many of its original features from the ground up using the lessons they had learned since originally building Borg. By this time, Go had been developed and was being actively used for many of these shims and supporting services surrounding Borg. Since the same team(s) were rebuilding Borg that had developed and maintained these other services, and because many of these shims and supporting services (which are already in Go) were being incorporated into Kubernetes, they decided to build the new version (which became Kubernetes) in Go.

Sources:

- https://kubernetes.io/blog/2015/04/borg-predecessor-to-kuber...

- https://storage.googleapis.com/gweb-research2023-media/pubto...

pjmlp
0 replies
10h27m

Kubernetes is in Go, because a couple of Go advocates joined the team and pushed for a Go rewrite.

It was originally written in Java.

In fact, most Go users are outside Google; internally it is all about Java, Kotlin, Dart, C++ and Python, and now Rust as well.

Kubernetes, gVisor and Android GPU debugger are probably the only major internal projects in Go.

dmlittle
0 replies
19h59m

I believe the original Kubernetes proof of concept was written in Java

wavemode
2 replies
18h14m

I would argue system tools and utilities, CLIs and networking software are where Go shines the most.

Rust is probably the only good modern alternative that is mature.

rochak
1 replies
11h0m

I have a question for big companies building software in Rust. How are they able to find the talent? Unlike most other common languages, it is magnitudes more difficult to ramp up entry level new hires in Rust for the simple reason that Rust requires more than normal understanding of Computer Science. Is it just the C++ devs that wanna transition or have already transitioned to Rust that those companies can recruit?

steveklabnik
0 replies
8m

"How are they able to find the talent? Unlike most other common languages, it is magnitudes more difficult to ramp up entry level new hires in Rust"

They have not found this to be true.

https://news.ycombinator.com/item?id=38869786

bombela
2 replies
21h51m

I think I can answer for Docker. The first prototype was written in Python, the company was a Python shop. The main reason for a rewrite in Go was to ride the popularity of Go that was growing at the time (2012).

source: I was there.

mardifoufs
1 replies
21h43m

In hindsight, docker is probably much better off with Go, considering the use case. And I say that as someone who loves python and isn't too much into go!

baby_souffle
0 replies
19h26m

"In hindsight, docker is probably much better off with Go, considering the use case. And I say that as someone who loves python and isn't too much into go!"

Same. I use docker to escape the versioning hell that is modern python.

When you're trying to build a tool, the more self-contained the better.

arccy
0 replies
21h51m

when you run containers, you want to care as little about the underlying system as possible, and go makes it easy to be in its own little world.

plus ecosystem effects: you can just reuse the packages of a different implementation for part of your code.

Yasuraka
0 replies
18h43m

Easiest language to (cross-)compile and distribute, stellar productivity to performance ratio, native (uncolored) concurrency, great networking capabilities in the stdlib. Imagine if you will Docker and Kubernetes written in any of the other popular languages.

tornadofart
5 replies
1d1h

What exactly is meant by a layer?

manojlds
2 replies
1d1h

Docker images have layers. Sort of like snapshots.

jake_morrison
0 replies
1d

Like ogres

gilnaa
0 replies
1d

Like an onion

iCarrot
1 replies
1d

tornadofart
0 replies
23h47m

Thanks.

runfaster2000
4 replies
1d1h

Dive is great. Tools like that are critical for both learning and developing confidence in precisely what you are building/shipping.

Dredge is another tool to look at. I use it for diffing layers.

https://github.com/mthalman/dredge/blob/main/docs/commands/i...

geek_at
3 replies
1d

It really does sound amazing. Would have needed this when you guys (hn) and reddit helped me figure out what a rogue Raspberry Pi was doing in our server closet

https://blog.haschek.at/2019/the-curious-case-of-the-RasPi-i...

WirelessGigabit
1 replies
22h25m

2019! Can you post an update?

8organicbits
0 replies
4h24m

thunderbong
0 replies
22h27m

That's an awesome article!

vbezhenar
1 replies
23h3m

What's the reason docker uses tar archives instead of ordinary directories for layer contents? This tool is great but it fixes something that should not exist in the first place.

cachvico
0 replies
22h55m

So images are serialized and able to be transmitted over a network.

When an image is used (or "run"), it becomes a container, which makes it behave (to the client) like ordinary files & directories.

sureglymop
1 replies
22h19m

There's a tool from google called container-diff that's also really useful!

I use it to see what random scripts one is encouraged to pipe into bash would do to a system.

roastedfunction
0 replies
13h13m

This is less related to general container utilities but I’m an avid user of GoogleContainerTools/container-structure-test. It’s a handy way to run integration tests on container apps or images.

These Google open source projects seem to be in need of some TLC as a lot of the original maintainers have moved on, which is a shame. I try to throw a PR their way and close out the odd issue when I can. The testing tool in particular is invaluable to keep my sanity with a large amount of base images I have to maintain internally.

diazc
1 replies
1d1h

There are other great TUI tools like dive here [0]; lazydocker and dry come to mind.

And some in the docker category as well:

[0] https://terminaltrove.com/

pricci
0 replies
1d

Lazydocker has similar, although simpler, functionality.

Edit: just checked, and it lets you see the layers, but only shows the command for each one.

tonymet
0 replies
14h52m

Dive is a gem. It's helped me find a lot of cruft ...

- Unneeded build dependencies. Used a scratch image and/or removed build deps in the same step.
- node_modules for dev-deps. Used prod.
- Embedded Chromium builds (with Puppeteer). Removed Chromium and remoted to an external build.

Docker Desktop now has this feature built in, but I've been using dive for years to find wasted space & potential security issues.

radus
0 replies
13h4m

Great tool, I use it with this alias:

  alias dive='docker run -ti --rm -v /var/run/docker.sock:/var/run/docker.sock wagoodman/dive'
(as suggested in the project README)

oooyay
0 replies
1d1h

Dive is incredible, it saved my butt numerous times and taught me a lot about layers. It's so good that Docker Desktop emulated its functionality.

notatoad
0 replies
1d1h

I found dive super useful for understanding how docker images work, and how to write efficient dockerfiles. Reading the docs is one thing, but making a change to the dockerfile and then seeing how it has affected the resulting layer structure is what really made me get it.

kylegalbraith
0 replies
4h18m

Dive is an amazing tool in the container/Docker space. It makes life so much easier to debug what is actually in your container. When we were first getting started with Depot [0], we often got asked how to reduce image size as well as make builds faster. So we wrote up a quick blog post that shows how to use Dive to help with that problem [1]. It might be a bit dated now, but in case it helps a future person.

Dive also inspired us to make it easier to surface what is actually in your build context, on every build. So we shipped that as a feature in Depot a few weeks back [2].

[0] https://depot.dev

[1] https://depot.dev/blog/reducing-image-size-with-dive

[2] https://depot.dev/blog/build-context

indrora
0 replies
17h29m

Dive has saved my ass so many times it's not funny when trying to pull apart what various common docker containers do when I'm extending them.

A+ software.

greenie_beans
0 replies
1d1h

this helped me debug a docker thing recently, very handy tool!

eris_agx
0 replies
18h26m

Other than being super useful, Dive has an underrated feature: its author is a great developer and very fun to work with.

animeshjain
0 replies
1d1h

I used dive when I was trying to cut down on the size of the image. Diffing and seeing what files/directories go into each layer was very useful.

a_t48
0 replies
1d1h

Dive is great. It struggles a bit with very very large images but beyond that no real complaints.

TechIsCool
0 replies
1d1h

I love dive and it's something that I use in my tool kit multiple times a month.

I am curious if anyone knows how to get the contents of the file you have highlighted. A lot of the time I use dive to validate that a file exists in a layer, and then I want to peek at it. Currently I normally resort to running the container and using cat, or extracting the contents and then wandering into the folders.
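
One possible workaround outside dive, sketched in Python: read the file straight out of an image tarball. This assumes the archive follows the `docker save` layout, with a top-level `manifest.json` whose `Layers` list is ordered bottom to top; `read_file_from_image` is a hypothetical helper, and the demo image is synthetic so the sketch runs without Docker.

```python
import io
import json
import tarfile


def _normalize(name):
    """Strip a leading './' or '/' so layer paths compare consistently."""
    return name[2:] if name.startswith("./") else name.lstrip("/")


def read_file_from_image(archive, path):
    """Return the bytes of `path` from the top-most layer that contains it.

    `archive` is a seekable binary file object holding the image tarball.
    Whiteouts are ignored, so a file deleted in a later layer is still found.
    """
    wanted = _normalize(path)
    with tarfile.open(fileobj=archive) as image:
        manifest = json.load(image.extractfile("manifest.json"))
        # Layers are listed bottom to top, so search them in reverse.
        for layer_name in reversed(manifest[0]["Layers"]):
            with tarfile.open(fileobj=image.extractfile(layer_name)) as layer:
                for member in layer:
                    if member.isfile() and _normalize(member.name) == wanted:
                        return layer.extractfile(member).read()
    raise FileNotFoundError(path)


def _tar_bytes(files):
    """Build a tar archive in memory from {name: bytes} (demo helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Synthetic two-layer image; the upper layer overwrites etc/motd.
layer1 = _tar_bytes({"etc/motd": b"old\n"})
layer2 = _tar_bytes({"etc/motd": b"new\n"})
manifest = json.dumps([{"Layers": ["l1/layer.tar", "l2/layer.tar"]}]).encode()
archive = io.BytesIO(_tar_bytes({
    "manifest.json": manifest,
    "l1/layer.tar": layer1,
    "l2/layer.tar": layer2,
}))

print(read_file_from_image(archive, "/etc/motd"))
```

With a real image, the archive would come from something like `docker save IMAGE -o image.tar` opened in binary mode; layer compression formats that Python's `tarfile` cannot open are not handled here.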