Navigating OCI Artifacts and Images

OCI's image specification defines the contents and conventions of container images. This vendor-neutral standard became prevalent as the number of container runtimes increased¹, demanding compatibility beyond Docker. As container usage has grown, so has the need for assets that sit alongside containers. Examples include Open Policy Agent policies, Helm charts, and Carvel package bundles. These assets do not contain a filesystem, so they do not fit the OCI image specification. This created the need to define what is now known as an OCI artifact. Artifacts offer a more generic definition² for what can be stored in an OCI registry and consumed by clients.

All this is to say that the number of OCI-compliant assets is growing and taking on new forms. In this new world, it's helpful to have tooling to introspect and work with remote OCI assets. Bonus points if this tooling is designed to work as a client without being a full-blown container runtime. In this post, I'll be talking about crane, which is my Swiss Army knife for navigating OCI artifacts and images.

Tooling

While this post is about crane, there are several tools capable of interacting with OCI artifacts. Examples include:

skopeo: Solid features for air-gapped use cases (syncing between registries) and introspection. Built on libraries found in github.com/containers, which are also the libraries used by Podman. If you live in the Podman and/or Red Hat ecosystem, this could be a good tool for you.
imgpkg: Has features around querying and introspecting OCI contents. This tooling shines in creating bundles of configuration that can be pushed to repositories. Combined with tools like kbld, it can build robust configurations enabling the locking of images referenced in configurations by their digest values³.

While the above (and some missing) tools are great, I reach for crane every time I'm working on the discovery and introspection of OCI assets. I find its UX to be solid and its commands to be feature rich. It is compatible with other Unix tooling, and its underlying library, go-containerregistry, is easy to work with at the Go/library level.

If you wish to follow along with this post, complete the Installation section of crane's GitHub page.

Discovery

First, you need a way to query the available tags on a given image. This can be done using the ls command. For example, you can determine which kube-apiserver images are available for the 1.24.x release.

$ crane ls k8s.gcr.io/kube-apiserver | grep -i 1.24

sha256-c5113882ff00af29730f560f6567de63644f10c0d51f2416c55b8a6649abe282.sig
v1.24.0
v1.24.0-alpha.0
v1.24.0-alpha.1
v1.24.0-alpha.2
v1.24.0-alpha.3
v1.24.0-alpha.4
v1.24.0-beta.0
v1.24.0-rc.0
v1.24.0-rc.1
v1.24.1
v1.24.1-rc.0
v1.24.2
v1.24.2-rc.0
v1.24.3-rc.0
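
Note the sha256-….sig entry in the output above: grep treats the dot in 1.24 as a regular-expression wildcard, so the 1f24 inside that digest matches too. A minimal sketch of a stricter filter follows; the tag list is a small stand-in for real crane ls output, so the snippet runs without a registry.

```shell
# Stand-in for `crane ls k8s.gcr.io/kube-apiserver` output (sample tags only).
tags='sha256-c5113882ff00af29730f560f6567de63644f10c0d51f2416c55b8a6649abe282.sig
v1.23.8
v1.24.0
v1.24.2'

# -F makes grep treat the pattern as a literal string, so "." only matches a dot.
filtered=$(printf '%s\n' "$tags" | grep -F 'v1.24.')
printf '%s\n' "$filtered"
```

Against a live registry, the same filter would be applied as crane ls k8s.gcr.io/kube-apiserver | grep -F 'v1.24.'.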

Now, let's find the digest value for the v1.24.2 image.

$ crane digest k8s.gcr.io/kube-apiserver:v1.24.2

sha256:433696d8a90870c405fc2d42020aff0966fb3f1c59bdd1f5077f41335b327c9a
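
A digest is content-addressed: it is the SHA-256 hash of the exact manifest bytes the registry serves, prefixed with sha256:. Here is a rough sketch of that calculation, using a tiny made-up JSON document in place of a real manifest (the manifest variable and its contents are assumptions for illustration only).

```shell
# Hypothetical stand-in for manifest bytes; in practice these would come
# from the registry.
manifest='{"schemaVersion":2}'

# Use whichever SHA-256 tool is installed (sha256sum on most Linux systems,
# shasum on macOS).
if command -v sha256sum >/dev/null 2>&1; then
  hex=$(printf '%s' "$manifest" | sha256sum | cut -d' ' -f1)
else
  hex=$(printf '%s' "$manifest" | shasum -a 256 | cut -d' ' -f1)
fi

# Registries report digests as "<algorithm>:<hex>".
digest="sha256:${hex}"
echo "$digest"
```

Assuming crane manifest prints the manifest bytes unmodified, piping its output through the same hash tool should reproduce the value reported by crane digest.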

This digest is great, but doesn't tell the entire story. With the introduction of the OCI Image Index Specification, an asset may now contain a list that points to image manifests specific to the platform and architecture of the target system. This feature enables all platforms and architectures to point to the same tag, even though each needs to run its own unique image, as detailed below.

To understand which images are available, use the manifest command. This also identifies exactly where the image lives. Below are the images for Linux arm64 and amd64.

$ crane manifest k8s.gcr.io/kube-apiserver:v1.24.2 |\
    jq '.manifests[] | select(.platform.architecture=="amd64" or .platform.architecture=="arm64")'

{
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "size": 949,
  "digest": "sha256:e31b9dc1170027a5108e880ab1cdc32626fc7c9caf7676fd3af1ec31aad9d57e",
  "platform": {
    "architecture": "amd64",
    "os": "linux"
  }
}
{
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "size": 949,
  "digest": "sha256:a650cc38f804847dfa3a1043fa5c55d479be4e9c87be2ba1c3d58b803eec33e9",
  "platform": {
    "architecture": "arm64",
    "os": "linux"
  }
}

This image's manifest list shows there are multiple architectures available! This means if a container runtime pulls down this tag, it'll use the manifest in the list that matches its architecture. For example, you'd expect containerd running on an ARM Linux host to pull down the image whose digest starts with a650cc. It's also worth mentioning that if you ran manifest against a tag that does not have multiple architectures, you'd get a list of each of the image's layers along with each layer's digest value.
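
The selection a runtime performs can be sketched with jq. This is only an illustration of the idea, not how containerd implements it: the two entries are copied from the output above into a local variable, and os and arch stand for a hypothetical host's platform.

```shell
# Two entries copied from the manifest list above, held locally so the
# selection logic runs without a registry.
index=$(cat <<'EOF'
{
  "manifests": [
    {
      "digest": "sha256:e31b9dc1170027a5108e880ab1cdc32626fc7c9caf7676fd3af1ec31aad9d57e",
      "platform": { "architecture": "amd64", "os": "linux" }
    },
    {
      "digest": "sha256:a650cc38f804847dfa3a1043fa5c55d479be4e9c87be2ba1c3d58b803eec33e9",
      "platform": { "architecture": "arm64", "os": "linux" }
    }
  ]
}
EOF
)

# Platform of the (hypothetical) host doing the pull.
os="linux"
arch="arm64"

# Keep only the entry whose platform matches, then print its digest.
digest=$(printf '%s' "$index" | jq -r --arg os "$os" --arg arch "$arch" \
  '.manifests[] | select(.platform.os == $os and .platform.architecture == $arch) | .digest')
echo "$digest"
```

With a registry available, the resulting digest can be passed back to crane as k8s.gcr.io/kube-apiserver@${digest} to fetch the manifest for that one platform.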

Introspection

At some point, you'll want to look into the actual contents of the asset. To start, you can look at the content of an image, namely kube-apiserver. The export command writes the image's filesystem out as a tarball, which you can extract locally. Using the -v flag will give insight into how the image and its layers are being resolved.

$ crane export -v k8s.gcr.io/kube-apiserver:v1.24.2 - | tar xv

There's too much output to paste here, but the logs from the above command contain some key pieces of information. For example, note that export is being run against the tag v1.24.2, which doesn't point to an image, but instead to a manifest list pointing to platform- and architecture-specific images.

2022/06/21 14:09:27 <-- 200 https://k8s.gcr.io/v2/kube-apiserver/manifests/v1.24.2 (54.20325ms)
2022/06/21 14:09:27 HTTP/2.0 200 OK
Content-Length: 1694
Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
Content-Type: application/vnd.docker.distribution.manifest.list.v2+json
Date: Tue, 21 Jun 2022 20:09:27 GMT
Docker-Content-Digest: sha256:433696d8a90870c405fc2d42020aff0966fb3f1c59bdd1f5077f41335b327c9a
Docker-Distribution-Api-Version: registry/2.0
Server: Docker Registry
X-Frame-Options: SAMEORIGIN
X-Xss-Protection: 0

{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
   "manifests": [
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 949,
         "digest": "sha256:e31b9dc1170027a5108e880ab1cdc32626fc7c9caf7676fd3af1ec31aad9d57e",
         "platform": {
            "architecture": "amd64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 949,
         "digest": "sha256:a650cc38f804847dfa3a1043fa5c55d479be4e9c87be2ba1c3d58b803eec33e9",
         "platform": {
            "architecture": "arm64",
            "os": "linux"
         }
      }

      <-- other images omitted -->
   ]
}

Since my system is amd64, the image e31b9d is downloaded. If you follow the logs, you'll see that the image is resolved, then its manifest, which references its layers, is located, and finally the layers are downloaded.

All that aside, you'll end up with the container contents on your local file system.

$ ls

bin             etc             lib             run             tmp
boot            go-runner       proc            sbin            usr
dev             home            root            sys             var

From here you can easily inspect or modify its contents. For example, using Go tooling, it's possible to determine exactly how the kube-apiserver binary was built.

$ go version -m usr/local/bin/kube-apiserver

usr/local/bin/kube-apiserver: go1.18.3
        path    k8s.io/kubernetes/cmd/kube-apiserver
        build   -asmflags=all=-trimpath=/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes
        build   -compiler=gc
        build   -gcflags="all=-trimpath=/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes "
        build   -ldflags="<-- omitted -->"
        build   -tags=selinux,notest
        build   CGO_ENABLED=0
        build   GOARCH=amd64
        build   GOOS=linux
        build   GOAMD64=v1

While great for introspecting container images, this is particularly useful when you want to look into configuration stored in an OCI artifact. In the world of Carvel, bundles of configuration can be stored in a container registry. For example, the official kpack package is available at projects.registry.vmware.com/tce/kpack. Running ls reveals that version 0.5.2 is available, and with export you can look inside it.

$ crane export projects.registry.vmware.com/tce/kpack:0.5.2 - | tar xv

x .
x .imgpkg
x .imgpkg/images.yml
x config
x config/ca_cert.yaml
x config/kapp-config.yaml
x config/kp-config.yaml
x config/overlay.yaml
x config/proxy.yaml
x config/release
x config/release/release-0.5.2-rc.9.yaml
x config/schema.yaml
x config/version.yml

Inside config/ there are multiple Kubernetes YAML files which are part of this package bundle.

Copying

A final use case to cover is replicating artifacts and images between registries. One reason to do this is to make an image available in an internet-restricted environment. For example, you may need an image like k8s.gcr.io/kube-apiserver to be available in a private registry that runs in the same network as your clusters.

Copying an artifact or image is done using the copy command, shown here in its short form, cp.

$ crane cp k8s.gcr.io/kube-apiserver:v1.24.2 index.docker.io/joshrosso/kube-apiserver:v1.24.2

2022/06/21 15:54:37 Copying from k8s.gcr.io/kube-apiserver:v1.24.2 to index.docker.io/joshrosso/kube-apiserver:v1.24.2
2022/06/21 15:54:39 pushed blob: sha256:d3377ffb7177cc4becce8a534d8547aca9530cb30fac9ebe479b31102f1ba503
2022/06/21 15:54:40 pushed blob: sha256:63186d32234e6ca9751e21f84bda2a6f5025eb3a44196f6dc4d0e9268ba7bbe0
2022/06/21 15:54:40 pushed blob: sha256:36698cfa5275e0bda70b0f864b7b174e0758ca122d8c6a54fb329d70082c73f8
2022/06/21 15:54:41 pushed blob: sha256:b71d10928c08172a60416656c3b43c55ccbe83255f704e9cb4108351994aaaed
<-- multiple logs removed -->
2022/06/21 15:55:12 index.docker.io/joshrosso/kube-apiserver:v1.24.2: digest: sha256:433696d8a90870c405fc2d42020aff0966fb3f1c59bdd1f5077f41335b327c9a size: 1694

During this copy operation, all platform/architecture combinations are copied and each image digest is retained. You can verify this by running a diff against the two manifests; no output means they are identical.

$ diff <(crane manifest index.docker.io/joshrosso/kube-apiserver:v1.24.2) \
    <(crane manifest k8s.gcr.io/kube-apiserver:v1.24.2)
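
In an internet-restricted setup you will usually need to replicate more than one image. Below is a minimal sketch of a dry run that prints the crane cp command for each source image. The MIRROR registry name is hypothetical, and the image list is just two kube-apiserver tags from earlier in this post.

```shell
# Hypothetical private registry and repository prefix; replace with your own.
MIRROR="registry.example.internal/mirror"

# Source images to replicate.
images='k8s.gcr.io/kube-apiserver:v1.24.2
k8s.gcr.io/kube-apiserver:v1.24.1'

# Build one `crane cp` command per image without running it (dry run).
plan=$(printf '%s\n' "$images" | while read -r src; do
  # ${src#*/} strips everything up to the first "/", i.e. the registry host.
  echo crane cp "$src" "${MIRROR}/${src#*/}"
done)
printf '%s\n' "$plan"
```

Printing the commands first makes it easy to review the destination names before anything is pushed; the printed lines can then be run as-is once they look right.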

Shoutouts

This post is largely a callout to the awesomeness of google/go-containerregistry and its command line tool crane. Thanks to all the maintainers and contributors who have built these tools and worked on documentation.