What’s Up With OCI References?

Dan Lorenc
Jul 16, 2021

Note: This is current as of July 16th 2021


Let’s say you build and push a container image to a registry, and then you generate an SBOM, a signature, or anything else relevant to that image. You can easily store those other objects in a registry as well, but there’s no good way to indicate that those objects “refer” to the original image. If you have the image URL, you can’t automatically retrieve or look up all the things that “refer” to it.

There are some workarounds for this that you can use today, but there are also a few different efforts to add “reference” support directly to OCI (Open Containers Initiative) registries. Many important use-cases, particularly related to supply-chain security, are waiting for this feature, but the progress is hard to follow from the outside. This post is my attempt to summarize the current state of discussion and what’s left before we can all start using these awesome new features!

At a high level, there are still two outstanding proposals on how to add support for this feature (one of which is my own). One of the goals for this post is to highlight how close they are, and how few differences I think there actually are between the two. The main differences remaining are some decisions around major/minor versioning and garbage collection semantics.

For the rest of this post:

Proposal 1: https://github.com/opencontainers/image-spec/pull/828

Proposal 2: https://github.com/opencontainers/artifacts/pull/29

If you haven’t read the proposals recently, please take another look first. They’ve both changed significantly recently. This post represents my opinion and my interpretation of the content of each.

Major Minor Versioning

This is a bit hard to talk about because there are a few different specifications and projects involved, each with its own version numbers. Currently, the Distribution Specification is Version 1.0.0. It references an Image Specification that is also Version 1.0.0, but that contains Manifest types which have schemaVersion: 2.
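To make the version layering concrete, here is a minimal sketch of what an image manifest looks like today, written as a Python dict. The media types and the schemaVersion value come from the current Image Specification; the digests and sizes are placeholders I made up, and the `referenced_digests` helper is purely illustrative.

```python
import json

# A minimal OCI image manifest as a Python dict.
# Digests and sizes below are placeholders, not real content.
manifest = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "config": {
        "mediaType": "application/vnd.oci.image.config.v1+json",
        "digest": "sha256:" + "a" * 64,
        "size": 1024,
    },
    "layers": [
        {
            "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
            "digest": "sha256:" + "b" * 64,
            "size": 4096,
        }
    ],
}


def referenced_digests(m):
    """Return the digests a manifest points at: its config, then its layers."""
    return [m["config"]["digest"]] + [layer["digest"] for layer in m["layers"]]


print(json.dumps(manifest, indent=2))
```

Note that the manifest's schemaVersion is a separate number from the version of the specification document that defines it, which is part of why the versioning discussion gets confusing.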

At a high level, Proposal 1 defines extensions on the existing Image types. Proposal 2 introduces a new Artifact type, with a concept of Reference and Non-Reference sub-types. I’m not clear on whether the Artifact type from Proposal 2 is intended to become “the spiritual successor” to the Image type, or if the two are expected to live together in the long term. This is mostly because I’m unsure how the manifest schemaVersion bump from 2 to 3 (shown below) will be handled.

Here is how that would change in the proposals.

I don’t think either proposal is very clear yet on what version bumps would be appropriate for the Distribution Specification. Proposal 2 places slightly more burden on the registry server through the introduction of a new Manifest Schema, so this may be a larger change to registry implementations.

The introduction of garbage collection into the distribution specification for either option may be considered a large enough change to require a major version bump of the spec (v2.0.0), or it might be possible with just a minor version bump (v1.1.0).

Garbage Collection Semantics


Garbage collection is also a difficult concept to talk about in the existing proposals because they are framed as changes to the existing specifications (correctly), and Garbage Collection is not currently specified. Many registries do implement some form of garbage collection — the specifications allow them to. This should be taken into account when designing specification changes, but there’s disagreement as to where it should happen in the proposals. This is further complicated by the lack of widespread understanding about the exact garbage collection semantics of each registry.

I think it’s fair to say that there’s agreement that garbage collection semantics need to be considered in these designs, and that no one wants to make garbage collection impossible. The three existing (including the one withdrawn) proposals have all run into Garbage Collection issues. This section tries to explain the issues we’ve identified and how the different proposals attack this:

Existing Semantics

Many registries do not implement garbage collection. Many registries do not allow deletion through the API, or offer custom TTL/deletion policies. The exact garbage collection semantics are configurable by end-users of some registries. Some registries garbage collect blobs and not manifests, some registries garbage collect everything. Some garbage collect nothing. I’m not aware of an exhaustive list of the existing policies, but at a high-level, the registries that do implement garbage collection mostly work as follows.

The main principle is that objects referenced by tags, or by things that are referenced by tags, are preserved. Tags reference manifests, which reference configs and blobs. Many tags can reference one manifest. Manifests can reference many configs and blobs, and many manifests can reference one config or blob.
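As a rough sketch of that principle (not any particular registry's implementation), the preserved set is everything reachable from a tag. The registry state, the digests, and the function names below are all hypothetical.

```python
# Hypothetical in-memory registry state; digests are shortened for readability.
tags = {"foobar:v1": "sha256:m1", "foobar:latest": "sha256:m1"}
manifests = {
    "sha256:m1": ["sha256:cfg1", "sha256:layerA"],
    "sha256:m2": ["sha256:cfg2", "sha256:layerA"],  # no tag points here
}


def preserved(tags, manifests):
    """Everything a tag points at, plus everything those manifests point at."""
    keep = set()
    for manifest_digest in tags.values():
        keep.add(manifest_digest)
        keep.update(manifests.get(manifest_digest, []))
    return keep


def collectable(tags, manifests):
    """Every stored object that is not reachable from any tag."""
    everything = set(manifests)
    for blobs in manifests.values():
        everything.update(blobs)
    return everything - preserved(tags, manifests)


print(sorted(collectable(tags, manifests)))
```

Here the untagged manifest and its config are collectable, while the shared layer survives because the tagged manifest still references it.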

This example shows how the references flow down from Tags, to Manifests, and finally down to Configs/Blobs. These can all be stored in the same underlying system or different ones. Registry implementations typically treat Manifests differently from Configs/Blobs because they are smaller and must be parsed.

In this example, Manifest sha256:foobar123 can be deleted, because it has nothing referencing it. The references from that manifest to Config/Blobs sha256:foobar456 and sha256:foobar567 can be deleted, leaving them each with only one remaining reference, from sha256:foobar234:
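The same example can be sketched with simple reference counts. This assumes a toy in-memory model where the foobar:v2 tag points at sha256:foobar234 and both manifests reference both Config/Blobs; the function names are mine, not part of any registry API.

```python
from collections import Counter

# Toy model of the example: one tagged manifest, one untagged manifest,
# and two Config/Blobs shared by both manifests.
tags = {"foobar:v2": "sha256:foobar234"}
manifests = {
    "sha256:foobar123": ["sha256:foobar456", "sha256:foobar567"],
    "sha256:foobar234": ["sha256:foobar456", "sha256:foobar567"],
}


def reference_counts(tags, manifests):
    """Count incoming references from tags and from manifests."""
    counts = Counter({digest: 0 for digest in manifests})
    counts.update(tags.values())
    for blobs in manifests.values():
        counts.update(blobs)
    return counts


def delete_manifest(digest, tags, manifests):
    """Delete a manifest only if nothing references it; return the new counts."""
    if reference_counts(tags, manifests)[digest] != 0:
        raise ValueError(f"{digest} is still referenced")
    del manifests[digest]
    return reference_counts(tags, manifests)


before = reference_counts(tags, manifests)
after = delete_manifest("sha256:foobar123", tags, manifests)
print(before["sha256:foobar456"], after["sha256:foobar456"])
```

Deleting the untagged manifest drops each Config/Blob from two references to one, so nothing else becomes collectable.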

Issues

I think the main issue with Garbage Collection policies and References was articulated best in this sub-thread on the PR, by Justin Cormack. If references can point to anything, “cycles” can be created. See this diagram for an example:

From the earlier CAS diagrams, we can see what it looks like if the two tags (v1 and v2) are deleted, but the manifests have direct references to each other:

One basic interpretation of garbage collection rules is as follows, and leads to the issue:

  1. Delete the tag foobar:v2
  2. The registry decrements a reference count on the manifest that tag pointed to: sha256:foobar234
  3. That manifest’s count from tags is now 0, so it can be deleted.
  4. BUT, that manifest is “referenced” by the other manifest, sha256:foobar123.
  5. What happens now?
  6. This turns the problem from simple reference counting into mark and sweep.
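To illustrate why cycles force that shift, here is a small sketch comparing the two strategies on the state described above: both tags are gone, but the two manifests still reference each other. The data model and function names are hypothetical, and real registries will differ.

```python
# Hypothetical state after both tags are deleted: no roots remain,
# but the two manifests still point at each other.
tags = {}
manifest_refs = {
    "sha256:foobar123": ["sha256:foobar234"],
    "sha256:foobar234": ["sha256:foobar123"],
}


def refcount_collect(tags, refs):
    """Repeatedly delete objects whose incoming reference count is zero."""
    live = {digest: list(targets) for digest, targets in refs.items()}
    deleted = set()
    changed = True
    while changed:
        changed = False
        counts = {digest: 0 for digest in live}
        for target in tags.values():
            if target in counts:
                counts[target] += 1
        for targets in live.values():
            for target in targets:
                if target in counts:
                    counts[target] += 1
        for digest, count in counts.items():
            if count == 0:
                del live[digest]
                deleted.add(digest)
                changed = True
    return deleted


def mark_and_sweep(tags, refs):
    """Mark everything reachable from a tag, then sweep the rest."""
    marked = set()
    stack = list(tags.values())
    while stack:
        digest = stack.pop()
        if digest in marked:
            continue
        marked.add(digest)
        stack.extend(refs.get(digest, []))
    return set(refs) - marked


print(refcount_collect(tags, manifest_refs))
print(sorted(mark_and_sweep(tags, manifest_refs)))
```

Reference counting never frees either manifest, because each keeps the other's count at one. Mark and sweep frees both, but it has to walk the whole graph from the tags to do so.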

I don’t think there’s a clear, single, accepted answer as to what the behavior here should be across every registry, especially for registries that implement garbage collection policies. It’s key that the specification allows registries to efficiently implement the behavior their end users want, in a way that fits into their existing policies.

Proposals

The proposals here really don’t differ very much, and don’t actually need to differ at all. Proposal 1 leaves Garbage Collection completely out of the specification, as it is today, with the idea that registries can implement their own logic and semantics on top of the types to get their desired behavior. Proposal 1 does not yet contain any validation that this is the case: more work would be needed to understand the existing semantics and to show that registries can efficiently implement the behavior end users want.

Proposal 2 does a few things differently here, with the intention of preventing “cycles”:

  • Defines Reference Types as a new, unique concept.
  • Only allows Reference Types to refer to Manifest types (not blobs or descriptors)
  • Reference Types have a list of blobs (similar to layers), but these blobs cannot refer to other Manifest types
  • Defines some basic semantics around garbage collection (Reference Types SHOULD NOT be tagged, Reference Types SHOULD be cleaned up when the objects they refer to are deleted)

In this example, we can see that cycles are avoided. Reference Types can only refer to Non Reference Types. When the :v2 tag is removed, we can delete the sha256:foobar234 manifest, and the sha256:foobar123 Reference Type can be deleted as well.

This uses a new Type to effectively segregate the “Manifest Namespace” layer in the CAS diagram into two partitions, and only allows arrows to go in one direction:
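As a rough sketch of how a registry might enforce these rules, based on my reading of the description above rather than on the proposal's actual schema: the `Manifest` and `Registry` classes and their field names are hypothetical, and requiring the target to already exist is a simplification I added for the sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Manifest:
    digest: str
    is_reference: bool = False
    refers_to: Optional[str] = None  # only set on Reference Types


class Registry:
    def __init__(self) -> None:
        self.manifests: Dict[str, Manifest] = {}
        self.tags: Dict[str, str] = {}

    def put(self, m: Manifest) -> None:
        """Store a manifest, rejecting references that could form a cycle."""
        if m.is_reference:
            # Simplification for this sketch: the target must already exist.
            target = self.manifests.get(m.refers_to)
            if target is None:
                raise ValueError("reference target must be an existing manifest")
            if target.is_reference:
                raise ValueError("Reference Types may only refer to non-reference manifests")
        elif m.refers_to is not None:
            raise ValueError("only Reference Types may set refers_to")
        self.manifests[m.digest] = m

    def untag(self, tag: str) -> List[str]:
        """Remove a tag; if its manifest is now untagged, delete it and its references."""
        digest = self.tags.pop(tag)
        if digest in self.tags.values():
            return []
        deleted = [digest]
        deleted += [d for d, m in self.manifests.items() if m.refers_to == digest]
        for d in deleted:
            del self.manifests[d]
        return deleted


registry = Registry()
registry.put(Manifest("sha256:foobar234"))
registry.tags["foobar:v2"] = "sha256:foobar234"
registry.put(Manifest("sha256:foobar123", is_reference=True, refers_to="sha256:foobar234"))
deleted = registry.untag("foobar:v2")
print(deleted)
```

Because a Reference Type can never be the target of another reference, cleanup is a single pass over the manifests and never needs a graph walk.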

Summary

Everything above here was intended to convey my interpretation of facts and designs, not opinions. If there’s anything incorrect above, please comment or suggest fixes directly so we can all be working on the same state!

Now it’s my opinion time! Feel free to stop reading below here :)

To start simple, there’s generally a desire to get some form of Reference Type support soon. The two proposals don’t differ in the use cases they aim to support. I can state that definitively, because I wrote one, so I can just say that I want to support the same use cases! The main points of disagreement are:

  • Whether this should, or even can, be done with only a minor version bump.
  • Which versions and specifications exactly would be changing.
  • What level of garbage collection semantics should be, or needs to be, in the specification itself (we all agree garbage collection should remain possible and efficient for registries that want to implement it).

I also think there’s agreement that major version bumps should be done only if absolutely necessary, to avoid fragmenting the community and ecosystem.

Now What?

I don’t know! I think we’ll continue to iterate on the garbage collection semantics and then figure out whether or not this feature requires a new schema version or manifest type, but exactly where and when this will get figured out is still unclear.

Thanks to Jason Hall, Jon Johnson Jr., Nisha K. and Michael Brown for feedback here! This was originally shared as a Google Doc here.
