Episode 2: making Nuleculization of Applications faster

In this episode I will try to shed some light on what a good container image is and what we can learn (or even generate) from it.

First of all, it should fulfill my needs; it should not be something I build for a single purpose. To fulfill my needs, I need to be able to configure the container that is started from the image. I don’t want to build a special-purpose container image that only fits the one application I am working on. The ‘good container image’ should be an atomic component that can be reused by a Nulecule.

Let’s have a look at PostgreSQL: there is a wonderful OpenShift PostgreSQL container image based on CentOS 7, which installs PostgreSQL itself from Software Collections. In addition to the obvious, the OpenShift team has added a few LABELs that OpenShift itself uses: “io.openshift.expose-services” could be used to create an OpenShift service straight from the container image, and “io.k8s.display-name” contains the string that shows up in the OpenShift web console when you look at pods based on this PostgreSQL image. I think you see the pattern: let’s use LABELs to deliver added value through the toolchain. The OpenShift team has also put lots of valuable documentation in and around the Dockerfile.
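To see which LABELs an image carries without opening its Dockerfile, you can inspect the image itself. The Python sketch below assumes the JSON printed by `docker inspect <image>` (an array with one object per image, labels under `Config.Labels`) and filters the labels by namespace prefix. The sample label values are illustrative, not copied from the OpenShift image, and the helper names are hypothetical.

```python
import json
import subprocess


def inspect_image(image: str) -> str:
    """Run `docker inspect` and return its raw JSON output (needs a local docker)."""
    result = subprocess.run(
        ["docker", "inspect", image], capture_output=True, text=True, check=True
    )
    return result.stdout


def image_labels(inspect_json: str) -> dict:
    """Extract the LABELs from `docker inspect` output for a single image."""
    data = json.loads(inspect_json)
    # docker inspect prints a JSON array; "Labels" may be null when no LABEL is set
    return dict(data[0].get("Config", {}).get("Labels") or {})


def labels_with_prefix(labels: dict, prefix: str) -> dict:
    """Keep only the labels in one namespace, e.g. "io.k8s." or "io.openshift."."""
    return {key: value for key, value in labels.items() if key.startswith(prefix)}


# Illustrative inspect output, trimmed to the fields used above
SAMPLE = json.dumps([{"Config": {"Labels": {
    "io.k8s.display-name": "PostgreSQL 9.4",
    "io.openshift.expose-services": "5432:postgresql",
}}}])

labels = image_labels(SAMPLE)
print(labels_with_prefix(labels, "io.openshift."))
```

Against a real image you would replace `SAMPLE` with `inspect_image("<image name>")`.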

Can we introduce LABELs that will help the Nulecule toolchain? I think so.

I have set up a proof of concept that generates a Nulecule file from such a ‘good container image’. The POC uses labels under “io.projectatomic.nulecule” to describe what needs to be present in the Nulecule file. Let’s have a look at the ‘(even more) good container image’: I put it in my repository on GitHub, which is a fork of the OpenShift PostgreSQL image.

The first thing you will notice is that I translated the human-readable documentation the OpenShift team included as comments in the Dockerfile into labels. Around https://github.com/goern/postgresql/blob/feature/enhanced-labels/9.4/Dockerfile.rhel7#L22 you see a list of the required and optional environment variables this container image uses. These two labels can be translated directly into Nulecule parameters (OK, we lack constraints, but hey…).
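The translation from the two environment-variable labels to Nulecule parameters can be sketched in a few lines of Python. This is not Grasshopper's code: the label names `REQUIRED_ENV` and `OPTIONAL_ENV` are hypothetical stand-ins for the ones in the enhanced Dockerfile, their values are assumed to be comma-separated variable names, and we assume a param without a `default` must be answered at deploy time while an empty `default` marks it optional. As noted above, no constraints are generated.

```python
# Hypothetical label names: check the enhanced Dockerfile for the real ones
REQUIRED_ENV = "io.projectatomic.nulecule.environment.required"
OPTIONAL_ENV = "io.projectatomic.nulecule.environment.optional"


def split_names(value: str) -> list:
    """Split a comma-separated label value into trimmed, non-empty names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def env_labels_to_params(labels: dict) -> list:
    """Turn required/optional environment variable labels into Nulecule params."""
    params = []
    for name in split_names(labels.get(REQUIRED_ENV, "")):
        # No default: assumed to mean the deployer must supply a value
        params.append({"name": name, "description": f"Required: {name}"})
    for name in split_names(labels.get(OPTIONAL_ENV, "")):
        # Assumption: an empty default marks the variable as optional
        params.append({"name": name, "description": f"Optional: {name}", "default": ""})
    return params


labels = {
    REQUIRED_ENV: "POSTGRESQL_USER, POSTGRESQL_PASSWORD, POSTGRESQL_DATABASE",
    OPTIONAL_ENV: "POSTGRESQL_ADMIN_PASSWORD",
}
for param in env_labels_to_params(labels):
    print(param)
```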

The second improvement to the good container image is at line 65: VOLUME is used by docker build, and the label “io.projectatomic.nulecule.volume” could be used to generate parts of Nulecule’s storage requirements.
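A minimal Python sketch of that step, assuming the mount path is taken from the VOLUME instruction (the “io.projectatomic.nulecule.volume” label could supply the same path, but we do not assume its value format here). The requirement shape (`persistentVolume` with `name`, `accessMode` and `size` in GB) follows our reading of the Nulecule specification's example; the size and access mode are not in the Dockerfile at all, so they are placeholders, and the function names are hypothetical.

```python
import json


def volume_paths(instruction_args: str) -> list:
    """Parse the arguments of a Dockerfile VOLUME instruction (JSON or plain form)."""
    args = instruction_args.strip()
    if args.startswith("["):
        return json.loads(args)  # VOLUME ["/var/lib/pgsql/data"]
    return args.split()  # VOLUME /var/lib/pgsql/data /other/path


def volume_to_requirement(path: str, size_gb: int = 1, access_mode: str = "ReadWrite") -> dict:
    """Build a Nulecule-style persistentVolume requirement for one mount path."""
    # Derive a name from the path, e.g. /var/lib/pgsql/data -> var-lib-pgsql-data
    name = path.strip("/").replace("/", "-")
    return {"persistentVolume": {"name": name, "accessMode": access_mode, "size": size_gb}}


requirements = [volume_to_requirement(p) for p in volume_paths('["/var/lib/pgsql/data"]')]
print(json.dumps({"requirements": requirements}, indent=2))
```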

By the way, we can also generate Nulecule metadata from the “io.k8s” and “io.openshift” labels (near the top of the Dockerfile).
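As a sketch of that mapping, the Python below copies two “io.k8s” labels into a metadata mapping. The table `LABEL_TO_METADATA` is our assumption, not Grasshopper's: we map `io.k8s.display-name` to `name` and `io.k8s.description` to `description`; “io.openshift” labels could be added to the same table once you decide which metadata field each one belongs in. Labels without an entry are simply ignored.

```python
# Assumed mapping from image labels to Nulecule metadata fields
LABEL_TO_METADATA = {
    "io.k8s.display-name": "name",
    "io.k8s.description": "description",
}


def labels_to_metadata(labels: dict) -> dict:
    """Copy the mapped labels into a Nulecule-style metadata mapping."""
    return {
        field: labels[label]
        for label, field in LABEL_TO_METADATA.items()
        if label in labels
    }


labels = {
    "io.k8s.display-name": "PostgreSQL 9.4",
    "io.k8s.description": "PostgreSQL is an advanced Object-Relational database management system",
    "io.openshift.tags": "database,postgresql",  # not mapped, so it is skipped
}
print(labels_to_metadata(labels))
```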

The proof of concept for these ideas is Grasshopper (it is also available for Fedora 22 and 23). You can use it to guess a Nulecule file from a good container image:

    sudo dnf copr enable goern/grasshopper
    sudo dnf install grasshopper
    curl -gO https://raw.githubusercontent.com/goern/postgresql/feature/enhanced-labels/9.4/Dockerfile.rhel7
    grasshopper-0.0.47 nulecule guess Dockerfile.rhel7

What you will see is a Nulecule file (and yes, the parameters are missing in version 0.0.47; it’s a POC, remember) generated completely from the Dockerfile. QED: we can relieve the developer of writing all that Nulecule boilerplate. What we cannot do is relieve him of structuring his application. Back to our example, that means he must still write the WordPress Nulecule file and all its artifacts.
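Because the guess above starts from a Dockerfile rather than a built image, its first step is reading the LABEL instructions. The Python below is not Grasshopper's implementation; it is a minimal reader, under the assumption that LABELs use the `key="value"` form (or the older `LABEL key value` form). It joins backslash continuations, drops comment and blank lines, and does not expand ARG or ENV variables. The sample Dockerfile is an illustrative excerpt, not the real `Dockerfile.rhel7`.

```python
import shlex


def logical_lines(text: str):
    """Yield Dockerfile instructions with backslash continuations joined."""
    buf = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comment and blank lines are dropped, even inside continuations
        if line.endswith("\\"):
            buf += line[:-1] + " "
            continue
        yield buf + line
        buf = ""
    if buf:
        yield buf


def dockerfile_labels(text: str) -> dict:
    """Collect all LABEL key/value pairs from Dockerfile text."""
    labels = {}
    for line in logical_lines(text):
        parts = line.split(None, 1)
        if len(parts) < 2 or parts[0].upper() != "LABEL":
            continue
        tokens = shlex.split(parts[1])  # removes the quotes around values
        if tokens and "=" not in tokens[0]:
            labels[tokens[0]] = " ".join(tokens[1:])  # old form: LABEL key value
            continue
        for token in tokens:
            key, _, value = token.partition("=")
            labels[key] = value
    return labels


DOCKERFILE = r"""FROM centos:centos7
# Illustrative excerpt, not the real Dockerfile.rhel7
LABEL io.k8s.description="PostgreSQL is an advanced Object-Relational database management system" \
      io.k8s.display-name="PostgreSQL 9.4" \
      io.openshift.expose-services="5432:postgresql"
VOLUME ["/var/lib/pgsql/data"]
"""

for key, value in dockerfile_labels(DOCKERFILE).items():
    print(f"{key} = {value}")
```

The resulting dictionary is what the later steps (parameters, storage requirements, metadata) would consume.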

Open question: can we even generate the artifacts for such a base container? I will explore this in the coming weeks. What I do know is that we can generate OpenShift/Kubernetes PersistentVolumeClaims from Nulecule’s storage requirements. That feature is targeted for the grasshopper 0.1.0 (aka xmas) release.
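To illustrate why that generation is straightforward, here is a Python sketch that turns one Nulecule `persistentVolume` requirement into a Kubernetes `PersistentVolumeClaim` (`apiVersion: v1`). Two mappings are our assumptions, not something the post or Grasshopper defines: Nulecule’s `ReadWrite`/`ReadOnly` become `ReadWriteOnce`/`ReadOnlyMany`, and the size in GB is requested as Gi.

```python
import json

# Assumed mapping from Nulecule accessMode values to Kubernetes access modes
ACCESS_MODES = {"ReadWrite": "ReadWriteOnce", "ReadOnly": "ReadOnlyMany"}


def requirement_to_pvc(requirement: dict) -> dict:
    """Turn a Nulecule persistentVolume requirement into a PersistentVolumeClaim."""
    volume = requirement["persistentVolume"]
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": volume["name"]},
        "spec": {
            "accessModes": [ACCESS_MODES[volume.get("accessMode", "ReadWrite")]],
            # Assumption: the Nulecule size is in GB and is requested as Gi
            "resources": {"requests": {"storage": f"{volume.get('size', 1)}Gi"}},
        },
    }


requirement = {"persistentVolume": {"name": "var-lib-pgsql-data", "accessMode": "ReadWrite", "size": 4}}
print(json.dumps(requirement_to_pvc(requirement), indent=2))
```

Since `oc create -f` and `kubectl create -f` accept JSON as well as YAML, the printed claim could be piped straight into `oc create -f -`.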

How do we learn about good container images? Where do we find them? I shamelessly reused an [idea of Vasek: nulecule-library index searching][10]. You can use grasshopper to find a set of Nuleculized applications that are based on (more or less) good container images:

    grasshopper nulecule index list


Stay tuned!


[10]: https://github.com/projectatomic/atomicapp/pull/379