Smaller and Safer Clojure Containers: Minimizing the Software Bill of Materials
From Debian to Google's Distroless Images
We are exposed to supply chain security vulnerabilities whenever we use containers (or almost any software). This is a problem because our goal is to offer a dependable and secure service to our users, and these vulnerabilities heighten our risk of being hacked. We don't have to accept this as it is; we can act and mitigate these risks proactively. Learn how to reduce your Software Bill of Materials and help your site reliability engineers sleep better.
A software bill of materials (SBOM) is a list of all the open source and third-party components present in a codebase. An SBOM also lists the licenses that govern those components, the versions of the components used in the codebase, and their patch status, which allows security teams to quickly identify any associated security or license risks.
When we use public container registries like Docker Hub, a bad actor may have sneaked malware into an image. If that piece of software becomes part of our SBOM, it can compromise the application. To minimize the risk, we should include only the essential dependencies in our containers.
Let's take the Docker configuration used in the previous post, Deploying Clojure Like a Seasoned Hobbyist, as our vehicle to explore this topic. We'll look at how much extra is included in the container and which known vulnerabilities a security scan finds, then try to reduce the image size and the SBOM, and finally compare the artifact sizes between the different approaches.
Examples are available in the repository.
Inspecting The Image
I'll tag the build from the previous post as clj-tools-deps-buster to have a shorter, descriptive name for this context.
❯ docker tag \
registry.digitalocean.com/clojure-sample-app/dev \
clj-tools-deps-buster
First, we want access to the manifest to find the container configuration, so let's start by creating an archive from all of the files in the container, including the manifest.
❯ docker save clj-tools-deps-buster > clj-tools-deps-buster.tar
Then, let's unpack the archive to get access to the files.
❯ tar -xvf clj-tools-deps-buster.tar
Now, we can find the name of the configuration file in manifest.json.
❯ cat manifest.json| jq ".[0].Config"
"3085b45633b1c0f87a3ea67000d7d77e2f14a74d1b2968966f866c7c3531f745.json"
The configuration file is too large to add here, so let's fire up a Babashka REPL and investigate the configuration file a bit further.
If we look at this container's history, the last four steps come from the Dockerfile we used, and the rest already exist in the container.
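;; clojure.string is required explicitly because the later snippets use the str alias.
(require '[cheshire.core :as json]
         '[clojure.string :as str])

(def config
  (-> "3085b45633b1c0f87a3ea67000d7d77e2f14a74d1b2968966f866c7c3531f745.json"
      slurp
      (json/parse-string true)))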
(keys config)
;; => (:architecture :config :created :history :os :rootfs)
(count (get config :history))
;; => 23
(for [{:keys [created_by]} (take-last 4 (get config :history))]
created_by)
;; =>
;; ("WORKDIR /tmp/app"
;; "COPY . . # buildkit"
;; "RUN /bin/sh -c clojure -P # buildkit"
;; "CMD [\"/bin/sh\" \"-c\" \"clj -X:run\"]")
We can see that the commands are the same as configured in the previous post's Dockerfile. Other authors did all the previous steps, so we might not know what they did. Let's take the third entry from the history as an example.
(print (get-in config [:history 2 :created_by]))
;; /bin/sh -c set -eux; apt-get update; \
;; apt-get install -y --no-install-recommends ca-certificates curl netbase wget; \
;; rm -rf /var/lib/apt/lists/*
The original author of the base image had decided to install curl, netbase, and wget, which all ended up in our application image.
Let's take a look at what we have pre-installed on the container by first finding all apt-get install commands in the image history.
(def install-commands
  (for [{:keys [created_by]} (get config :history)
        :when (str/includes? created_by "apt-get install")]
    created_by))
(count install-commands)
;; => 5
It looks like five different steps use apt-get install. Let's see what these are all about.
(for [command install-commands]
  (into []
        (comp
         (filter #(str/includes? % "apt-get install"))
         (map str/trim))
        (-> command
            (str/replace #"\t" "")
            (str/split #";"))))
;; (["apt-get install -y --no-install-recommends ca-certificates curl netbase wget"]
;; ["apt-get install -y --no-install-recommends gnupg dirmngr"]
;; ["/bin/sh -c apt-get update && apt-get install -y --no-install-recommends git mercurial openssh-client subversion procps && rm -rf /var/lib/apt/lists/*"]
;; ["apt-get install -y --no-install-recommends bzip2 unzip xz-utils binutils fontconfig libfreetype6 ca-certificates p11-kit"]
;; ["/bin/sh -c apt-get update && apt-get install -y make rlwrap && rm -rf /var/lib/apt/lists/* && wget https://download.clojure.org/install/linux-install-$CLOJURE_VERSION.sh && sha256sum linux-install-$CLOJURE_VERSION.sh && echo \"7677bb1179ebb15ebf954a87bd1078f1c547673d946dadafd23ece8cd61f5a9f *linux-install-$CLOJURE_VERSION.sh\" | sha256sum -c - && chmod +x linux-install-$CLOJURE_VERSION.sh && ./linux-install-$CLOJURE_VERSION.sh && rm linux-install-$CLOJURE_VERSION.sh && clojure -e \"(clojure-version)\""])
The result is that we have a bunch of unnecessary executables for the final container. On top of these, to get the complete picture of what is installed, we would still need to go through the Clojure install scripts. But we're not going to go there this time around. Let's continue with the image itself.
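To get from command strings like the ones above to a plain list of package names, a small helper can drop the option flags and everything after the install segment. This is a minimal sketch, not part of the original setup: the name installed-packages is hypothetical, and it assumes the package names follow apt-get install directly and that the segment ends at the next ; or &&.

```clojure
(require '[clojure.string :as str])

(defn installed-packages
  "Returns the package names that follow `apt-get install` in a shell
  command string, skipping option flags such as -y.
  Hypothetical helper: assumes the install segment ends at the next ; or &."
  [command]
  (let [[_ args] (re-find #"apt-get install\s+([^;&]*)" command)]
    (->> (str/split (str/trim (or args "")) #"\s+")
         (remove str/blank?)
         (remove #(str/starts-with? % "-"))
         vec)))

(installed-packages
 "apt-get install -y --no-install-recommends gnupg dirmngr")
;; => ["gnupg" "dirmngr"]

(installed-packages
 "/bin/sh -c apt-get update && apt-get install -y make rlwrap && rm -rf /var/lib/apt/lists/*")
;; => ["make" "rlwrap"]
```

Raw history entries may contain tabs and line continuations, so they would likely need the same cleanup as in the snippet above before being passed to this helper.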
Before moving forward, let's see the size of the image and if the base image has any known vulnerabilities.
❯ docker image ls | grep clj-tools-deps-buster
clj-tools-deps-buster ... 725MB
The image size is 725MB. According to Docker Hub, the base image we used has 17 critical and 27 high-priority vulnerabilities. We'll get back to these numbers a bit later.
Reduce the Image Size
Next, let's see if we can reduce the image size and vulnerabilities by changing to smaller base images.
Using Slim Buster
Create a new file, slim.Dockerfile.
FROM clojure:openjdk-17-tools-deps-slim-buster
WORKDIR app
COPY . .
RUN clojure -P
CMD clj -X:run
Build the image.
❯ docker build \
--file=slim.Dockerfile . \
-t clj-tools-deps-slim --no-cache
Check the size of the container.
❯ docker image ls | grep clj-tools-deps-slim
clj-tools-deps-slim ... 553MB
We see that by using clojure:openjdk-17-tools-deps-slim-buster, we reduced the image size by 172MB and made minor improvements in vulnerability counts.
And let's do the same once more with the Alpine version.
Using Alpine Linux Image
Once again, create a new Dockerfile, alpine.Dockerfile.
FROM clojure:openjdk-17-tools-deps-alpine
WORKDIR app
COPY . .
RUN clojure -P
CMD clj -X:run
Build the image and check the size.
❯ docker image ls | grep clj-tools-deps-alpine
clj-tools-deps-alpine ... 356MB
This is good progress. We went from 725MB to 356MB, a reduction of 369MB, and the vulnerability counts improved as well.
We can still improve from here by going distroless. Yes, you read correctly, distroless.
Distroless Containers
Google provides language-specific base images that run application code without the rest of a Linux distribution: they contain only the application's runtime dependencies, with no package manager and not even a shell. With a language like Rust, you could go further and statically link the standard C library into your binary, so the image needs little besides the binary itself.
🥑 Language focused docker images, minus the operating system.
For Clojure code, we'll use the Java image gcr.io/distroless/java17-debian12. We must build a JAR file so that the container can run the application code without the Clojure CLI tools.
So, let's get to it.
Build JAR
First, let's create a JAR build of the application following the instructions in the Clojure tools.build guide.
Let's start by updating the application to dynamically read the env variables since they are unavailable during the build time.
(ns main
  (:require [ring.adapter.jetty :as jetty]
            [next.jdbc :as jdbc])
  (:gen-class))

(defn get-port []
  (Integer/parseInt (System/getenv "PORT")))

(defn get-db-conf []
  {:dbtype "postgres"
   :jdbcUrl (System/getenv "JDBC_DATABASE_URL")})

(defn datasource []
  (jdbc/get-datasource (get-db-conf)))

(defn app [_request]
  (let [db-version (jdbc/execute! (datasource) ["SELECT version()"])]
    {:status 200
     :headers {"Content-Type" "application/edn"}
     :body (str db-version)}))

(defn -main [& _args]
  (jetty/run-jetty #'app {:port (get-port)}))
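Reading the environment at runtime also means the application starts with whatever happens to be set, and get-port above throws if PORT is missing. A hedged sketch of a more forgiving variant follows; the fallback port 8080 and the names default-port and parse-port are assumptions for illustration, not part of the original application.

```clojure
(require '[clojure.string :as str])

(def default-port
  "Assumed fallback port; not taken from the original application."
  8080)

(defn parse-port
  "Parses a raw PORT value, returning `default` when it is nil or blank.
  A non-numeric value still throws NumberFormatException."
  [raw default]
  (if (str/blank? raw)
    default
    (Integer/parseInt (str/trim raw))))

(defn get-port []
  (parse-port (System/getenv "PORT") default-port))

(parse-port nil default-port)    ;; => 8080
(parse-port "3000" default-port) ;; => 3000
```

Keeping the parsing in a pure function separate from the System/getenv call makes it possible to check the behavior without touching the real environment.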
Add a new alias, build, to deps.edn.
{...
 :aliases
 {....
  :build {:deps {io.github.clojure/tools.build
                 {:git/tag "v0.9.6" :git/sha "8e78bcc"}}
          :ns-default build}}}
And then create a build.clj file.
(ns build
  (:require [clojure.tools.build.api :as b]))

(def class-dir "target/classes")
(def basis (b/create-basis {:project "deps.edn"}))
(def uber-file "target/standalone.jar")

(defn clean [_]
  (b/delete {:path "target"}))

(defn uber [_]
  (clean nil)
  (b/copy-dir {:src-dirs ["src" "resources"]
               :target-dir class-dir})
  (b/compile-clj {:basis basis
                  :ns-compile '[main]
                  :class-dir class-dir})
  (b/uber {:class-dir class-dir
           :uber-file uber-file
           :basis basis
           :main 'main}))
Now, we should be able to build the JAR file by running:
❯ clj -T:build uber
After this step, we should have the following files in the target folder.
❯ ls target
classes standalone.jar
Using Distroless Java
Lastly, let's create a distroless image to run the application JAR.
FROM clojure:openjdk-17-tools-deps-buster as base
WORKDIR app
COPY . .
RUN clj -T:build uber
FROM gcr.io/distroless/java17-debian12
COPY --from=base /tmp/app/target/standalone.jar /tmp/app/standalone.jar
CMD ["/tmp/app/standalone.jar"]
As Timo Kramer pointed out, we can also create the container with our build script using Google's Jib library. Take a look at how Datahike uses it.
Build your image and check the image size.
❯ docker image ls | grep jvm-distroless
jvm-distroless ... 234MB
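To put the four sizes side by side, the savings relative to the original image can be computed from the numbers reported by docker image ls above. This is only a small sketch of the arithmetic; the names image-sizes-mb and savings are made up for this example.

```clojure
(def image-sizes-mb
  "Image sizes in MB as reported by `docker image ls` in this post."
  [["clj-tools-deps-buster" 725]
   ["clj-tools-deps-slim" 553]
   ["clj-tools-deps-alpine" 356]
   ["jvm-distroless" 234]])

(defn savings
  "Returns, for each image, the MB saved and the percentage saved
  (rounded to one decimal) relative to the first image in `sizes`."
  [sizes]
  (let [[_ baseline] (first sizes)]
    (for [[image size] sizes
          :let [saved (- baseline size)]]
      {:image image
       :saved-mb saved
       :saved-pct (/ (Math/round (* 1000.0 (/ saved baseline))) 10.0)})))

(savings image-sizes-mb)
;; => ({:image "clj-tools-deps-buster", :saved-mb 0, :saved-pct 0.0}
;;     {:image "clj-tools-deps-slim", :saved-mb 172, :saved-pct 23.7}
;;     {:image "clj-tools-deps-alpine", :saved-mb 369, :saved-pct 50.9}
;;     {:image "jvm-distroless", :saved-mb 491, :saved-pct 67.7})
```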
Let's run a security scan for all the images and see how the numbers compare.
Comparing Image Sizes and Known Vulnerabilities
For this step, I'll be using Trivy since Docker Hub doesn't provide the numbers for the distroless images.
Trivy is a comprehensive and versatile security scanner. Trivy has scanners that look for security issues, and targets where it can find those issues.
Here's an example of a partial output. See the complete reports here.
❯ trivy image clj-tools-deps-buster
clj-tools-deps-buster (debian 10.12)
======
Total: 1016 (UNKNOWN: 9, LOW: 611, MEDIUM: 170, HIGH: 177, CRITICAL: 49)
....
Java (jar)
=====
Total: 48 (UNKNOWN: 0, LOW: 10, MEDIUM: 20, HIGH: 14, CRITICAL: 4)
Trivy scans the system and the JAR vulnerabilities in one go, so I'll add both separately to the graphs.
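To collect the numbers from several reports, the summary lines can be parsed into maps. The sketch below assumes the "Total: N (SEVERITY: N, ...)" line format shown in the output above; parse-summary is a hypothetical helper, not something Trivy provides.

```clojure
(require '[clojure.string :as str])

(defn parse-summary
  "Parses a Trivy summary line such as
  \"Total: 48 (UNKNOWN: 0, LOW: 10, ...)\" into a map of keyword to count."
  [line]
  (into {}
        (map (fn [[_ label n]]
               [(keyword (str/lower-case label)) (Long/parseLong n)]))
        (re-seq #"([A-Z][A-Za-z]*): (\d+)" line)))

(parse-summary
 "Total: 1016 (UNKNOWN: 9, LOW: 611, MEDIUM: 170, HIGH: 177, CRITICAL: 49)")
;; => {:total 1016, :unknown 9, :low 611, :medium 170, :high 177, :critical 49}

;; Sanity check: the per-severity counts add up to the reported total.
(let [{:keys [total] :as counts}
      (parse-summary
       "Total: 48 (UNKNOWN: 0, LOW: 10, MEDIUM: 20, HIGH: 14, CRITICAL: 4)")]
  (= total (reduce + (vals (dissoc counts :total)))))
;; => true
```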
We can clearly see that the number of system vulnerabilities goes down with the container size. The image size doesn't significantly affect the JAR vulnerabilities. Based on this, clj-tools-deps-buster probably has some extra development-time dependencies that are not required for the final image.
Conclusion
Distroless images might not suit your use case, and going distroless doesn't mean the container will be bulletproof. But it's still good to know that it's an option. There's much more to container security than just the packages installed on the image. Minimizing the SBOM is one more trick up your sleeve. I recommend reading OWASP's Docker Security Cheat Sheet for additional security steps.
I hope you found this helpful. Thank you for reading.
Feel free to reach out and let me know what you think—social links in the menu.