Smaller and Safer Clojure Containers: Minimizing the Software Bill of Materials

From Debian to Google's Distroless Images

We are exposed to supply chain vulnerabilities whenever we use containers (or almost any software). These vulnerabilities raise our risk of being hacked, which undermines our goal of offering a dependable and secure service to our users. We don't have to accept this as it is: we can act and mitigate these risks proactively. Learn how to reduce your Software Bill of Materials and help your SREs sleep better.

A software bill of materials is a list of all the open source and third-party components present in a codebase. An SBOM also lists the licenses that govern those components, the versions of the components used in the codebase, and their patch status, which allows security teams to quickly identify any associated security or license risks.

Synopsys: What is a Software Bill of Materials

When we use public container registries like Docker Hub, bad actors may have sneaked their malware into an image. If that piece of software becomes part of our SBOM, it can compromise the application. To minimize the risk, we should include only the essential dependencies in our containers.

Let's take the Docker configuration used in the previous post, Deploying Clojure Like a Seasoned Hobbyist, as our vehicle to explore this topic further. We'll look at how much extra is included in the container and what known vulnerabilities a security scan finds, try reducing the image size and SBOM, and compare the final artifact sizes between the different approaches.

Examples are available in the repository.

Inspecting The Image

I'll tag the build from the previous post as clj-tools-deps-buster to give it a shorter, more descriptive name in this context.

❯ docker tag \
    registry.digitalocean.com/clojure-sample-app/dev \
    clj-tools-deps-buster

First, we want access to the manifest to find the container configuration, so let's start by creating an archive from all of the files in the container, including the manifest.

❯ docker save clj-tools-deps-buster > clj-tools-deps-buster.tar

Then, let's unpack the archive to get access to the files.

❯ tar -xvf clj-tools-deps-buster.tar

Now, we can find the configuration file from the manifest.json.

❯ cat manifest.json| jq ".[0].Config"
"3085b45633b1c0f87a3ea67000d7d77e2f14a74d1b2968966f866c7c3531f745.json"

The configuration file is too large to add here, so let's fire up a Babashka REPL and investigate the configuration file a bit further.

If we look at this container's history, the last four steps come from the Dockerfile we used, and the rest already exist in the container.

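;; Babashka aliases clojure.string as str by default;
;; the explicit require keeps the snippets below portable.
(require '[cheshire.core :as json]
         '[clojure.string :as str])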

(def config
  (->  "3085b45633b1c0f87a3ea67000d7d77e2f14a74d1b2968966f866c7c3531f745.json"
       slurp
       (json/parse-string true)))

(keys config)
;; => (:architecture :config :created :history :os :rootfs)

(count (get config :history))
;; => 23

(for [{:keys [created_by]} (take-last 4 (get config :history))]
   created_by)
;; =>
;; ("WORKDIR /tmp/app"
;;  "COPY . . # buildkit"
;;  "RUN /bin/sh -c clojure -P # buildkit"
;;  "CMD [\"/bin/sh\" \"-c\" \"clj -X:run\"]")

We can see that the commands are the same as configured in the previous post's Dockerfile. Other authors did all the previous steps, so we might not know what they did. Let's take the third entry from the history as an example.

(print (get-in config [:history 2 :created_by]))
;; /bin/sh -c set -eux; apt-get update; \
;; apt-get install -y --no-install-recommends ca-certificates curl netbase wget; \
;; rm -rf /var/lib/apt/lists/*

The original author of the base image decided to install ca-certificates, curl, netbase, and wget, which all ended up in our application image.

Let's take a look at what commands we have pre-installed on the container by first finding all apt-get install commands from the image history.

(def install-commands
  (for [{:keys [created_by]} (get config :history)
        :when (str/includes? created_by "apt-get install")]
    created_by))

(count install-commands)
;; => 5

It looks like five different steps use apt-get install. Let's see what these are all about.

(for [command install-commands]
  (into []
        (comp 
         (filter #(str/includes? % "apt-get install"))
         (map str/trim))
        (-> command
            (str/replace #"\t" "")
            (str/split #";"))))

;; (["apt-get install -y --no-install-recommends ca-certificates curl netbase wget"]
;;  ["apt-get install -y --no-install-recommends gnupg dirmngr"]
;;  ["/bin/sh -c apt-get update && apt-get install -y --no-install-recommends git mercurial openssh-client subversion procps && rm -rf /var/lib/apt/lists/*"]
;;  ["apt-get install -y --no-install-recommends bzip2 unzip xz-utils binutils fontconfig libfreetype6 ca-certificates p11-kit"]
;;  ["/bin/sh -c apt-get update && apt-get install -y make rlwrap && rm -rf /var/lib/apt/lists/* && wget https://download.clojure.org/install/linux-install-$CLOJURE_VERSION.sh && sha256sum linux-install-$CLOJURE_VERSION.sh && echo \"7677bb1179ebb15ebf954a87bd1078f1c547673d946dadafd23ece8cd61f5a9f *linux-install-$CLOJURE_VERSION.sh\" | sha256sum -c - && chmod +x linux-install-$CLOJURE_VERSION.sh && ./linux-install-$CLOJURE_VERSION.sh && rm linux-install-$CLOJURE_VERSION.sh && clojure -e \"(clojure-version)\""])

The result is a bunch of executables that the final container doesn't need. On top of these, to get the complete picture of what is installed, we would still need to go through the Clojure install scripts. But we're not going to go there this time around. Let's continue with the image itself.
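To turn those command strings into a plain list of package names, something like the following could work. This is a minimal sketch, not part of the original setup: installed-packages is a name made up for this example, and it assumes every token after apt-get install that starts with a dash is a flag and that the package list ends at the first ; or &&.

```clojure
(require '[clojure.string :as str])

(defn installed-packages
  "Returns the package names of the first `apt-get install` found in
  `command`, or nil when there is none. Tokens starting with `-` are
  treated as flags and dropped; parsing stops at the first `;` or `&`."
  [command]
  (when-let [[_ args] (re-find #"apt-get install\s+([^;&]*)" command)]
    (->> (str/split (str/trim args) #"\s+")
         (remove str/blank?)
         (remove #(str/starts-with? % "-"))
         vec)))

(installed-packages
 "apt-get update && apt-get install -y make rlwrap && rm -rf /var/lib/apt/lists/*")
;; => ["make" "rlwrap"]
```

With the install-commands collected earlier, (into (sorted-set) (mapcat installed-packages) install-commands) would give a de-duplicated, sorted set of everything the base layers installed through apt-get.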

Before moving forward, let's see the size of the image and if the base image has any known vulnerabilities.

❯ docker image ls | grep clj-tools-deps-buster
clj-tools-deps-buster ...  725MB

The image size is 725MB. According to Docker Hub, the base image we used has 17 critical and 27 high-severity vulnerabilities. Let's get back to these numbers a bit later.

Reduce the Image Size

Next, let's see if we can reduce the image size and vulnerabilities by changing to smaller base images.

Using Slim Buster

Create a new file, slim.Dockerfile.

FROM clojure:openjdk-17-tools-deps-slim-buster
WORKDIR app
COPY . .

RUN clojure -P

CMD clj -X:run

Build the image.

❯ docker build \
  --file=slim.Dockerfile . \
  -t clj-tools-deps-slim --no-cache

Check the size of the container.

❯ docker image ls | grep clj-tools-deps-slim
clj-tools-deps-slim ... 553MB

By using clojure:openjdk-17-tools-deps-slim-buster, we reduced the image size by 172MB and made minor improvements in the vulnerability counts.

And let's do the same once more with the Alpine version.

Using Alpine Linux Image

Once again, create a new Dockerfile, alpine.Dockerfile.

FROM clojure:openjdk-17-tools-deps-alpine
WORKDIR app
COPY . .

RUN clojure -P

CMD clj -X:run

Build the image and check the size.

❯ docker image ls | grep clj-tools-deps-alpine
clj-tools-deps-alpine ... 356MB

This is good progress. We went from 725MB to 356MB, a reduction of 369MB, and the vulnerability counts improve as well.

We can still improve from here by going distroless. Yes, you read correctly, distroless.

Distroless Containers

Google provides language-specific base images that run application code without a full Linux distribution in the image. These base images don't even have a shell. With a language like Rust, you could go further and statically link the standard C library into your binary, so the container relies on little more than the host kernel.

🥑 Language focused docker images, minus the operating system.

GitHub: Google Container Tools

For Clojure code, we'll use the Java image gcr.io/distroless/java17-debian12. Since that image doesn't include the Clojure tooling, we must build a JAR file so the container can run the application code with nothing but the JVM.

So, let's get to it.

Build JAR

First, let's create a JAR build of the application following the instructions in the Clojure tools.build reference.

Let's start by updating the application to read the environment variables at runtime, since they are unavailable at build time.

(ns main
  (:require [ring.adapter.jetty :as jetty]
            [next.jdbc :as jdbc])
  (:gen-class))

(defn get-port []
  (Integer/parseInt (System/getenv "PORT")))

(defn get-db-conf []
  {:dbtype "postgres"
   :jdbcUrl (System/getenv "JDBC_DATABASE_URL")})

(defn datasource []
  (jdbc/get-datasource (get-db-conf)))

(defn app [_request]
  (let [db-version (jdbc/execute! (datasource) ["SELECT version()"])]
    {:status  200
     :headers {"Content-Type" "application/edn"}
     :body    (str db-version)}))

(defn -main [& _args]
  (jetty/run-jetty #'app {:port (get-port)}))
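As written, get-port throws when PORT is not set, because Integer/parseInt receives nil. If you would rather fall back to a default, a variation could look like the sketch below. The parse-port helper and the default of 8080 are assumptions for illustration, not part of the sample app.

```clojure
(defn parse-port
  "Parses `value` as a TCP port. Returns `default` when `value` is nil,
  not a plain number, or larger than 65535."
  [value default]
  (let [port (when (and value (re-matches #"\d{1,5}" value))
               (Integer/parseInt value))]
    (if (and port (<= port 65535))
      port
      default)))

(defn get-port []
  ;; 8080 is an arbitrary fallback chosen for this example.
  (parse-port (System/getenv "PORT") 8080))

(parse-port "3000" 8080) ;; => 3000
(parse-port nil 8080)    ;; => 8080
```

Keeping the parsing in a pure function makes it easy to check from the REPL without touching the real environment.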

Add a new alias, build, to deps.edn.

{...
 :aliases 
  {....
   :build {:deps {io.github.clojure/tools.build
                   {:git/tag "v0.9.6" :git/sha "8e78bcc"}}
           :ns-default build}}}

And then create a build.clj file.

(ns build
  (:require [clojure.tools.build.api :as b]))

(def class-dir "target/classes")
(def basis (b/create-basis {:project "deps.edn"}))
(def uber-file "target/standalone.jar")

(defn clean [_]
  (b/delete {:path "target"}))

(defn uber [_]
  (clean nil)
  (b/copy-dir {:src-dirs   ["src" "resources"]
               :target-dir class-dir})
  (b/compile-clj {:basis      basis
                  :ns-compile '[main]
                  :class-dir  class-dir})
  (b/uber {:class-dir class-dir
           :uber-file uber-file
           :basis     basis
           :main      'main}))

Now, we should be able to build the JAR file by running:

❯ clj -T:build uber

After this step, we should have the following files in the target folder.

❯ ls target
classes  standalone.jar

Using Distroless Java

Lastly, let's create a distroless image to run the application JAR.

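FROM clojure:openjdk-17-tools-deps-buster AS base

# Absolute path, so the COPY --from=base below does not depend on
# the base image's default working directory.
WORKDIR /tmp/app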
COPY . .
RUN clj -T:build uber

FROM gcr.io/distroless/java17-debian12
COPY --from=base /tmp/app/target/standalone.jar /tmp/app/standalone.jar

CMD ["/tmp/app/standalone.jar"]

As Timo Kramer pointed out, we can also create the container with our build script using Google's Jib library. Take a look at how Datahike uses it.

Build your image and check the image size.

❯ docker image ls | grep jvm-distroless
jvm-distroless ... 234MB
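To put the four sizes side by side, here is a small sketch that computes the savings relative to the original image. The numbers are the ones reported by docker image ls above; image-sizes-mb and savings are names made up for this example.

```clojure
(def image-sizes-mb
  "Image sizes in MB as reported by `docker image ls` in this post.
  The first entry is the baseline."
  [["clj-tools-deps-buster" 725]
   ["clj-tools-deps-slim"   553]
   ["clj-tools-deps-alpine" 356]
   ["jvm-distroless"        234]])

(defn savings
  "For each image, the MB saved and the rounded percentage saved
  compared to the first (baseline) image."
  [sizes]
  (let [[_ baseline] (first sizes)]
    (for [[image size] sizes
          :let [saved (- baseline size)]]
      {:image     image
       :size-mb   size
       :saved-mb  saved
       :saved-pct (Math/round (* 100.0 (/ saved baseline)))})))

(savings image-sizes-mb)
;; => ({:image "clj-tools-deps-buster", :size-mb 725, :saved-mb 0, :saved-pct 0}
;;     {:image "clj-tools-deps-slim", :size-mb 553, :saved-mb 172, :saved-pct 24}
;;     {:image "clj-tools-deps-alpine", :size-mb 356, :saved-mb 369, :saved-pct 51}
;;     {:image "jvm-distroless", :size-mb 234, :saved-mb 491, :saved-pct 68})
```

The distroless image ends up roughly two thirds smaller than the image we started with.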

Let's run a security scan for all the images and see how the numbers compare.

Comparing Image Sizes and Known Vulnerabilities

For this step, I'll be using Trivy since Docker Hub doesn't provide the numbers for the distroless images.

Trivy is a comprehensive and versatile security scanner. Trivy has scanners that look for security issues, and targets where it can find those issues.

GitHub: Trivy

Here's an example of a partial output. See the complete reports here.

❯ trivy image clj-tools-deps-buster

clj-tools-deps-buster (debian 10.12)
======
Total: 1016 (UNKNOWN: 9, LOW: 611, MEDIUM: 170, HIGH: 177, CRITICAL: 49)

....

Java (jar)
=====
Total: 48 (UNKNOWN: 0, LOW: 10, MEDIUM: 20, HIGH: 14, CRITICAL: 4)

Trivy scans the system and the JAR vulnerabilities in one go, so I'll add both separately to the graphs.
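To collect the numbers for a comparison, the summary lines can be parsed from the REPL. This is a minimal sketch that assumes the "Total: ... (SEVERITY: n, ...)" line format shown in the output above; parse-summary is a name made up for this example.

```clojure
(require '[clojure.string :as str])

(defn parse-summary
  "Turns a Trivy summary line into a map of severity keyword to count.
  Only upper-case labels followed by a colon are picked up, so the
  leading `Total:` is skipped."
  [line]
  (into {}
        (map (fn [[_ severity n]]
               [(keyword (str/lower-case severity)) (Long/parseLong n)]))
        (re-seq #"([A-Z]+): (\d+)" line)))

(parse-summary
 "Total: 1016 (UNKNOWN: 9, LOW: 611, MEDIUM: 170, HIGH: 177, CRITICAL: 49)")
;; => {:unknown 9, :low 611, :medium 170, :high 177, :critical 49}
```

Summing the values of the result gives back the reported total (1016 here), which is a quick sanity check that no severity was missed.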

We can see clearly that the number of system vulnerabilities goes down with the container size, while the image size doesn't significantly affect the JAR vulnerabilities. Based on this, clj-tools-deps-buster probably carries some extra development-time dependencies that the final image doesn't need.

Conclusion

Distroless images might not suit your use case, and going distroless doesn't mean the container will be bulletproof. But it's still good to know that it's an option. There's much more to container security than just the packages installed on the image. Minimizing the SBOM is one more trick up your sleeve. I recommend reading OWASP's Docker Security Cheat Sheet for additional security steps.

I hope you found this helpful. Thank you for reading.

Feel free to reach out and let me know what you think; the social links are in the menu.