2022-08-22

Automating Docker Image Cleanup

linux docker

In a previous post, I talked about how I set up my own Docker registry. One issue with running your own Docker registry is that you need to manage the storage and do things like clean up old, unused tags and images. There were a few gotchas to doing this automatically, so I thought I'd write a short post about it.

First, the Docker registry doesn't allow deletion by default. One has to either pass in the environment variable REGISTRY_STORAGE_DELETE_ENABLED=true or use a config yaml file with storage.delete.enabled: true (see stackoverflow post). I opted to use a config file like this:

version: 0.1
log:
  fields:
    service: registry
storage:
  cache:
    blobdescriptor: inmemory
  filesystem:
    rootdirectory: /var/lib/registry
  delete:
    enabled: true # allow deletes
  maintenance:
    readonly:
      enabled: false # enable readonly mode before garbage collections
http:
  addr: :5000
  headers:
    X-Content-Type-Options: [nosniff]
health:
  storagedriver:
    enabled: true
    interval: 10s
    threshold: 3
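With deletes enabled, the registry's v2 HTTP API will accept DELETE on a manifest, addressed by digest rather than by tag. As a sketch (registry host, repository name, and tag below are placeholders, not from my setup), the flow is: HEAD the tag with the v2 media type to get its digest from the Docker-Content-Digest response header, then DELETE by digest:

```shell
# Pull the Docker-Content-Digest header value out of a HEAD response.
digest_from_headers() {
  awk 'tolower($1) == "docker-content-digest:" { print $2 }' | tr -d '\r'
}

# Against a live registry (host/repo/tag are placeholders), usage would be:
#   DIGEST=$(curl -sI \
#     -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
#     "https://registry.example.com/v2/my-app/manifests/old-tag" | digest_from_headers)
#   curl -X DELETE "https://registry.example.com/v2/my-app/manifests/$DIGEST"
```

The Accept header matters: without it the registry may return a digest for a converted manifest rather than the one it actually stores, and the DELETE by that digest will 404.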

I then changed my docker-compose.yaml to add a volume:

volumes:
  - ./registry-config.yaml:/etc/docker/registry/config.yml

Next, I found a Docker image of a script that deletes old tags of images. I used anoxis/registry-cli (Github) with the following flags: ./registry.py -l userName:password -r registry_url --delete --num 15. It worked pretty much exactly the way I wanted!

Lastly, I found out that the DELETE API only soft-deletes an image; it is only really deleted after a garbage collection runs. The registry image comes with a tool to run garbage collection, so we just need to script that command and force a garbage collection. Furthermore, during the garbage collection we should prevent writes, which could cause issues. That's why I also have the storage.maintenance.readonly.enabled option. I first set it to enabled, restart the registry container, run the garbage collection, then set it back to disabled and restart the container again.

That results in this script:
#!/bin/bash
set -e

cd directory_with_file
echo "enabling read only mode for garbage collection"
# start from a pristine config, then flip the readonly flag on
git checkout registry-config.yaml
sed -i 's/enabled: false # enable readonly mode before garbage collections/enabled: true # enable readonly mode before garbage collections/' registry-config.yaml
# restart the registry so it picks up the read-only config
docker-compose stop registry
docker-compose up -d registry
echo 'starting garbage collection'
docker-compose exec -T registry bin/registry garbage-collect --delete-untagged /etc/docker/registry/config.yml
echo 'finished garbage collection. re-enabling writes to the registry'
# restore the original (writable) config and restart again
git checkout registry-config.yaml
docker-compose stop registry
docker-compose up -d registry
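Before wiring the script into cron, it's worth previewing what the collector would actually remove; the garbage-collect command supports a dry-run flag for exactly that:

```shell
# Preview which blobs would be removed, without deleting anything.
docker-compose exec -T registry bin/registry garbage-collect --dry-run /etc/docker/registry/config.yml
```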

Okay, this all looks pretty good. Now, to run it automatically, I used my Kubernetes cluster's CronJob feature as well as Linux's crontab. On the box with the self-hosted Docker registry, I set up the following crontab entry to run at 4:27 AM every day:

27 4 * * * /directory_with_file/garbage_collect_docker_images.sh >> /var/log/gc_docker.log 2>&1

This redirects all of the script's output to a file at /var/log/gc_docker.log. Next, I set up a Kubernetes CronJob to run the anoxis/registry-cli image like this:

---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: delete-old-images-docker-registry
  namespace: cron
spec:
  schedule: "0 4 * * *" # every day at 4am
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: delete-old-images-docker-registry
              image: anoxis/registry-cli
              command:
                [
                  "./registry.py",
                  "-l",
                  "userName:password",
                  "-r",
                  "registry_url",
                  "--delete",
                  "--num",
                  "15",
                ]
              resources:
                requests:
                  cpu: 10m
                  memory: 50Mi
                limits:
                  cpu: 300m
                  memory: 300Mi
          restartPolicy: OnFailure
      backoffLimit: 3
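Rather than waiting for the 4 AM schedule, the CronJob can be exercised immediately by creating a one-off Job from it (standard kubectl; the job name here is arbitrary):

```shell
# Trigger the cleanup once, outside the schedule, and follow its logs.
kubectl create job --from=cronjob/delete-old-images-docker-registry manual-cleanup -n cron
kubectl logs -n cron job/manual-cleanup -f
```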

Great! The last thing I want is to pick up the logs from the crontab entry I set up earlier so that I can see them in Kubernetes for easier monitoring. I have a k3s agent on that server, so I can set up a CronJob that simply reads the logs, clears them, and exits. Then my promtail agent can read the logs from the pod and store them for easier access. That config looks like this:

---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: read-crontab-log
  namespace: cron
spec:
  schedule: "0 5 * * *" # every day at 5am, cron job should run at 4:27
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: read-crontab-log
              image: busybox:1.35
              command:
                ["/bin/sh", "-c", "cat /gc_docker.log; >| /gc_docker.log"]
                # >| replaces the file with an empty file
              resources:
                requests:
                  cpu: 10m
                  memory: 10Mi
                limits:
                  cpu: 10m
                  memory: 10Mi
              volumeMounts:
                - name: log-volume
                  mountPath: /gc_docker.log
          restartPolicy: OnFailure
          volumes:
            - name: log-volume
              hostPath:
                path: /var/log/gc_docker.log
                type: File
      backoffLimit: 3
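The `>|` in the container command is plain shell truncation: with no command in front of it, the redirection simply empties the file (and overrides the noclobber option if it's set). A quick local demonstration of the same read-then-clear pattern:

```shell
# Show that a bare '>| file' empties the file, as the CronJob does after cat-ing it.
log=$(mktemp)
echo "garbage collection finished" >| "$log"   # simulate a logged line
cat "$log"                                     # read the log, like the CronJob
>| "$log"                                      # truncate it to empty
wc -c < "$log"                                 # prints 0
rm -f "$log"
```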

Thanks to this setup, I went from 12GB of storage used for the Docker registry down to 3GB, and I can feel confident that from now on I'll be able to keep storage usage low.

One other minor note: the DELETE API provided by the registry only allows for the deletion of manifests, not tags. What this means for us is that even if we use the API to delete all manifests associated with a tag, that tag will still show up in GETs to the catalog API. In order to remove it from the catalog, you have to delete it manually from disk like this:

rm -rf data/docker/registry/v2/repositories/repository_name

Only then will the repository be properly cleaned up from the catalog API. Doing this without manual deletion from disk should be possible in the Soon™ upcoming 3.x version of the registry, but until then, this appears to be the only way.

Any error corrections or comments can be made by sending me a pull request.
