In a previous post, I talked about how I set up my own Docker registry. One issue with running your own Docker registry is that you need to manage the storage and do things like clean up old unused tags/images. There were a few gotchas to doing this automatically so I thought I might write a short post about it.
First, the Docker registry doesn't allow deletion by default. You have to
either pass in the environment variable
REGISTRY_STORAGE_DELETE_ENABLED=true or set
storage.delete.enabled: true in a YAML config file (see
Stack Overflow post). I opted for a config file like this:
version: 0.1
log:
  fields:
    service: registry
storage:
  cache:
    blobdescriptor: inmemory
  filesystem:
    rootdirectory: /var/lib/registry
  delete:
    enabled: true # allow deletes
  maintenance:
    readonly:
      enabled: false # enable readonly mode before garbage collections
http:
  addr: :5000
  headers:
    X-Content-Type-Options: [nosniff]
health:
  storagedriver:
    enabled: true
    interval: 10s
    threshold: 3
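The environment-variable alternative follows the registry's documented naming rule: prefix the config path with REGISTRY_, uppercase it, and join nesting levels with underscores. As a small sketch of that mapping (the helper name `config_key_to_env` is my own, not part of any registry tooling):

```shell
# Hypothetical helper: convert a dotted registry config key into the
# matching override environment variable name.
config_key_to_env() {
  local key="$1"
  # Uppercase the key, then turn each nesting dot into an underscore.
  printf 'REGISTRY_%s\n' "$(printf '%s' "$key" | tr '[:lower:]' '[:upper:]' | tr '.' '_')"
}

config_key_to_env storage.delete.enabled
# prints REGISTRY_STORAGE_DELETE_ENABLED
```

The same rule should give the variable name for any other nested option, though I have only used the delete one myself.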
I then changed my docker-compose.yaml to mount the config file as a volume:
volumes:
  - ./registry-config.yaml:/etc/docker/registry/config.yml
Next, I found a Docker image, anoxis/registry-cli, of a script that deletes old image tags
(GitHub). I ran it with
./registry.py -l userName:password -r registry_url --delete --num 15, and it worked pretty much exactly the way I wanted!
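To illustrate the "keep the newest N tags, delete the rest" idea, here is a rough sketch in shell. It is not how registry.py is implemented, and its tag ordering may well differ; the function name `tags_to_delete` is made up, and the sketch assumes GNU `sort -V` and GNU `head` (for the negative line count).

```shell
# Hypothetical helper: read tags on stdin (one per line) and print the
# ones that fall outside the newest $1 when sorted as version numbers.
tags_to_delete() {
  local keep="$1"
  # sort -V orders oldest to newest; head -n -N drops the last N lines.
  sort -V | head -n "-${keep}"
}

printf '%s\n' 1.0.0 1.2.0 1.10.0 1.1.0 | tags_to_delete 2
# prints 1.0.0 and 1.1.0, one per line
```

If there are fewer tags than the number to keep, nothing is printed, so nothing would be deleted.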
Lastly, I found out that DELETE only soft-deletes an image.
It is only really deleted after a garbage collection run. The
registry image comes with a tool for this, so we just need to
script that command to force a garbage collection.
Furthermore, we should prevent writes during the garbage collection, since they
could cause issues. That's why I also have the
storage.maintenance.readonly.enabled option. I first set it
to true, restart the registry, run the garbage collection, set it back to
false, and restart the registry again:
#!/bin/bash
set -e
cd directory_with_file
echo "enabling read only mode for garbage collection"
git checkout registry-config.yaml
sed -i 's/enabled: false # enable readonly mode before garbage collections/enabled: true # enable readonly mode before garbage collections/' registry-config.yaml
docker-compose stop registry
docker-compose up -d registry
echo 'starting garbage collection'
docker-compose exec -T registry bin/registry garbage-collect --delete-untagged /etc/docker/registry/config.yml
echo 'finished garbage collection. re-enabling writes to the registry'
git checkout registry-config.yaml
docker-compose stop registry
docker-compose up -d registry
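One thing to watch in the script above: sed exits successfully even when its pattern matches nothing, so if the comment in the config file ever changes, garbage collection would quietly run with writes still enabled. A possible guard, sketched with a hypothetical `set_readonly` helper (assuming GNU sed, as the script already does):

```shell
# Hypothetical helper: set the readonly flag to "true" or "false" and
# return non-zero if the marker line is not present afterwards.
set_readonly() {
  local config_file="$1" value="$2"
  local marker='# enable readonly mode before garbage collections'
  sed -E -i "s/enabled: (true|false) ${marker}/enabled: ${value} ${marker}/" "$config_file"
  # Verify the edit actually landed; with set -e a failure here aborts the script.
  grep -q "enabled: ${value} ${marker}" "$config_file"
}

# Demo on a throwaway copy of the relevant config fragment.
demo_config="$(mktemp)"
printf '%s\n' \
  'storage:' \
  '  maintenance:' \
  '    readonly:' \
  '      enabled: false # enable readonly mode before garbage collections' \
  > "$demo_config"

set_readonly "$demo_config" true && grep 'readonly mode' "$demo_config"
```

In the real script this would replace the sed line with something like `set_readonly registry-config.yaml true`.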
Okay, this all looks pretty good. Now to do it automatically, I used my Kubernetes cluster's CronJob feature as well as Linux's default crontab feature. On my box with the self-hosted Docker registry, I set up the following crontab to run at 4:27 AM every day:
27 4 * * * /directory_with_file/garbage_collect_docker_images.sh >> /var/log/gc_docker.log 2>&1
This appends all output of the script above to the file
/var/log/gc_docker.log. Next, I set up a Kubernetes CronJob to run the
anoxis/registry-cli image like this:
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: delete-old-images-docker-registry
  namespace: cron
spec:
  schedule: "0 4 * * *" # every day at 4am
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: delete-old-images-docker-registry
              image: anoxis/registry-cli
              command:
                [
                  "./registry.py",
                  "-l",
                  "userName:password",
                  "-r",
                  "registry_url",
                  "--delete",
                  "--num",
                  "15",
                ]
              resources:
                requests:
                  cpu: 10m
                  memory: 50Mi
                limits:
                  cpu: 300m
                  memory: 300Mi
          restartPolicy: OnFailure
      backoffLimit: 3
Great! Now the last thing I want is to be able to pick up the logs from the crontab I set up earlier so that I can see them in Kubernetes for easier monitoring. I have a k3s agent on that server so I can also set up a CronJob to simply read the logs, clear them, and exit. Then my promtail agent can read the logs from the pod and store them for easier access. That config looks like this:
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: read-crontab-log
  namespace: cron
spec:
  schedule: "0 5 * * *" # every day at 5am, cron job should run at 4:27
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: read-crontab-log
              image: busybox:1.35
              command: ["/bin/sh", "-c", "cat /gc_docker.log; >| /gc_docker.log"] # >| replaces the file with an empty file
              resources:
                requests:
                  cpu: 10m
                  memory: 10Mi
                limits:
                  cpu: 10m
                  memory: 10Mi
              volumeMounts:
                - name: log-volume
                  mountPath: /gc_docker.log
          restartPolicy: OnFailure
          volumes:
            - name: log-volume
              hostPath:
                path: /var/log/gc_docker.log
                type: File
      backoffLimit: 3
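The read-then-truncate step can be tried locally. The sketch below (with a made-up `drain_log` function and a temp file standing in for the real log) shows that truncating in place keeps the same inode, which is why the crontab's `>>` and the hostPath file mount keep pointing at the same file afterwards. It uses `: >` rather than `>|`; the two behave the same unless the shell's noclobber option is on.

```shell
# Hypothetical helper: print a log file, then empty it in place.
drain_log() {
  local log_file="$1"
  cat "$log_file"
  # Truncate rather than delete, so the file keeps its inode.
  : > "$log_file"
}

# Demo with a temp file standing in for /var/log/gc_docker.log.
demo_log="$(mktemp)"
echo "starting garbage collection" >> "$demo_log"
inode_before="$(ls -i "$demo_log" | awk '{print $1}')"
drain_log "$demo_log"
inode_after="$(ls -i "$demo_log" | awk '{print $1}')"
echo "inode before: ${inode_before}, after: ${inode_after}"
```

Anything written between the cat and the truncate would be lost, which is one reason the log reader runs well after the 4:27 job.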
Thanks to this setup, I went from 12GB of storage used for the Docker registry to 3GB and I can feel confident that from now on, I'll be able to maintain a low storage usage.
One other minor note: The DELETE API provided by the registry only allows for the deletion of manifests, not tags. What this means for us is that even if we use the API to delete every manifest in a repository, that repository will still show up in responses from the catalog API. In order to delete it from the catalog, you have to delete it manually from the disk like this:
rm -rf data/docker/registry/v2/repositories/repository_name
Only then will the repository be properly cleaned up from the catalog API. Doing this without manual deletion from disk should be possible in the Soon™ upcoming 3.x version of the registry. Until then, this appears to be the only way.
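Since an rm -rf with an empty or mistyped repository name could wipe every repository at once, a small guard seems worthwhile. This is only a sketch; the function name `remove_repository` and its two arguments are hypothetical, and the directory layout is the one shown in the command above.

```shell
# Hypothetical helper: delete one repository directory, refusing
# empty, absolute, or parent-traversing names.
remove_repository() {
  local repositories_dir="$1" name="$2"
  case "$name" in
    ''|/*|*..*)
      echo "refusing suspicious repository name: '${name}'" >&2
      return 1
      ;;
  esac
  local target="${repositories_dir}/${name}"
  if [ ! -d "$target" ]; then
    echo "no such repository directory: ${target}" >&2
    return 1
  fi
  rm -rf -- "$target"
}

# Usage (not run here):
#   remove_repository data/docker/registry/v2/repositories repository_name
```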
Any error corrections or comments can be made by sending me a pull request.