2022-01-13
Recently, I decided I wanted to set up my own Kubernetes cluster for personal projects (not including this blog). I was kind of bored over the holidays and remembered reading a few months ago about someone using Kubernetes for their own personal projects, along with the discussion on Hacker News. I remember thinking that would be pretty cool. At my previous job, we used K8s a lot, but in a non-standard way (we primarily used it to schedule a ton of short-duration jobs as opposed to long-running services). So, setting up K8s for my own projects would also be a chance to learn more typical usage of K8s.
I've attempted to set up K8s before, but each time the compute costs really dampened my enthusiasm: typical costs would be $30-40 a month. Although it's possible to get a fairly cheap GKE cluster, it requires the use of preemptible VMs, which means a VM can be shut down abruptly at any time and lasts at most 24 hours regardless (though I think the example in that blog post could potentially be even cheaper!). AWS and Azure seem even more expensive due to the cost of the control plane. One of the main reasons I decided on DigitalOcean is that its pricing is much, much easier to understand than that of GCP and other cloud providers. I've also had a generally positive experience with them in the past.
For example, I had some previous personal projects on DigitalOcean's App Platform. App Platform was simple and worked without issues, but it has the downside of costing a fixed amount per month regardless of usage. For my personal projects, which get basically no usage, that's a drain on the budget. I kind of wish there were something like GCP's Cloud Run where you pay for usage, but alas, at the time I started hosting that personal project, Cloud Run didn't support websockets, which my app needs. Migrating those App Platform projects to K8s could save some money if I can keep my K8s costs under control.
The process of clicking through the UI and making a 1-node cluster was pretty mundane, so I'll skip the details. I picked the cheapest nodes that use AMD ($12/month). The first interesting challenge was setting up Ingress. Ingress is a K8s object that manages external access to the services in a cluster; in order to run web services, you definitely need it. Typically, one also needs a load balancer. For example, the DigitalOcean tutorial that describes how to set up Nginx Ingress and cert-manager uses a DigitalOcean Load Balancer (which costs $10/month). That's a hefty price tag compared to the compute cost, so I wanted to avoid using that.
I stumbled upon this blog post and this StackOverflow answer, which helped me figure it out.
This is what my helm config file ended up being.
---
controller:
  kind: DaemonSet
  daemonset:
    useHostPort: true
  dnsPolicy: ClusterFirstWithHostNet
  hostNetwork: true
  service:
    type: ClusterIP
  priorityClassName: high-priority
  nodeSelector:
    loadbalancer: nginx
  config: {
    # long timeout for websockets
    proxy-read-timeout: 86400,
  }
rbac:
  create: true
It's very similar to the example in the aforementioned blog post, but I'll comment on it a little bit. The first notable thing is type: ClusterIP. That prevents the default of type: LoadBalancer from automatically creating a DigitalOcean load balancer, which would result in additional billing charges. It also means that this service is only reachable from within the cluster itself, which is something the rest of the configuration has to work around.
The more important configuration, and the key to this whole method, is the hostNetwork: true setting. This means that the nginx daemon can directly use the host machine's networking and bind to ports 80/443. Then, if one were to access the IP of the host machine directly (assuming a firewall rule has been created to allow traffic through), one would be able to interact with the nginx server directly. For example, assume the IP is 198.51.100.0. If someone from outside the cluster pointed their browser to 198.51.100.0, nginx would serve traffic to them on port 80. The only downside is that in K8s, services can move around to different nodes at any time, resulting in a different IP that an external client would need to connect to in order to be served by nginx. My strategy to avoid this is to use nodeSelector to force the nginx server onto a node with the label loadbalancer: nginx. Then, on the droplet of interest, I ran
kubectl label nodes node_name loadbalancer=nginx
which forces the nginx server to always run on that droplet. With that, I can simply access that IP whenever I want to reach the nginx server.
This setup is great, but I quickly discovered one issue. When I started adding the services for my side projects to the cluster, it would become overcommitted and my nginx server would be shut down in favor of other services. To avoid that, I created a PriorityClass called high-priority so that nginx will always be prioritized to run.
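For reference, the PriorityClass itself is tiny. A minimal sketch (the exact value is an arbitrary choice; it just needs to be higher than whatever your other workloads get):
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "Keeps the nginx ingress controller scheduled ahead of other workloads."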
Additionally, it turns out you can actually assign a floating IP to the droplet that runs the nginx server, despite the documentation claiming it's not possible. See this other blog post for someone who took advantage of this fact for more advanced strategies. By assigning a floating IP to that droplet and directing all public DNS records to the floating IP, swapping which droplet the nginx server runs on shouldn't be too painful in the future.
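For the record, assigning the floating IP is roughly a one-liner with doctl (the IP and droplet ID here are placeholders):
doctl compute floating-ip-action assign 203.0.113.10 <droplet-id>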
With this setup, getting cert-manager and nginx to properly reverse proxy to the correct services was relatively straightforward by following the tutorial. The biggest snag I ran into was the general resource constraints of my 1-node cluster. The default services that DigitalOcean adds to the cluster, and their CPU requests, are cilium (300m), 2 instances of coredns (100m each), and 1 digitalocean-node-agent (102m). That adds up to 602m of a total of 900m allocatable CPU, or 66.9% of my compute resources. That doesn't leave much room at all for anything else! I couldn't run Prometheus/Grafana without using up basically all the remaining compute. I suppose that's the cost of a low-budget setup.
Another minor issue I ran into was that metrics-server doesn't work out of the box. Luckily, DigitalOcean provides a tutorial to handle this. Apparently, you have to pass the flag --kubelet-preferred-address-types=InternalIP through to make metrics-server work.
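As a rough sketch, assuming metrics-server is installed from its upstream manifest, the change amounts to one extra entry in the container's args (the surrounding flags and image tag are just whatever the manifest ships with):
      containers:
        - name: metrics-server
          image: k8s.gcr.io/metrics-server/metrics-server:v0.5.2
          args:
            - --cert-dir=/tmp
            - --secure-port=443
            - --kubelet-use-node-status-port
            # the flag that makes metrics-server work on this cluster
            - --kubelet-preferred-address-types=InternalIP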
As mentioned earlier, setting up CI/CD is one of my goals. Manually deploying is just a pain, so having it happen automatically is great!
The first problem I encountered was that the free tier of DigitalOcean's Container Registry only has 500MB of storage. The Docker image of my PandemicOnline server was over 400MB, so when I made a single small dependency update and pushed it to the registry, I blew past the free tier's storage limit. I suppose I could have optimized my Node.js server's Docker image, but given that this problem would keep occurring with such a small storage limit, that didn't seem sustainable. The cheapest paid plan of the Container Registry offers only 5GB and costs $5/month. For that price, I might as well self-host a Docker registry on a droplet, which comes with 25GB.
I could have hosted the registry on the K8s cluster, but I'd have to attach more permanent storage, which is also a bit of a pain. Hosting on a droplet also avoids a potential circular dependency where K8s depends on the Docker registry, but the Docker registry can't boot up because K8s is down. I took inspiration from Mike Cartmell's blog post where he does something similar. Like him, I used docker-compose so my droplet could boot up a web server plus the container registry. I initially set it up using nginx, and setting up the cert with Let's Encrypt was quite involved. Even the so-called docker-nginx-certbot, which was supposed to make it easy to set up SSL, was not trivial. I eventually got it working, but it was a huge ordeal.
I've read about Caddy on Hacker News before, so I decided to experiment with it a bit to see how easy it was to use. It was honestly extremely simple. Everything just worked on my first try, and I simply needed a single Caddyfile to handle my server
(auth) {
    basicauth {
        [REDACTED username] [REDACTED password]
    }
}

[REDACTED].com {
    import auth
    reverse_proxy /v2* registry:5000

    log {
        output file /var/log/registry-access.log
    }
}
and it was all good to go.
This is what my docker-compose.yml looks like
version: "2.0"
services:
caddy:
restart: always
image: caddy:2
container_name: caddy
ports:
- "80:80"
- "443:443"
volumes:
- "./Caddyfile:/etc/caddy/Caddyfile:ro"
- "./caddy-logs:/var/log/"
- "./caddy-config:/config"
- "./caddy-data:/data"
registry:
restart: always
image: registry:2.7
container_name: registry
ports:
# accessible via localhost:2879
- 2879:5000
volumes:
- ./data:/var/lib/registry
Life was a lot simpler with Caddy, and I'll definitely consider using it again in the future.
After setting up my registry server, I followed the documentation and added a secret to my cluster to integrate it. It worked without an issue.
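For reference, it's a standard docker-registry secret; the command looks something like the following, where the secret name, registry URL, and credentials are placeholders (each Deployment then references the secret via imagePullSecrets):
kubectl create secret docker-registry registry-credentials --docker-server=registry.example.com --docker-username=<username> --docker-password=<password>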
One other interesting decision was how to ensure CI/CD would automatically pick the right tag of a Docker image. I tag my images with their Git short SHA, and I really wanted a way to determine the latest tag of an image so that during continuous delivery, the pipeline knows whether a new image needs to be pulled or not. Unfortunately, the Docker registry API doesn't provide an easy way to list images by upload date or anything like that.
I solved this by making it so that whenever a new image is built, I update a ConfigMap with the latest tag. Then, in my CI/CD pipeline, I simply look up the latest value from the ConfigMap and use sed to do a replacement on the appropriate yaml configs. After that, I can safely run
kubectl apply -f
on all my yaml configs. K8s will then automatically figure out whether new pods with an updated image need to be spun up or not.
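As a rough sketch of that CI/CD step, with hypothetical ConfigMap, key, image, and file names, it boils down to something like:
TAG=$(kubectl get configmap image-tags -o jsonpath='{.data.server}')
sed -i "s|pandemiconline-server:.*$|pandemiconline-server:${TAG}|" k8s/server-deployment.yaml
kubectl apply -f k8s/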
This was added after the initial posting.
The aforementioned setup worked reasonably well... until DigitalOcean did an automated node upgrade, which destroyed the node carrying the loadbalancer: nginx label. That meant no node had the label, so nginx would not get scheduled on any node at all, and no web traffic could be served. In addition, the special firewall rule I created was also removed from my specially prepared node. My whole setup was basically ruined.
My solution was to run a service inside the k8s cluster itself to detect these undesirable conditions. This is what the flow generally looks like:
- HTTP server reachable?
  - Yes: 🎉
  - No: check for the existence of the nginx pod
    - Pod exists: assign the floating IP to the droplet matching the node the pod is running on
    - Pod doesn't exist: add the loadbalancer: nginx label to the single node, then assign the floating IP to the droplet matching the node the pod is running on
The only additional complication is that because the request comes from inside the cluster, firewall rules will automatically let it in regardless of how external traffic would be treated. Because of this, I always check that the firewall rule is applied to some k8s droplet and add it if needed.
You'll need to pass in 4 arguments representing: HOSTNAME, FLOATING_IP, FIREWALL_ID, and DIGITALOCEAN_API_TOKEN. The API token can be created via these instructions, and you can use it to find the FIREWALL_ID of the relevant firewall.
In terms of the Kubernetes configuration, the important thing is that you will need to create a ServiceAccount and use a ClusterRoleBinding to bind a ClusterRole that allows that account to list and patch both pods and nodes. In addition, standard ingress will also need to be set up so that the host passed in earlier routes to the pod where the HTTP server runs. Note that although this works for me, it may not work for you.
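A minimal sketch of those RBAC objects might look like this (the names and namespace are placeholders):
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nginx-watchdog
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: nginx-watchdog
rules:
  - apiGroups: [""]
    resources: ["pods", "nodes"]
    verbs: ["list", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: nginx-watchdog
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: nginx-watchdog
subjects:
  - kind: ServiceAccount
    name: nginx-watchdog
    namespace: default
With that in place, here's the program itself: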
package main

import (
    "context"
    "errors"
    "fmt"
    "log"
    "net/http"
    "os"
    "time"

    "github.com/digitalocean/godo"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/types"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
)

func AliveResponse(w http.ResponseWriter, r *http.Request) {
    //log.Print("Request from ", r.RemoteAddr)
}

func GetFirewallRuleInfo(ctx context.Context, client *godo.Client, firewallId string) (*godo.Firewall, error) {
    firewall, _, err := client.Firewalls.Get(ctx, firewallId)
    return firewall, err
}

func AddFirewallToDroplet(ctx context.Context, client *godo.Client, droplet godo.Droplet, firewallId string) error {
    log.Printf("Adding firewall rule %s to droplet %s", firewallId, droplet.Name)
    _, err := client.Firewalls.AddDroplets(ctx, firewallId, droplet.ID)
    return err
}

func GetKubernetesDroplet(ctx context.Context, client *godo.Client) (godo.Droplet, error) {
    droplets, _, err := client.Droplets.List(ctx, nil)
    if err != nil {
        log.Printf("Droplets.List returned error: %v", err)
        return godo.Droplet{}, err
    }
    var targetDroplet godo.Droplet
outer:
    for _, droplet := range droplets {
        for _, tag := range droplet.Tags {
            if tag == "k8s" {
                targetDroplet = droplet
                break outer
            }
        }
    }
    if targetDroplet.Name == "" {
        errorString := fmt.Sprintf("Could not find droplet with k8s tag. Found %v\n", droplets)
        log.Print(errorString)
        return godo.Droplet{}, errors.New(errorString)
    }
    return targetDroplet, nil
}

func AddReservedIPToDroplet(ctx context.Context, client *godo.Client, targetDroplet godo.Droplet, reservedIP string) error {
    reservedIPResult, _, err := client.ReservedIPs.Get(ctx, reservedIP)
    if err != nil {
        log.Printf("get for reservedIP returned error: %v", err)
        return err
    }
    if reservedIPResult.Droplet != nil {
        log.Print("reservedIP already assigned")
        return nil
    }
    log.Printf("Assigning %s to %s\n", reservedIP, targetDroplet.Name)
    _, _, err = client.ReservedIPActions.Assign(ctx, reservedIP, targetDroplet.ID)
    if err != nil {
        log.Printf("assigning reservedIP returned error: %v", err)
        return err
    }
    return nil
}

func main() {
    args := os.Args[1:]
    host := args[0]
    reservedIP := args[1]
    firewallId := args[2]
    digitalOceanToken := args[3]
    log.Printf("Reserved Ip: %s, Host: %s, firewallId: %s\n", reservedIP, host, firewallId)
    digitalOceanClient := godo.NewFromToken(digitalOceanToken)

    // Liveness endpoint so the reachability check below has something to hit.
    go func() {
        http.HandleFunc("/", AliveResponse)
        log.Fatal(http.ListenAndServe(":8000", nil))
    }()

    config, err := rest.InClusterConfig()
    if err != nil {
        panic(err.Error())
    }
    clientset, err := kubernetes.NewForConfig(config)
    if err != nil {
        panic(err.Error())
    }
    opts := metav1.ListOptions{
        LabelSelector: "app.kubernetes.io/name=ingress-nginx",
    }
    labelPatch := []byte(`{"metadata":{"labels":{"loadbalancer": "nginx"}}}`)
    var httpClient = &http.Client{
        Timeout: time.Second * 5,
    }
    httpRequest, err := http.NewRequest("GET", "http://"+reservedIP, nil)
    if err != nil {
        log.Fatal(err)
    }
    httpRequest.Host = host

    i := 0
    for {
        if i > 0 {
            time.Sleep(10 * time.Second)
        }
        i += 1
        ctx := context.TODO()
        // We're running this from within the k8s cluster, and DigitalOcean's Kubernetes cluster allows all internal IPs through its firewalls.
        // That means an http request won't tell us if the firewall is working or not. So, we should always check the firewall.
        firewall, err := GetFirewallRuleInfo(ctx, digitalOceanClient, firewallId)
        if err != nil {
            log.Printf("error getting firewall info %v\n", err)
            continue
        }
        targetDroplet, err := GetKubernetesDroplet(ctx, digitalOceanClient)
        if err != nil {
            // error is logged in the function so don't print again
            continue
        }
        if len(firewall.DropletIDs) == 0 {
            err = AddFirewallToDroplet(ctx, digitalOceanClient, targetDroplet, firewallId)
            if err != nil {
                log.Printf("Error adding firewall rule %s to droplet %s: %v\n", firewallId, targetDroplet.Name, err)
            } else {
                log.Printf("Successfully added firewall rule %s to droplet %s", firewallId, targetDroplet.Name)
            }
        }
        resp, err := httpClient.Do(httpRequest)
        if err != nil {
            log.Printf("error making request %v\n", err)
        } else if resp.StatusCode == 200 {
            // close the body to avoid leaking connections in this long-running loop
            resp.Body.Close()
            if i > 0 && i%100 == 0 {
                log.Printf("no issues on iteration %d\n", i)
            }
            continue
        }
        log.Printf("Trying to allow node to serve the http request\n")
        pods, err := clientset.CoreV1().Pods("").List(ctx, opts)
        if err != nil {
            log.Fatal(fmt.Sprintf("Error querying pods: %v", err.Error()))
        }
        if len(pods.Items) != 1 {
            log.Printf("There are %d pods in the cluster, not the 1 expected\n", len(pods.Items))
            log.Printf("Applying label loadbalancer=nginx to node %s\n", targetDroplet.Name)
            node, err := clientset.CoreV1().Nodes().Patch(ctx, targetDroplet.Name, types.StrategicMergePatchType, labelPatch, metav1.PatchOptions{})
            if err != nil {
                log.Fatal(fmt.Sprintf("Error updating node: %v", err.Error()))
            }
            log.Printf("Node %s now has labels %v on it.\n", node.Name, node.Labels)
        } else {
            log.Printf("There are %d pods in the cluster %s\n", len(pods.Items), pods.Items[0].Name)
        }
        err = AddReservedIPToDroplet(ctx, digitalOceanClient, targetDroplet, reservedIP)
        if err == nil {
            log.Printf("Successfully assigned %s to %s\n", reservedIP, targetDroplet.Name)
        }
    }
}
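For completeness, this is roughly how the watchdog could be deployed (a sketch with placeholder names and image; the Service and Ingress for the host are omitted, and you may prefer injecting the API token from a Secret rather than writing it into the manifest):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-watchdog
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-watchdog
  template:
    metadata:
      labels:
        app: nginx-watchdog
    spec:
      serviceAccountName: nginx-watchdog
      containers:
        - name: nginx-watchdog
          image: registry.example.com/nginx-watchdog:abc1234
          # HOSTNAME, FLOATING_IP, FIREWALL_ID, DIGITALOCEAN_API_TOKEN
          args:
            - watchdog.example.com
            - 203.0.113.10
            - <firewall-id>
            - <digitalocean-api-token>
          ports:
            - containerPort: 8000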
To summarize, here's the monthly cost breakdown:

| Item | Cost/Month |
| --- | --- |
| Control plane | $0 |
| 1 Premium AMD Node (1 vCPU, 2GB total RAM with 1GB usable RAM) | $12 |
| 1 Premium AMD Droplet (1 vCPU, 1GB total RAM) for Registry Server | $6 |
| Total | $18 |
If I had used Intel CPUs instead, this would only cost $15 a month, but I prefer using AMD. Without a separate Container Registry, this would only be $12/month.
Any error corrections or comments can be made by sending me a pull request.