Kubernetes, GitOps, and FluxCD
Background & Goals
Since I got literally no work done on my server, I think it would be prudent to scale down rather than opt for a larger setup.
Here are my concrete goals, in priority order:
- Syncthing: (I need this in order to sync files between my two laptops)
- OAuth2/OpenID/LDAP (Kanidm or Authentik)
  - I might switch to Kanidm from Authentik as my authentication server, as it seems a lot simpler… but it doesn’t seem to support invites
- Virtual machine host with a web UI that I can give out to others. I’m currently looking at Incus or oVirt.
I recently learned that oVirt is still maintained, and it seems to be feature complete. It contains every feature I want, like OAuth2 authentication, port security, and a web UI. Due to Red Hat abandoning the project, it likely won’t get new features, only bug and security updates, but the software already does what I want it to do.
Software Selection
Virtual Machine Manager
Incus:
- Authentication
  - OpenID Connect
- Authorization
  - OpenFGA authorization
  - Do I have to create a project for each user? (It seems not… Incus can be configured to dynamically create projects for all users in a specific user group.)
  - What is the difference between the various levels of authority?
- Port security
  - Can be overridden on a per-instance basis… but how can I make this an unchangeable default?
Authentication
I’m currently deciding between Kanidm and Authentik.
Here is an Authentik on Kubernetes with FluxCD guide I found.
Testing Incus
So, Incus is only packaged in Debian backports. The first step is to add those. After that, apt update, apt upgrade, and apt install incus-tools incus incus-agent incus-client.
Then, to initialize Incus, I followed the first steps documentation.
RKE2 Try 2
I uninstalled RKE2, but I want to redeploy my services on it again.
curl -sfL https://get.rke2.io | sudo sh -
(for some reason it crashed and didn’t start when I ran it in a root machinectl session)
I then copied /etc/rancher/rke2/rke2.yaml to ~/.kube/config on my local machine, in order to configure Kubernetes from my local machine.
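One detail that may matter here, as a hedged aside: the kubeconfig that RKE2 generates normally points at https://127.0.0.1:6443, which only works on the server itself. Below is a minimal sketch of rewriting that address after copying the file. The hostname server.example, the helper name rewrite_kubeconfig_server, and the throwaway demo file are placeholders for illustration, not values from this setup.

```shell
# Sketch: point a copied RKE2 kubeconfig at the remote server instead of localhost.
# Assumes the file contains the default "https://127.0.0.1:6443" server entry.
rewrite_kubeconfig_server() {
  # $1 = path to the kubeconfig, $2 = hostname or IP of the RKE2 server
  sed "s#https://127.0.0.1:6443#https://$2:6443#" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Demo on a throwaway file rather than the real ~/.kube/config
demo_kubeconfig="$(mktemp)"
printf '    server: https://127.0.0.1:6443\n' > "$demo_kubeconfig"
rewrite_kubeconfig_server "$demo_kubeconfig" server.example
cat "$demo_kubeconfig"
```

On a real machine the call would be something like rewrite_kubeconfig_server ~/.kube/config followed by the server's hostname, assuming port 6443 is reachable from the laptop.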
FluxCD
Now, I also realized that git can work over ssh. So I have a git repo, called fleet-charts, located on my server, which I will access from my laptop via ssh.
[moonpie@lizard vscode]$ flux bootstrap git --url ssh://moonpie@moonpiedumpl.ing/fleet-charts --branch=main --private-key-file=/home/moonpie/.ssh/moonstack
► cloning branch "main" from Git repository "ssh://moonpie@moonpiedumpl.ing/fleet-charts"
⚠️ clone failure: unable to clone: repository not found: git repository: 'ssh://moonpie@moonpiedumpl.ing/fleet-charts'
⚠️ clone failure: unable to clone: repository not found: git repository: 'ssh://moonpie@moonpiedumpl.ing/fleet-charts'
✗ failed to clone repository: unable to clone: repository not found: git repository: 'ssh://moonpie@moonpiedumpl.ing/fleet-charts
I find this odd, because ssh works normally:
[moonpie@lizard vscode]$ ssh moonpie@moonpiedumpl.ing -i /home/moonpie/.ssh/moonstack
moonpie@thoth:~$
[moonpie@lizard vscode]$ flux bootstrap git --url ssh://moonpie@moonpiedumpl.ing:22/home/moonpie/fleet-charts --branch=main --private-key-file=/home/moonpie/.ssh/moonstack --verbose
► cloning branch "main" from Git repository "ssh://moonpie@moonpiedumpl.ing:22/home/moonpie/fleet-charts"
✔ cloned repository
► generating component manifests
✔ generated component manifests
✔ committed component manifests to "main" ("a69831db70bea88e9ebc9810b78a33831929793c")
► pushing component manifests to "ssh://moonpie@moonpiedumpl.ing:22/home/moonpie/fleet-charts"
► installing components in "flux-system" namespace
So it looks like I must use an absolute path, and cannot use “~” for relative paths. Or maybe I can use the $HOME environment variable.
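As far as I can tell, the reason is that in the ssh:// URL form everything after the host (and optional port) is treated as a path from the filesystem root, unlike the scp-style user@host:path form, which is relative to the home directory. Here is a small sketch of assembling such a URL. git_ssh_url is a hypothetical helper, not part of git or flux.

```shell
# Build an ssh:// Git URL with an explicit port and an absolute repository path.
git_ssh_url() {
  # $1 = user, $2 = host, $3 = port, $4 = absolute path on the server
  case "$4" in
    /*) printf 'ssh://%s@%s:%s%s\n' "$1" "$2" "$3" "$4" ;;
    *)  echo "path must be absolute: $4" >&2; return 1 ;;
  esac
}

git_ssh_url moonpie moonpiedumpl.ing 22 /home/moonpie/fleet-charts
```

The helper refuses relative paths on purpose, since those are what produced the "repository not found" error above.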
But I actually don’t like this setup. I uninstalled flux, and I want to redeploy it, but with ssh on a different port instead. I want port 22 on this server to be available for the Forgejo ssh service, rather than being used by the administrative ssh service. I’m going to change ssh to port 22022 in order to avoid conflicts with other services.
[moonpie@lizard vscode]$ flux bootstrap git --url ssh://moonpie@moonpiedumpl.ing:22022/home/moonpie/fleet-charts --branch=main --private-key-file=/home/moonpie/.ssh/moonstack --verbose
► cloning branch "main" from Git repository "ssh://moonpie@moonpiedumpl.ing:22022/home/moonpie/fleet-charts"
✔ cloned repository
► generating component manifests
✔ generated component manifests
✔ component manifests are up to date
► installing components in "flux-system" namespace
✔ installed components
✔ reconciled components
► determining if source secret "flux-system/flux-system" exists
► generating source secret
✔ public key: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCelSERwSNpguy4f2oqrpkPgtq3MT7iKY7fVnofpp72hqdfLH4Z0i34HFy8vXKPL1aKd07HNiMFPujG8E/lE/pb3W5sSNkJPh//YZRz2SlZo7Mh2tkBDLe3Ap8GQgJk/jJHMoCS7YQudT4rAi/vNBuHvMBaFCjXBLqwbaoRBxm5t7hiNFi1I9cSdrIP8v6fubv2VbWV72kiwq/IQeJkURFN9UZJFQ6/Dd6os4ZZg3IEY+EVCpkOyi8d8KnS8fnd8vMk/96jl8mBqRk8ZCsBu6qRbs3HfT6FqCuIgIblxixhrpVmJRJ8cJzMGT5I8deuTPZQ4gPNYYNdxkHW8oztISx0Jql15LtgeJi1iQMwj3ZqIEXPxbgWYZc57jodGvdo7PQTAa3PXOopIJrbmQNi6T2OLwgjidWDgYs7gDJdmAFv52g8zeRh7HyO83yCC7IC1MXodLd9zJinvyBRg5DAdKQnW7zTbcEDsUSGgEI+LQdShRcShmnBzDtJMs2oQujLOaM=
Please give the key access to your repository: y
► applying source secret "flux-system/flux-system"
✔ reconciled source secret
► generating sync manifests
✔ generated sync manifests
✔ committed sync manifests to "main" ("e3f5512df167ca2bc974428cff0dc17787d713f1")
► pushing sync manifests to "ssh://moonpie@moonpiedumpl.ing:22022/home/moonpie/fleet-charts"
► applying sync manifests
✔ reconciled sync configuration
◎ waiting for GitRepository "flux-system/flux-system" to be reconciled
✔ GitRepository reconciled successfully
◎ waiting for Kustomization "flux-system/flux-system" to be reconciled
✔ Kustomization reconciled successfully
► confirming components are healthy
✔ helm-controller: deployment ready
✔ kustomize-controller: deployment ready
✔ notification-controller: deployment ready
✔ source-controller: deployment ready
✔ all components are healthy
And just like that, fluxcd is installed.
flux suspend source git flux-system pauses flux’s reconciliation with the git repo. This allows me to test applying resources with kubectl apply -f before I commit to git.
flux resume source git flux-system resumes flux’s reconciliation.
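The test loop described above could be sketched roughly like this. The run wrapper and the FLUX_SKETCH_RUN variable are made up for illustration so that the block only prints commands by default, and traefik/config.yaml is just an example manifest path.

```shell
# Dry-run wrapper: prints each command unless FLUX_SKETCH_RUN=1 is set.
run() {
  if [ "${FLUX_SKETCH_RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '+ %s\n' "$*"
  fi
}

run flux suspend source git flux-system     # stop fetching new commits
run kubectl apply -f traefik/config.yaml    # try the manifest by hand
# ...inspect the result, then commit the manifest to the git repo...
run flux resume source git flux-system      # hand control back to flux
```

One caveat I am not fully sure about: if a Kustomization already manages the resource, it may keep reapplying the last fetched revision, in which case flux suspend kustomization flux-system would be the analogous command for pausing that part.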
Reverse Proxy (Traefik, then Nginx)
The first step of my cluster should be my reverse proxy, as an ingress. This exposes basically all of my services.
The flux example of helm page actually has an example where they set up helm.
[moonpie@lizard home-manager]$ kubectl version
Client Version: v1.30.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
The connection to the server localhost:8080 was refused - did you specify the right host or port?
[moonpie@lizard home-manager]$ git remote -v^C
[moonpie@lizard home-manager]$ flux create source helm traefik --url https://helm.traefik.io/traefik --namespace traefik
✚ generating HelmRepository source
► applying HelmRepository source
✗ namespaces "traefik" not found
[moonpie@lizard home-manager]$ flux create source helm traefik --url https://helm.traefik.io/traefik
✚ generating HelmRepository source
► applying HelmRepository source
✔ source created
◎ waiting for HelmRepository source reconciliation
✔ HelmRepository source reconciliation completed
✔ fetched revision: sha256:48513aa497c9bf46e3053d2aef7e4d184d6df2165389a6024b03f8565fd501e8
[moonpie@lizard home-manager]$ flux create helmrelease my-traefik --chart traefik --source HelmRepository/traefik
✚ generating HelmRelease
► applying HelmRelease
✔ HelmRelease created
◎ waiting for HelmRelease reconciliation
✗ context deadline exceeded
Is this a failure? I can’t tell?
[moonpie@lizard fleet-charts]$ flux get sources all
NAME REVISION SUSPENDED READY MESSAGE
gitrepository/flux-system main@sha1:e3f5512d False True stored artifact for revision 'main@sha1:e3f5512d'
NAME REVISION SUSPENDED READY MESSAGE
helmrepository/traefik sha256:48513aa4 False True stored artifact: revision 'sha256:48513aa4'
NAME REVISION SUSPENDED READY MESSAGE
helmchart/flux-system-my-traefik 31.0.0 False True pulled 'traefik' chart with version '31.0.0'
[moonpie@lizard fleet-charts]$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
flux-system helm-controller-76dff45854-pj876 1/1 Running 0 3d1h
flux-system kustomize-controller-6bc5d5b96-jzj24 1/1 Running 0 3d1h
flux-system my-traefik-5b4fbbd9c8-2rck9 1/1 Running 0 7m24s
flux-system notification-controller-7f5cd7fdb8-7db4q 1/1 Running 0 3d1h
flux-system source-controller-54c89dcbf6-p2gd6 1/1 Running 0 3d1h
kube-system cloud-controller-manager-thoth 1/1 Running 0 3d3h
kube-system etcd-thoth 1/1 Running 0 3d3h
kube-system helm-install-rke2-canal-hmjrm 0/1 Completed 0 3d3h
kube-system helm-install-rke2-coredns-m2jwz 0/1 Completed 0 3d3h
kube-system helm-install-rke2-ingress-nginx-cszxd 0/1 Completed 0 3d3h
kube-system helm-install-rke2-metrics-server-gkqfd 0/1 Completed 0 3d3h
kube-system helm-install-rke2-snapshot-controller-crd-ztz6n 0/1 Completed 0 3d3h
kube-system helm-install-rke2-snapshot-controller-f2zfz 0/1 Completed 0 3d3h
kube-system helm-install-rke2-snapshot-validation-webhook-52kj2 0/1 Completed 0 3d3h
kube-system kube-apiserver-thoth 1/1 Running 0 3d3h
kube-system kube-controller-manager-thoth 1/1 Running 0 3d3h
kube-system kube-proxy-thoth 1/1 Running 0 3d3h
kube-system kube-scheduler-thoth 1/1 Running 0 3d3h
kube-system rke2-canal-gb7bx 2/2 Running 0 3d3h
kube-system rke2-coredns-rke2-coredns-6bb85f9dd8-zzqlv 1/1 Running 0 3d3h
kube-system rke2-coredns-rke2-coredns-autoscaler-7b9c797d64-4bwcb 1/1 Running 0 3d3h
kube-system rke2-ingress-nginx-controller-ct4mj 1/1 Running 0 3d3h
kube-system rke2-metrics-server-868fc8795f-5t6v6 1/1 Running 0 3d3h
kube-system rke2-snapshot-controller-7dcf5d5b46-5dtvt 1/1 Running 0 3d3h
kube-system rke2-snapshot-validation-webhook-bf7bbd6fc-gqqgr 1/1 Running 0 3d3h
[moonpie@lizard fleet-charts]$ git pull
Already up to date.
[moonpie@lizard fleet-charts]$
Another weird thing is that no changes were made to the git repo where I was tracking flux… but changes were made to my cluster. I thought the point of flux was that all state was in the git repo, but that doesn’t seem to be the case here.
Oh. Oops. RKE2 comes with an nginx controller already. I may have to remove that if I want traefik as an ingress controller.
Thankfully, it doesn’t appear to be too hard.
/etc/rancher/rke2/config.yaml
disable:
- rke2-coredns
- rke2-ingress-nginx
And now, those services are disabled.
Oh, and I was wrong, there are files in the git repo now.
[moonpie@lizard fleet-charts]$ ls *
begin.md
flux-system:
gotk-components.yaml gotk-sync.yaml kustomization.yaml
[moonpie@lizard flux-system]$ wc -l *
12385 gotk-components.yaml
27 gotk-sync.yaml
5 kustomization.yaml
12417 total
Uuuh… That’s a lot of lines. I think that gotk-components.yaml file has basically all of the fluxcd components stored and tracked in there.
[moonpie@lizard flux-system]$ cat * | grep traefik
[moonpie@lizard flux-system]$
And… no mentions of traefik? It’s obviously stored in the cluster, given something related shows up when I observe the kubernetes pods, but nothing appears in the git repo.
[moonpie@lizard flux-system]$ flux get sources all -A
NAMESPACE NAME REVISION SUSPENDED READY MESSAGE
flux-system gitrepository/flux-system main@sha1:e3f5512d False False failed to checkout and determine revision: unable to list remote for 'ssh://moonpie@moonpiedumpl.ing:22022/home/moonpie/fleet-charts': dial tcp: lookup moonpiedumpl.ing on 10.43.0.10:53: read udp 10.42.0.22:38747->10.43.0.10:53: i/o timeout
Okay, it appears that flux is having trouble accessing my git repo. I found a relevant GitHub issue, and it looks like a DNS problem. It looks like, since I disabled the Kubernetes CoreDNS service, DNS wasn’t working inside my cluster, preventing flux from resolving my domain name.
So:
/etc/rancher/rke2/config.yaml
disable:
# Yeah so apparently this was kind of important.
# - rke2-coredns
- rke2-ingress-nginx
And with this, flux bootstrap works properly:
[moonpie@lizard vscode]$ flux bootstrap git --url ssh://moonpie@moonpiedumpl.ing:22022/home/moonpie/flux-config --branch=main --private-key-file=/home/moonpie/.ssh/moonstack --verbose --insecure-skip-tls-verify
► cloning branch "main" from Git repository "ssh://moonpie@moonpiedumpl.ing:22022/home/moonpie/flux-config"
✔ cloned repository
...
...
✔ reconciled sync configuration
◎ waiting for GitRepository "flux-system/flux-system" to be reconciled
✔ GitRepository reconciled successfully
◎ waiting for Kustomization "flux-system/flux-system" to be reconciled
✔ Kustomization reconciled successfully
► confirming components are healthy
✔ helm-controller: deployment ready
✔ kustomize-controller: deployment ready
✔ notification-controller: deployment ready
✔ source-controller: deployment ready
✔ all components are healthy
I also changed the name of the git repo to flux-config.
I also realized that the flux-system directory holds the configs of the flux-system namespace. Does that mean each directory should be a namespace? However, I don’t think I’m going to use many namespaces; they seem like extra complexity designed for multi-project or multi-user Kubernetes clusters.
[moonpie@lizard vscode]$ flux create source helm traefik --url https://helm.traefik.io/traefik
✚ generating HelmRepository source
► applying HelmRepository source
✔ source created
◎ waiting for HelmRepository source reconciliation
✔ HelmRepository source reconciliation completed
✔ fetched revision: sha256:48513aa497c9bf46e3053d2aef7e4d184d6df2165389a6024b03f8565fd501e8
Events: <none>
[moonpie@lizard flux-config]$ flux create helmrelease traefik --chart traefik --source HelmRepository/traefik --chart-version 31.0.0 --verbose
✚ generating HelmRelease
► applying HelmRelease
✔ HelmRelease updated
◎ waiting for HelmRelease reconciliation ^C
Despite my impatience, it did render, and Traefik did deploy.
[moonpie@lizard flux-system]$ kubectl get pods -n flux-system
NAME READY STATUS RESTARTS AGE
helm-controller-76dff45854-g8tff 1/1 Running 0 3h4m
kustomize-controller-6bc5d5b96-sdzql 1/1 Running 0 3h4m
notification-controller-7f5cd7fdb8-v9672 1/1 Running 0 3h4m
source-controller-54c89dcbf6-kjjsb 1/1 Running 0 3h4m
traefik-6f6c897d6-j7g8z 1/1 Running 0 9m34s
But… no changes were made to the git repo? I’m confused, as I thought the point of flux was that all changes would be version controlled.
I started to follow the flux troubleshooting guide.
[moonpie@lizard moonpiedumplings.github.io]$ kubectl describe helmrelease traefik -n flux-system
Name: traefik
Namespace: flux-system
...
...
Status:
Conditions:
Last Transition Time: 2024-09-17T20:16:28Z
Message: Failed to install after 1 attempt(s)
Observed Generation: 1
Reason: RetriesExceeded
Status: True
Type: Stalled
Last Transition Time: 2024-09-17T20:16:28Z
Message: Helm install failed for release flux-system/traefik with chart traefik@31.0.0: client rate limiter Wait returned an error: context deadline
...
...
2024-09-17T20:11:27.315654788Z: CustomResourceDefinition serverstransporttcps.traefik.io is already present. Skipping.
2024-09-17T20:11:27.315658603Z: creating 1 resource(s)
2024-09-17T20:11:27.324448937Z: CustomResourceDefinition tlsoptions.traefik.io is already present. Skipping.
2024-09-17T20:11:27.324452859Z: creating 1 resource(s)
2024-09-17T20:11:27.332385521Z: CustomResourceDefinition tlsstores.traefik.io is already present. Skipping.
2024-09-17T20:11:27.332389484Z: creating 1 resource(s)
2024-09-17T20:11:27.348449663Z: CustomResourceDefinition traefikservices.traefik.io is already present. Skipping.
2024-09-17T20:11:27.85810682Z: creating 6 resource(s)
2024-09-17T20:11:27.904853344Z: beginning wait for 6 resources with timeout of 5m0s
2024-09-17T20:11:27.909519301Z: Service does not have load balancer ingress IP address: flux-system/traefik
2024-09-17T20:16:25.90942954Z: : flux-system/traefik (148 duplicate lines omitted)
So it seems I’m missing a “load balancer ingress IP address”.
I also find it odd that the relevant configs are not automatically added to the git repo. I decided to experiment with adding files to the git repo, rather than using the flux cli.
[moonpie@lizard flux-config]$ ls traefik/
config.yaml source.yaml
[moonpie@lizard traefik]$ cat config.yaml
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: traefik
  namespace: flux-system
spec:
  chart:
    spec:
      chart: traefik
      reconcileStrategy: ChartVersion
      sourceRef:
        kind: HelmRepository
        name: traefik
      version: 31.0.0
  interval: 1m0s
  values:
    service:
      type: LoadBalancer
[moonpie@lizard traefik]$ cat source.yaml
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: traefik
  namespace: flux-system
spec:
  interval: 1m0s
  url: https://traefik.github.io/charts/
This actually deploys properly without errors, so it seems that the flux git repo organization is arbitrary and exists to make it readable for humans, and is not actually necessary.
It probably doesn’t error because I have the service: type: LoadBalancer setting, but it doesn’t actually use any ports or seem to be deployed in a useful manner.
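For comparison, the flux CLI can print manifests like these instead of applying them to the cluster, via its --export flag, which seems to be the intended way to get CLI-generated resources into the git repo. Here is a sketch, assuming the flux CLI is installed. export_traefik_manifests is a hypothetical helper name, and the values: section from config.yaml would still have to be added by hand or through a values file.

```shell
# Write Flux manifests to files instead of applying them to the cluster.
# Nothing here touches the cluster; --export only prints YAML to stdout.
export_traefik_manifests() {
  # $1 = directory inside the git checkout to write into
  mkdir -p "$1"
  flux create source helm traefik \
    --url https://traefik.github.io/charts/ \
    --interval 1m \
    --export > "$1/source.yaml"
  flux create helmrelease traefik \
    --chart traefik \
    --source HelmRepository/traefik \
    --chart-version 31.0.0 \
    --interval 1m \
    --export > "$1/config.yaml"
}
```

Running export_traefik_manifests traefik from the root of the flux-config checkout, then committing and pushing, should leave flux with roughly the same two files as the hand-written ones.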
Also:
[moonpie@lizard traefik]$ kubectl get -n flux-system service
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.43.0.1 <none> 443/TCP 6d20h
flux-system notification-controller ClusterIP 10.43.164.182 <none> 80/TCP 2d14h
flux-system source-controller ClusterIP 10.43.58.240 <none> 80/TCP 2d14h
flux-system traefik LoadBalancer 10.43.109.84 <pending> 80:30239/TCP,443:31645/TCP 18h
flux-system webhook-receiver ClusterIP 10.43.226.89 <none> 80/TCP 2d14h
It seems that traefik comes with a load balancer type service, but it doesn’t actually get an external ip. Why?
After more research, it seems that a load balancer is actually needed in order to use a load balancer type service. On cloud instances, the cloud server acts as a load balancer and can do this. On bare metal instances, users are on their own. The most common deployment for a bare metal load balancer seems to be metallb, but that doesn’t really work for me, as I only have ONE ip address and it looks like metallb expects you to be able to ask for more on the spot somehow.
(well… technically, I can use dhcp to ask for more ip addresses, but it doesn’t look like metallb supports that use case, only something different)
Instead, I found klipper. Klipper is an extremely simple load balancing service built into K3s, but disabled in RKE2 by default. All it does is use iptables/nftables to forward traffic from the ports you want the load balancer to expose to inside the cluster.
RKE2 can deploy klipper by using an install flag… and it seems, only via an install flag. There doesn’t seem to be another way to install klipper, no helm charts or anything… it seems to just be a bash script in a docker container.
So, it looks like I need to uninstall my cluster and redeploy it…
curl -sfL https://get.rke2.io --output install.sh
chmod +x install.sh
./install.sh --serviceLB # with root privileges
But, after I reinstalled flux:
[moonpie@lizard traefik]$ flux get all -n flux-system
NAME REVISION SUSPENDED READY MESSAGE
gitrepository/flux-system main@sha1:819432fd False True stored artifact for revision 'main@sha1:819432fd'
NAME REVISION SUSPENDED READY MESSAGE
helmrepository/traefik sha256:1c0fc56c False True stored artifact: revision 'sha256:1c0fc56c'
NAME REVISION SUSPENDED READY MESSAGE
helmchart/flux-system-traefik 31.0.0 False True pulled 'traefik' chart with version '31.0.0'
NAME REVISION SUSPENDED READY MESSAGE
helmrelease/traefik 31.0.0 False False Helm install failed for release flux-system/traefik with chart traefik@31.0.0: Unable to continue with install: Service "traefik" in namespace "flux-system" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "traefik"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "flux-system"
NAME REVISION SUSPENDED READY MESSAGE
kustomization/flux-system main@sha1:819432fd False True Applied revision: main@sha1:819432fd
root@thoth:/usr/local/bin# ls
rke2 rke2-killall.sh rke2-uninstall.sh
root@thoth:/usr/local/bin# rke2
rke2 rke2-killall.sh rke2-uninstall.sh
root@thoth:/usr/local/bin# rke2 server --help
NAME:
rke2 server - Run management server
USAGE:
rke2 server [OPTIONS]
OPTIONS:
--config FILE, -c FILE (config) Load configuration from FILE (default: "/etc/rancher/rke2/config.yaml") [$RKE2_CONFIG_FILE]
--debug (logging) Turn on debug logs [$RKE2_DEBUG]
--bind-address value (listener) rke2 bind address (default: 0.0.0.0)
--advertise-address value (listener) IPv4/IPv6 address that apiserver uses to advertise to members of the cluster (default: node-external-ip/node-ip)
--tls-san value (listener) Add additional hostnames or IPv4/IPv6 addresses as Subject Alternative Names on the server TLS cert
--tls-san-security (listener) Protect the server TLS cert by refusing to add Subject Alternative Names not associated with the kubernetes apiserver service, s
--data-dir value, -d value (data) Folder to hold state (default: "/var/lib/rancher/rke2") [$RKE2_DATA_DIR]
--cluster-cidr value (networking) IPv4/IPv6 network CIDRs to use for pod IPs (default: 10.42.0.0/16)
--service-cidr value (networking) IPv4/IPv6 network CIDRs to use for service IPs (default: 10.43.0.0/16)
--service-node-port-range value (networking) Port range to reserve for services with NodePort visibility (default: "30000-32767")
--cluster-dns value (networking) IPv4 Cluster IP for coredns service. Should be in your service-cidr range (default: 10.43.0.10)
--cluster-domain value (networking) Cluster Domain (default: "cluster.local")
--egress-selector-mode value (networking) One of 'agent', 'cluster', 'pod', 'disabled' (default: "agent")
--servicelb-namespace value (networking) Namespace of the pods for the servicelb component (default: "kube-system")
--write-kubeconfig value, -o value (client) Write kubeconfig for admin client to this file [$RKE2_KUBECONFIG_OUTPUT]
--write-kubeconfig-mode value (client) Write kubeconfig with this mode [$RKE2_KUBECONFIG_MODE]
--write-kubeconfig-group value (client) Write kubeconfig with this group [$RKE2_KUBECONFIG_GROUP]
--helm-job-image value (helm) Default image to use for helm jobs
--token value, -t value (cluster) Shared secret used to join a server or agent to a cluster [$RKE2_TOKEN]
--token-file value (cluster) File containing the token [$RKE2_TOKEN_FILE]
--agent-token value (cluster) Shared secret used to join agents to the cluster, but not servers [$RKE2_AGENT_TOKEN]
--agent-token-file value (cluster) File containing the agent secret [$RKE2_AGENT_TOKEN_FILE]
--server value, -s value (cluster) Server to connect to, used to join a cluster [$RKE2_URL]
--cluster-reset (cluster) Forget all peers and become sole member of a new cluster [$RKE2_CLUSTER_RESET]
--cluster-reset-restore-path value (db) Path to snapshot file to be restored
--kube-apiserver-arg value (flags) Customized flag for kube-apiserver process
--etcd-arg value (flags) Customized flag for etcd process
--kube-controller-manager-arg value (flags) Customized flag for kube-controller-manager process
--kube-scheduler-arg value (flags) Customized flag for kube-scheduler process
--kube-cloud-controller-manager-arg value (flags) Customized flag for kube-cloud-controller-manager process
--datastore-endpoint value (db) Specify etcd, NATS, MySQL, Postgres, or SQLite (default) data source name [$RKE2_DATASTORE_ENDPOINT]
--datastore-cafile value (db) TLS Certificate Authority file used to secure datastore backend communication [$RKE2_DATASTORE_CAFILE]
--datastore-certfile value (db) TLS certification file used to secure datastore backend communication [$RKE2_DATASTORE_CERTFILE]
--datastore-keyfile value (db) TLS key file used to secure datastore backend communication [$RKE2_DATASTORE_KEYFILE]
--etcd-expose-metrics (db) Expose etcd metrics to client interface. (default: false)
--etcd-disable-snapshots (db) Disable automatic etcd snapshots
--etcd-snapshot-name value (db) Set the base name of etcd snapshots (default: etcd-snapshot-<unix-timestamp>) (default: "etcd-snapshot")
--etcd-snapshot-schedule-cron value (db) Snapshot interval time in cron spec. eg. every 5 hours '0 */5 * * *' (default: "0 */12 * * *")
--etcd-snapshot-retention value (db) Number of snapshots to retain (default: 5)
--etcd-snapshot-dir value (db) Directory to save db snapshots. (default: ${data-dir}/db/snapshots)
--etcd-snapshot-compress (db) Compress etcd snapshot
--etcd-s3 (db) Enable backup to S3
--etcd-s3-endpoint value (db) S3 endpoint url (default: "s3.amazonaws.com")
--etcd-s3-endpoint-ca value (db) S3 custom CA cert to connect to S3 endpoint
--etcd-s3-skip-ssl-verify (db) Disables S3 SSL certificate validation
--etcd-s3-access-key value (db) S3 access key [$AWS_ACCESS_KEY_ID]
--etcd-s3-secret-key value (db) S3 secret key [$AWS_SECRET_ACCESS_KEY]
--etcd-s3-bucket value (db) S3 bucket name
--etcd-s3-region value (db) S3 region / bucket location (optional) (default: "us-east-1")
--etcd-s3-folder value (db) S3 folder
--etcd-s3-proxy value (db) Proxy server to use when connecting to S3, overriding any proxy-releated environment variables
--etcd-s3-config-secret value (db) Name of secret in the kube-system namespace used to configure S3, if etcd-s3 is enabled and no other etcd-s3 options are set
--etcd-s3-insecure (db) Disables S3 over HTTPS
--etcd-s3-timeout value (db) S3 timeout (default: 5m0s)
--disable value (components) Do not deploy packaged components and delete any deployed components (valid items: rke2-coredns, rke2-metrics-server, rke2-sna
--disable-scheduler (components) Disable Kubernetes default scheduler
--disable-cloud-controller (components) Disable rke2 default cloud controller manager
--disable-kube-proxy (components) Disable running kube-proxy
--embedded-registry (experimental/components) Enable embedded distributed container registry; requires use of embedded containerd; when enabled agents will als
--supervisor-metrics (experimental/components) Enable serving rke2 internal metrics on the supervisor port; when enabled agents will also listen on the supervis
--node-name value (agent/node) Node name [$RKE2_NODE_NAME]
--with-node-id (agent/node) Append id to node name
--node-label value (agent/node) Registering and starting kubelet with set of labels
--node-taint value (agent/node) Registering kubelet with set of taints
--image-credential-provider-bin-dir value (agent/node) The path to the directory where credential provider plugin binaries are located (default: "/var/lib/rancher/credentialprovider
--image-credential-provider-config value (agent/node) The path to the credential provider plugin config file (default: "/var/lib/rancher/credentialprovider/config.yaml")
--container-runtime-endpoint value (agent/runtime) Disable embedded containerd and use the CRI socket at the given path; when used with --docker this sets the docker socket p
--default-runtime value (agent/runtime) Set the default runtime in containerd
--disable-default-registry-endpoint (agent/containerd) Disables containerd's fallback default registry endpoint when a mirror is configured for that registry
--snapshotter value (agent/runtime) Override default containerd snapshotter (default: "overlayfs")
--private-registry value (agent/runtime) Private registry configuration file (default: "/etc/rancher/rke2/registries.yaml")
--system-default-registry value (agent/runtime) Private registry to be used for all system images [$RKE2_SYSTEM_DEFAULT_REGISTRY]
--node-ip value, -i value (agent/networking) IPv4/IPv6 addresses to advertise for node
--node-external-ip value (agent/networking) IPv4/IPv6 external IP addresses to advertise for node
--resolv-conf value (agent/networking) Kubelet resolv.conf file [$RKE2_RESOLV_CONF]
--kubelet-arg value (agent/flags) Customized flag for kubelet process
--kube-proxy-arg value (agent/flags) Customized flag for kube-proxy process
--protect-kernel-defaults (agent/node) Kernel tuning behavior. If set, error if kernel tunables are different than kubelet defaults.
--enable-pprof (experimental) Enable pprof endpoint on supervisor port
--selinux (agent/node) Enable SELinux in containerd [$RKE2_SELINUX]
--lb-server-port value (agent/node) Local port for supervisor client load-balancer. If the supervisor and apiserver are not colocated an additional port 1 less than this port will also be used for the apiserver client load-balancer. (default: 6444) [$RKE2_LB_SERVER_PORT]
--cni value (networking) CNI Plugins to deploy, one of none, calico, canal, cilium, flannel; optionally with multus as the first value to enable the multus meta-plugin (default: canal) [$RKE2_CNI]
--ingress-controller value (networking) Ingress Controllers to deploy, one of none, ingress-nginx, traefik; the first value will be set as the default ingress class (default: ingress-nginx) [$RKE_INGRESS_CONTROLLER]
--enable-servicelb (components) Enable rke2 default cloud controller manager's service controller [$RKE2_ENABLE_SERVICELB]
--kube-apiserver-image value (image) Override image to use for kube-apiserver [$RKE2_KUBE_APISERVER_IMAGE]
--kube-controller-manager-image value (image) Override image to use for kube-controller-manager [$RKE2_KUBE_CONTROLLER_MANAGER_IMAGE]
--cloud-controller-manager-image value (image) Override image to use for cloud-controller-manager [$RKE2_CLOUD_CONTROLLER_MANAGER_IMAGE]
--kube-proxy-image value (image) Override image to use for kube-proxy [$RKE2_KUBE_PROXY_IMAGE]
--kube-scheduler-image value (image) Override image to use for kube-scheduler [$RKE2_KUBE_SCHEDULER_IMAGE]
--pause-image value (image) Override image to use for pause [$RKE2_PAUSE_IMAGE]
--runtime-image value (image) Override image to use for runtime binaries (containerd, kubectl, crictl, etc) [$RKE2_RUNTIME_IMAGE]
--etcd-image value (image) Override image to use for etcd [$RKE2_ETCD_IMAGE]
--kubelet-path value (experimental/agent) Override kubelet binary path [$RKE2_KUBELET_PATH]
--cloud-provider-name value (cloud provider) Cloud provider name [$RKE2_CLOUD_PROVIDER_NAME]
--cloud-provider-config value (cloud provider) Cloud provider configuration file path [$RKE2_CLOUD_PROVIDER_CONFIG]
--profile value (security) Validate system configuration against the selected benchmark (valid items: cis) [$RKE2_CIS_PROFILE]
--audit-policy-file value (security) Path to the file that defines the audit policy configuration [$RKE2_AUDIT_POLICY_FILE]
--pod-security-admission-config-file value (security) Path to the file that defines Pod Security Admission configuration [$RKE2_POD_SECURITY_ADMISSION_CONFIG_FILE]
--control-plane-resource-requests value (components) Control Plane resource requests [$RKE2_CONTROL_PLANE_RESOURCE_REQUESTS]
--control-plane-resource-limits value (components) Control Plane resource limits [$RKE2_CONTROL_PLANE_RESOURCE_LIMITS]
--control-plane-probe-configuration value (components) Control Plane Probe configuration [$RKE2_CONTROL_PLANE_PROBE_CONFIGURATION]
--kube-apiserver-extra-mount value (components) kube-apiserver extra volume mounts [$RKE2_KUBE_APISERVER_EXTRA_MOUNT]
--kube-scheduler-extra-mount value (components) kube-scheduler extra volume mounts [$RKE2_KUBE_SCHEDULER_EXTRA_MOUNT]
--kube-controller-manager-extra-mount value (components) kube-controller-manager extra volume mounts [$RKE2_KUBE_CONTROLLER_MANAGER_EXTRA_MOUNT]
--kube-proxy-extra-mount value (components) kube-proxy extra volume mounts [$RKE2_KUBE_PROXY_EXTRA_MOUNT]
--etcd-extra-mount value (components) etcd extra volume mounts [$RKE2_ETCD_EXTRA_MOUNT]
--cloud-controller-manager-extra-mount value (components) cloud-controller-manager extra volume mounts [$RKE2_CLOUD_CONTROLLER_MANAGER_EXTRA_MOUNT]
--kube-apiserver-extra-env value (components) kube-apiserver extra environment variables [$RKE2_KUBE_APISERVER_EXTRA_ENV]
--kube-scheduler-extra-env value (components) kube-scheduler extra environment variables [$RKE2_KUBE_SCHEDULER_EXTRA_ENV]
--kube-controller-manager-extra-env value (components) kube-controller-manager extra environment variables [$RKE2_KUBE_CONTROLLER_MANAGER_EXTRA_ENV]
--kube-proxy-extra-env value (components) kube-proxy extra environment variables [$RKE2_KUBE_PROXY_EXTRA_ENV]
--etcd-extra-env value (components) etcd extra environment variables [$RKE2_ETCD_EXTRA_ENV]
--cloud-controller-manager-extra-env value (components) cloud-controller-manager extra environment variables [$RKE2_CLOUD_CONTROLLER_MANAGER_EXTRA_ENV]
So, it seems that servicelb is enabled with the --servicelb flag on the rke2 command line, rather than in the install script.
So, I used systemctl edit rke2-server.service to create an override file for the systemd service.
### Editing /etc/systemd/system/rke2-server.service.d/override.conf
### Anything between here and the comment below will become the new contents of the file
[Service]
ExecStart=
ExecStart=/usr/local/bin/rke2 server --servicelb
### Lines below this comment will be discarded
### /usr/local/lib/systemd/system/rke2-server.service
# [Unit]
# Description=Rancher Kubernetes Engine v2 (server)
# Documentation=https://github.com/rancher/rke2#readme
# Wants=network-online.target
# After=network-online.target
# Conflicts=rke2-agent.service
#
# [Install]
# WantedBy=multi-user.target
#
# [Service]
# Type=notify
# EnvironmentFile=-/etc/default/%N
# EnvironmentFile=-/etc/sysconfig/%N
# EnvironmentFile=-/usr/local/lib
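An alternative to overriding ExecStart is the RKE2 config file, /etc/rancher/rke2/config.yaml, whose keys mirror the command line flags without the leading dashes. A minimal sketch follows. The key spelling is an assumption (it is written here as enable-servicelb) and should be checked against the output of rke2 server --help before relying on it.

```yaml
# /etc/rancher/rke2/config.yaml
# Assumption: the option is spelled "enable-servicelb" in this RKE2 version.
# If `rke2 server --help` lists a different name, use that name instead.
enable-servicelb: true
```

With this file in place, the stock systemd unit can stay untouched, and the setting survives reinstalls of the unit file.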
And after restarting the kubernetes service, and reconciling, traefik deploys successfully!
[moonpie@lizard traefik]$ flux get all
NAME REVISION SUSPENDED READY MESSAGE
...
...
NAME REVISION SUSPENDED READY MESSAGE
helmrelease/traefik 31.0.0 False True Helm install succeeded for release flux-system/traefik.v1 with chart traefik@31.0.0
...
...
Except… not really. Ports 80 and 443 are not in use when I check sudo ss -tulpn, and if I curl either of those ports, it just times out. A relevant GitHub issue says that klipper has some problems with canal, and suggests enabling canal IP forwarding in /etc/cni/net.d/10-canal.conflist.
However, despite doing this and restarting, the ports are still not in use, and the request times out if I try to curl my site.
But, traefik seems to be working, because if I use the kubectl port-forward command to port forward traefik to my local machine, I can curl it.
[moonpie@lizard traefik]$ kubectl port-forward -n flux-system pods/traefik-6f6c897d6-vr78w 8000
Forwarding from 127.0.0.1:8000 -> 8000
Forwarding from [::1]:8000 -> 8000
Handling connection for 8000
# In a different terminal
[moonpie@lizard traefik]$ curl localhost:8000
404 page not found
But…
[moonpie@lizard traefik]$ kubectl describe -n flux-system svc/traefik
Name: traefik
Namespace: flux-system
Labels: app.kubernetes.io/instance=traefik-flux-system
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=traefik
helm.sh/chart=traefik-31.0.0
helm.toolkit.fluxcd.io/name=traefik
helm.toolkit.fluxcd.io/namespace=flux-system
Annotations: meta.helm.sh/release-name: traefik
meta.helm.sh/release-namespace: flux-system
Selector: app.kubernetes.io/instance=traefik-flux-system,app.kubernetes.io/name=traefik
Type: LoadBalancer
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.43.249.0
IPs: 10.43.249.0
LoadBalancer Ingress: 130.166.90.189
Port: web 80/TCP
TargetPort: 8000/TCP
NodePort: web 30801/TCP
Endpoints: 10.42.0.15:8000
Port: websecure 443/TCP
TargetPort: websecure/TCP
NodePort: websecure 32357/TCP
Endpoints: 10.42.0.15:8443
Session Affinity: None
External Traffic Policy: Cluster
root@thoth:~# curl 10.42.0.15:8000
404 page not found
So I’m guessing that this is supposed to be forwarded to port 80 on the host, but it isn’t.
[moonpie@lizard traefik]$ kubectl get events --sort-by='.lastTimestamp' -A
NAMESPACE LAST SEEN TYPE REASON OBJECT MESSAGE
...
...
kube-system 9m43s Warning FailedMount pod/helm-install-rke2-canal-jjmwv MountVolume.SetUp failed for volume "content" : object "kube-system"/"chart-content-rke2-canal" not registered
kube-system 5m39s Warning FailedMount pod/helm-install-rke2-canal-jjmwv MountVolume.SetUp failed for volume "values" : object "kube-system"/"chart-values-rke2-canal" not registered
...
...
Although this is an interesting error, it’s possible that it isn’t the one that is blocking me. According to a GitHub issue comment, RKE2 comes with “network policies” that can restrict traffic between pods, and that might be the issue.
However, according to the fluxcd documentation on network policies, fluxcd comes with rules that disallow traffic to the flux-system namespace by default… which is where I was attempting to deploy my software.
So, after I changed the configs to deploy traefik to the kubernetes default namespace:
[moonpie@lizard traefik]$ curl moonpiedumpl.ing
404 page not found
It works!
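For reference, the namespace move can be sketched as the pair of Flux objects below. This is a reconstruction, not the exact files from the repo: the chart repository URL is assumed to be the upstream Traefik one, and the version is taken from the flux get all output above. The only part that matters for the fix is metadata.namespace being default instead of flux-system.

```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: traefik
  namespace: default          # moved out of flux-system
spec:
  interval: 1m0s
  url: https://traefik.github.io/charts   # assumed upstream chart repo
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: traefik
  namespace: default          # release (and its pods) now land in default
spec:
  interval: 1m0s
  chart:
    spec:
      chart: traefik
      version: "31.0.0"       # version seen in the earlier flux output
      sourceRef:
        kind: HelmRepository
        name: traefik
```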
Now, I need to find a test service to deploy. I deployed podinfo, because it is a simple web service, used as an example service for fluxcd.
Here was the ingress file that I used to expose it on <podinfo.moonpiedumpl.ing>
podinfo-ingress.yaml
[moonpie@lizard podinfo]$ cat podinfo-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: podinfo
annotations:
traefik.ingress.kubernetes.io/router.entrypoints: web
namespace: default
spec:
rules:
- host: podinfo.moonpiedumpl.ing
http:
paths:
- path: /
pathType: Exact
backend:
service:
name: podinfo
port:
number: 9898
Not really simple, and it was a pain to edit because my editor (Kate) sets tabs to 4 spaces, rather than 2.
Now, how can I add HTTPS/TLS to this? Traefik documents how to enable ACME in the first example, but the documentation on how to add this to each individual site is unclear.
Based on removing the annotation and testing, it looks like the annotation isn’t needed. Also, it seems that kubectl explain ingress --recursive explains the “ingress” kubernetes resource, which is what I used above.
However, it seems that Traefik provides an “IngressRoute” resource, which is what they expect you to use for automatic https setups like what I am trying to do. But… I’m hesitant to rely on that, as “IngressRoute” seems to be traefik specific, rather than Ingress, which is general to kubernetes.
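To make the comparison concrete, here is a rough sketch of what the podinfo route might look like as a Traefik IngressRoute. It assumes Traefik v3 CRDs (the traefik.io API group) and a certificate resolver named letsencrypt, which is a hypothetical name that would have to be configured in Traefik's static configuration first.

```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: podinfo
  namespace: default
spec:
  entryPoints:
    - websecure                 # the HTTPS entrypoint from the traefik chart
  routes:
    - match: Host(`podinfo.moonpiedumpl.ing`)
      kind: Rule
      services:
        - name: podinfo
          port: 9898
  tls:
    certResolver: letsencrypt   # hypothetical resolver name, not from my config
```

Everything under spec here is Traefik specific, which is exactly the lock-in concern: none of it carries over to another ingress controller.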
Actually, after doing more research, I’ve decided to switch to ingress-nginx. Unlike traefik, it seems to have built-in support for external oauth authentication. Although Traefik can do it, it’s not built in, and I would have to use a plugin.
So, this means I’m switching from Traefik to nginx and cert-manager instead.
Nginx/Cert-Manager
I followed the official cert-manager installation guide, but converted it into flux configs.
cert-manager/helmsource.yaml
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
name: cert-manager
namespace: default
spec:
interval: 1m0s
url: https://charts.jetstack.io
cert-manager/helmrelease.yaml
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
name: cert-manager
namespace: default
spec:
chart:
spec:
chart: cert-manager
reconcileStrategy: ChartVersion
sourceRef:
kind: HelmRepository
name: cert-manager
version: v1.16.0
interval: 1m0s
values:
crds:
enabled: true
One thing that I had to change is the values at the very bottom. It seems that the --set crds.enabled=true option from the helm install command doesn’t carry over to flux. Instead, I had to separate it out into what is in the values section above.
I also deployed ingress-nginx:
helmsource.yaml
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
name: ingress-nginx
namespace: default
spec:
interval: 1m0s
url: https://kubernetes.github.io/ingress-nginx
helmrelease.yaml
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
name: ingress-nginx
namespace: default
spec:
chart:
spec:
chart: ingress-nginx
reconcileStrategy: ChartVersion
sourceRef:
kind: HelmRepository
name: ingress-nginx
# version:
interval: 1m0s
Another thing I had to do was to uninstall traefik:
[moonpie@lizard flux-config]$ kubectl delete helm
helmchartconfigs.helm.cattle.io helmcharts.source.toolkit.fluxcd.io helmrepositories.source.toolkit.fluxcd.io
helmcharts.helm.cattle.io helmreleases.helm.toolkit.fluxcd.io
[moonpie@lizard flux-config]$ kubectl delete helmrepositories.source.toolkit.fluxcd.io traefik
helmrepository.source.toolkit.fluxcd.io "traefik" deleted
For whatever reason, after deleting the traefik files from my git repo, they did not get removed from flux even after reconciling them. But, after this, the install works normally.
However, when I attempt to apply an ingress, I get an error:
Warning: resource ingresses/podinfo is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
Error from server (InternalError): error when applying patch:
{"metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"networking.k8s.io/v1\",\"kind\":\"Ingress\",\"metadata\":{\"annotations\":{},\"name\":\"podinfo\",\"namespace\":\"default\"},\"spec\":{\"rules\":[{\"host\":\"podinfo.moonpiedumpl.ing\",\"http\":{\"paths\":[{\"backend\":{\"service\":{\"name\":\"podinfo\",\"port\":{\"number\":9898}}},\"path\":\"/\",\"pathType\":\"Exact\"}]}}]}}\n"}}}
to:
Resource: "networking.k8s.io/v1, Resource=ingresses", GroupVersionKind: "networking.k8s.io/v1, Kind=Ingress"
Name: "podinfo", Namespace: "default" for: "podinfo-ingress.yaml": error when patching "podinfo-ingress.yaml": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": failed to call webhook: Post "https://ingress-nginx-controller-admission.default.svc:443/networking/v1/ingresses?timeout=10s": tls: failed to verify certificate: x509: certificate signed by unknown authority
This seems to be a sort of race condition, caused by resources being brought up simultaneously, despite one depending on another. I found some relevant GitHub issues.
https://github.com/kubernetes/ingress-nginx/issues/5968
There were some hacks related to deleting the hook, but I found that the helm chart documentation has an official option to disable the hook. I set that:
helmrelease.yaml
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
name: ingress-nginx
namespace: default
spec:
chart:
spec:
chart: ingress-nginx
reconcileStrategy: ChartVersion
sourceRef:
kind: HelmRepository
name: ingress-nginx
# version:
interval: 1m0s
values:
controller:
admissionWebhooks:
enabled: false
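Another way to approach ordering problems like this, instead of disabling the webhook, is to split the repo into two Flux Kustomizations and make one wait for the other. The sketch below is only an illustration: the ./infrastructure and ./apps paths are hypothetical, and whether this would actually avoid the x509 error above is an assumption that was not tested.

```yaml
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure
  namespace: flux-system
spec:
  interval: 10m0s
  path: ./infrastructure      # hypothetical: cert-manager, ingress-nginx
  prune: true
  wait: true                  # only Ready once everything applied is healthy
  sourceRef:
    kind: GitRepository
    name: flux-system
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  dependsOn:
    - name: infrastructure    # not applied until infrastructure is Ready
  interval: 10m0s
  path: ./apps                # hypothetical: podinfo and its Ingress
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
```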
Now:
[moonpie@lizard podinfo]$ kubectl apply -f podinfo-ingress.yaml
ingress.networking.k8s.io/podinfo created
But:
[moonpie@lizard podinfo]$ curl podinfo.moonpiedumpl.ing
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>
I figured out why, somewhat. For some reason, it was using the traefik ingress class, despite the fact that I uninstalled traefik, and was using nginx. However, even after:
[moonpie@lizard podinfo]$ kubectl get ingressclasses.networking.k8s.io
NAME CONTROLLER PARAMETERS AGE
nginx k8s.io/ingress-nginx <none> 6h29m
traefik traefik.io/ingress-controller <none> 69m
[moonpie@lizard podinfo]$ kubectl delete ingressclasses.networking.k8s.io traefik
ingressclass.networking.k8s.io "traefik" deleted
It still didn’t work. Also, traefik seems to have a lot of stuff still running:
[moonpie@lizard podinfo]$ kubectl get all -A | grep traefik
default pod/traefik-66cc8b6ff6-64zll 1/1 Running 0 73m
kube-system pod/svclb-traefik-13906a53-vmt4g 0/2 Pending 0 73m
default service/traefik LoadBalancer 10.43.207.1 <pending> 80:32766/TCP,443:30881/TCP 73m
kube-system daemonset.apps/svclb-traefik-13906a53 1 1 0 1 0 <none> 73m
default deployment.apps/traefik 1/1 1 1 73m
default replicaset.apps/traefik-66cc8b6ff6 1 1 1 73m
Whoops, I forgot to push my changes. Never mind. So I did, and now it doesn’t work again, and it’s back to a 404.
I think I figured it out:
NAME CONTROLLER PARAMETERS AGE
nginx k8s.io/ingress-nginx <none> 32h
[moonpie@lizard podinfo]$ kubectl describe ingress podinfo
Name: podinfo
Labels: kustomize.toolkit.fluxcd.io/name=flux-system
kustomize.toolkit.fluxcd.io/namespace=flux-system
Namespace: default
Address:
Ingress Class: <none>
Default backend: <default>
Rules:
Host Path Backends
---- ---- --------
podinfo.moonpiedumpl.ing
/ podinfo:9898 (10.42.0.40:9898,10.42.0.41:9898)
Annotations: <none>
Events: <none>
The “Ingress Class” is empty, when it probably needs to be filled with something. There are two solutions: I can set it manually (the exact field is ingress.spec.ingressClassName), or I can mark an ingressclass as the default.
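The manual option would look roughly like this: the same podinfo ingress as before, with one extra line naming the class. The class name nginx is the one shown by kubectl get ingressclasses above.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: podinfo
  namespace: default
spec:
  ingressClassName: nginx     # explicitly pick the ingress-nginx controller
  rules:
    - host: podinfo.moonpiedumpl.ing
      http:
        paths:
          - path: /
            pathType: Exact
            backend:
              service:
                name: podinfo
                port:
                  number: 9898
```

The downside is that every ingress then hardcodes the controller, which is why the default-class route is more appealing.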
I edited the nginx helm release with more configuration:
[moonpie@lizard flux-config]$ cat nginx/helmrelease.yaml
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
name: ingress-nginx
namespace: default
spec:
chart:
spec:
chart: ingress-nginx
reconcileStrategy: ChartVersion
sourceRef:
kind: HelmRepository
name: ingress-nginx
# version:
interval: 1m0s
values:
controller:
admissionWebhooks:
enabled: false
ingressClassResource:
default: true
And after this, I had to delete the podinfo ingress, and then recreate it, but it was working again. I wonder why it didn’t change the ingressclass when I reapplied the yaml file?
Now for TLS/HTTPS.
Well, TLS already kinda works. It’s just using the kubernetes self-signed cert, rather than a letsencrypt cert.
Here is the documentation on using cert-manager and nginx together. They recommend using the http01 (archive) challenge, but that method (or maybe just their method) does not work with wildcard domains.
It is not possible to obtain certificates for wildcard domain names (e.g. *.example.com) using the HTTP01 challenge mechanism.
From kubectl explain issuer.spec.acme.solvers.http01.
The other thing I don’t like about that page is that it suggests setting the “ingressClassName”, but I don’t want to do that. What if I want to change ingress controllers later on? Would I have to change every single issuer? I think I will just allow it to set its own default and hope for the best.
According to the cert-manager docs for ACME http01:
If class and ingressClassName are not specified, and name is also not specified, cert-manager will default to create new Ingress resources but will not set the ingress class on these resources, meaning all ingress controllers installed in your cluster will serve traffic for the challenge solver, potentially incurring additional cost.
I should be able to not set this field. I played around a bit with leaving the fields blank, but it didn’t work. I had to actually create the field, and leave it blank.
issuer-staging.yaml
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
name: letsencrypt-staging
spec:
acme:
# The ACME server URL
server: https://acme-staging-v02.api.letsencrypt.org/directory
# Email address used for ACME registration
email: email@example.com
# Name of a secret used to store the ACME account private key
privateKeySecretRef:
name: letsencrypt-staging
# Enable the HTTP-01 challenge provider
solvers:
- http01:
ingress:
ingressClassName:
This, of course, has the downside that it will be used on all ingresses, but I should be able to get around this with the http01-edit-in-place: "true" annotation.
Finally, I think I have TLS working properly:
podinfo-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: podinfo
annotations:
cert-manager.io/issuer: "letsencrypt-staging"
acme.cert-manager.io/http01-edit-in-place: "true"
namespace: default
spec:
tls:
- hosts:
- "podinfo.moonpiedumpl.ing"
secretName: podinfo-tls
rules:
- host: podinfo.moonpiedumpl.ing
http:
paths:
- path: /
pathType: Exact
backend:
service:
name: podinfo
port:
number: 9898
And this works! Except NOT! It errors, and I instead need to have my issuer be:
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
name: letsencrypt-staging
namespace: default
spec:
acme:
# The ACME server URL
server: https://acme-staging-v02.api.letsencrypt.org/directory
# Email address used for ACME registration
email: moonpiedumplings2@gmail.com
# Name of a secret used to store the ACME account private key
privateKeySecretRef:
name: letsencrypt-staging
# Enable the HTTP-01 challenge provider
solvers:
- http01:
ingress: {}
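Once staging certificates are issued reliably, the production counterpart should differ mainly in the server URL. The sketch below also uses a ClusterIssuer instead of an Issuer so it is not tied to one namespace; that is a design choice of this example, not something from the steps above. The name letsencrypt-prod and the email address are placeholders.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer             # cluster-scoped, so no metadata.namespace
metadata:
  name: letsencrypt-prod        # placeholder name
spec:
  acme:
    # Production ACME endpoint (rate limited, unlike staging)
    server: https://acme-v02.api.letsencrypt.org/directory
    email: email@example.com    # placeholder
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
      - http01:
          ingress: {}           # same empty solver config as the staging issuer
```

An ingress would then reference it with the cert-manager.io/cluster-issuer annotation rather than cert-manager.io/issuer.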
Authentik
Volumes
I attempted to deploy authentik without secrets. However, it crashes:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 46s (x4 over 15m) default-scheduler 0/1 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
[moonpie@lizard cert-manager]$ kubectl get -A persistentvolumeclaims
NAMESPACE NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
default data-authentik-postgresql-0 Pending <unset> 39m
default redis-data-authentik-redis-master-0 Pending <unset> 39m
So, I need to create a persistent volume of some kind, and then have it specifically reference the persistent volume claims that are used.
But… which provider do I use? Ideally, I want something similar to docker/podman volumes, where I don’t have to deal with mapping them to exact host paths. I also want these persistent volume claims to be automatically met, that is, dynamically provisioned storage.
I decided to use OpenEBS for this.
Here are my flux configs:
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
name: openebs
namespace: default
spec:
interval: 1m0s
url: https://openebs.github.io/charts
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
name: openebs
namespace: default
spec:
chart:
spec:
chart: openebs
reconcileStrategy: ChartVersion
sourceRef:
kind: HelmRepository
name: openebs
# version:
interval: 1m0s
I need to set a version, but this deploys openebs for now. However, it doesn’t instantly work, because the openebs-hostpath storage class is not set as the default storage class. But when I make it the default, using kubectl edit storageclass openebs-hostpath:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
annotations:
cas.openebs.io/config: |
- name: StorageType
value: "hostpath"
- name: BasePath
value: "/var/openebs/local"
meta.helm.sh/release-name: openebs
meta.helm.sh/release-namespace: default
openebs.io/cas-type: local
storageclass.kubernetes.io/is-default-class: "true"
creationTimestamp: "2024-10-22T02:33:23Z"
labels:
app.kubernetes.io/managed-by: Helm
helm.toolkit.fluxcd.io/name: openebs
helm.toolkit.fluxcd.io/namespace: default
name: openebs-hostpath
resourceVersion: "10886300"
uid: 0bfb540c-e5e6-4f4c-bf7b-cd232630742c
provisioner: openebs.io/local
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
I appended storageclass.kubernetes.io/is-default-class: "true" to the annotations section, and then PersistentVolumes are automatically created to satisfy the needs of authentik:
[moonpie@lizard flux-config]$ kubectl get persistentvolumes
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS VOLUMEATTRIBUTESCLASS REASON AGE
pvc-e179a2b6-0fa2-4fd8-9cfd-07c152b10bbe 8Gi RWO Delete Bound default/redis-data-authentik-redis-master-0 openebs-hostpath <unset> 9m35s
pvc-ed21ace0-2580-4ea1-aaeb-6b24b04bb55e 8Gi RWO Delete Bound default/data-authentik-postgresql-0 openebs-hostpath <unset> 9m35s
After this, the Authentik server is up. But, it looks like a reclaim policy of “delete”, means that if I delete the Authentik helm chart, my user data will be deleted as well.
Instead, I think I need to manually create volumes in my flux config, that use “claimRef” to get claimed by the right PersistentVolumeClaims. Or maybe I can have openebs volumes retain themselves?
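For illustration, the manual claimRef approach might look something like the following. This is a sketch under several assumptions: the volume name, the host path, and the storage class name manual are all hypothetical, and the claim's own storageClassName, size, and access mode would have to match for Kubernetes to bind it.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: authentik-postgresql          # hypothetical name
spec:
  capacity:
    storage: 8Gi                      # matches the 8Gi the chart requests
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain   # keep data if the claim is deleted
  storageClassName: manual            # hypothetical; must match the PVC's class
  claimRef:                           # reserve this volume for one specific claim
    namespace: default
    name: data-authentik-postgresql-0
  hostPath:
    path: /srv/authentik/postgresql   # hypothetical host directory
```

This gives full control, but brings back exactly the host path bookkeeping that dynamic provisioning was supposed to avoid.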
Actually, I think I need to create another storageclass that can dynamically provision volumes, but this one has the settings and specs I want, and is the default.
So, I did that:
openebs/localpath.yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: openebs-persistent-hostpath
namespace: default
annotations:
openebs.io/cas-type: local
cas.openebs.io/config: |
- name: StorageType
value: hostpath
- name: BasePath
value: /var/openebs/persistent/
storageclass.kubernetes.io/is-default-class: "true"
provisioner: openebs.io/local
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
And with this, I should get persistent data. Now, I should get the ingress working.
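A quick way to check the new storage class would be a throwaway claim plus a pod that mounts it. This is only a sketch: the retain-test names and the busybox image tag are made up for the example. The pod is needed because the class uses WaitForFirstConsumer, so the volume is not provisioned until something actually consumes the claim.

```yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: retain-test                   # hypothetical test claim
  namespace: default
spec:
  storageClassName: openebs-persistent-hostpath
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: retain-test
  namespace: default
spec:
  containers:
    - name: shell
      image: busybox:1.36             # any small image with a shell works
      # Write a marker file, then idle so the volume stays mounted
      command: ["sh", "-c", "echo hello > /data/hello && sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: retain-test
```

After deleting both objects, kubectl get persistentvolumes should still list the volume, with a Released status, if the Retain policy is doing its job.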
Ingress
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
name: authentik
namespace: default
spec:
chart:
spec:
chart: authentik
reconcileStrategy: ChartVersion
sourceRef:
kind: HelmRepository
name: authentik
# Figure out what version I should have
# version:
interval: 1m0s
values:
authentik:
secret_key: "PleaseGenerateASecureKey"
# This sends anonymous usage-data, stack traces on errors and
# performance data to sentry.io, and is fully opt-in
error_reporting:
enabled: true
postgresql:
password: "ThisIsNotASecurePassword"
server:
ingress:
ingressClassName: nginx
# Change to true when done
enabled: true
hosts:
- sso.moonpiedumpl.ing
annotations:
cert-manager.io/issuer: "letsencrypt-staging"
acme.cert-manager.io/http01-edit-in-place: "true"
tls:
- hosts: [sso.moonpiedumpl.ing]
secretName: sso-acme
postgresql:
enabled: true
auth:
password: "ThisIsNotASecurePassword"
redis:
enabled: true
And this works, including getting an SSL certificate from the letsencrypt-staging server.
Now I need to set up encryption of my secrets, because I want for this git repo to be public.
Secrets/SOPS
So, authentik looks difficult, because the helm chart requires quite a complex set of configurations, and I don’t want to put those in the same file for the helmrelease.
I think what I want to do is this: I should separate out the authentik configs into a “ConfigMap”, which gets fed to the helm chart, and that will be one portion of the authentik config. The other portion will be a “Secret” which gets decrypted by fluxcd and sops.
- Fluxcd docs on sops (I will be using age, since that’s what the sops docs recommend)
- Fluxcd docs on configmap and secret references
I started by following the sops-age guide.
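Part of that guide is a .sops.yaml file at the root of the repo, so that sops knows which key and which fields to use without command line flags. A sketch, assuming secrets live in files with "secret" in their name (that naming convention is hypothetical) and using the age public key that appears later in this post:

```yaml
# .sops.yaml (repo root)
creation_rules:
  # Hypothetical convention: only files with "secret" in the name get encrypted
  - path_regex: .*secret.*\.yaml$
    # Only encrypt the payload of Kubernetes Secrets, leave metadata readable
    encrypted_regex: ^(data|stringData)$
    age: age1sg3u7ndj045gzv3u4w5t5kntplg6sz2hv6k3uxpxq85vtx56rc4s8q83gr
```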
So, here is the problem I am encountering. I want to encrypt only the relevant values, but because a configmap or secret stores the whole set of values as one blob, I would have to encrypt an entire set of data values.
I was hoping for something like this:
helmrelease.yaml
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
name: authentik
namespace: default
spec:
chart:
spec:
chart: authentik
reconcileStrategy: ChartVersion
sourceRef:
kind: HelmRepository
name: authentik
# Figure out what version I should have
# version:
interval: 1m0s
values:
authentik:
secret_key: "PleaseGenerateASecureKey"
# This sends anonymous usage-data, stack traces on errors and
# performance data to sentry.io, and is fully opt-in
error_reporting:
enabled: true
postgresql:
password: "ThisIsNotASecurePassword"
server:
ingress:
ingressClassName: nginx
# Change to true when done
enabled: false
hosts:
- authentik.moonpiedumpl.ing
postgresql:
enabled: true
auth:
password: "ThisIsNotASecurePassword"
redis:
enabled: true
And then, I need to encrypt this file.
Before I do so, I generate secure passphrases using genpass, which is available in nixpkgs.
sops --age=age1sg3u7ndj045gzv3u4w5t5kntplg6sz2hv6k3uxpxq85vtx56rc4s8q83gr \
--encrypt --encrypted-regex '^(secret_key|password)$'
I then pipe this out to a new file.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
name: authentik
namespace: default
spec:
chart:
spec:
chart: authentik
reconcileStrategy: ChartVersion
sourceRef:
kind: HelmRepository
name: authentik
# Figure out what version I should have
# version:
interval: 1m0s
values:
authentik:
secret_key: ENC[AES256_GCM,data:igq7ugHC00ZPZ/0jFjAr/phAeCvhKn0J/avbNe01207L9Q==,iv:1NZuqFBFhc9vXQvTBw8nG20ZpZ6sUUxXrQRUw+KZ4yM=,tag:1QQUWHuiEavXfDi5lhQBFg==,type:str]
# This sends anonymous usage-data, stack traces on errors and
# performance data to sentry.io, and is fully opt-in
error_reporting:
enabled: true
postgresql:
password: ENC[AES256_GCM,data:9yNwAnLg62WePY0yiNBty+ii0CFOm+iSC6GI4ZzAgmGJ4Q==,iv:EsRZzPm8bZHycrhK2ZFPv2fp863pnwy2rGINXiyvCIk=,tag:PObidHEMUMFAmO0K50Nvqg==,type:str]
server:
ingress:
# Should default to nginx already
# ingressClassName: nginx
# Change to true when done
enabled: true
hosts:
- sso.moonpiedumpl.ing
annotations:
cert-manager.io/issuer: letsencrypt-staging
acme.cert-manager.io/http01-edit-in-place: "true"
tls:
- hosts:
- sso.moonpiedumpl.ing
secretName: sso-acme
postgresql:
enabled: true
auth:
password: ENC[AES256_GCM,data:QZgcD4a+wqktN7c9mmHWicFjTDm8ZDdDx41LdMEBEeABjQ==,iv:63TiPtJYywkIZwldp4PQcU3WzHKAYRGwqo/JtwE3eb8=,tag:sAE1dlWH11LFwH7/Fbk0Iw==,type:str]
redis:
enabled: true
sops:
kms: []
gcp_kms: []
azure_kv: []
hc_vault: []
age:
- recipient: age1sg3u7ndj045gzv3u4w5t5kntplg6sz2hv6k3uxpxq85vtx56rc4s8q83gr
enc: |
-----BEGIN AGE ENCRYPTED FILE-----
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBNdXVSa0FtTVFNV01iUndk
RklNNk02OFVxVVQ3RUZyUnFvTWFaUHpVNlhRCkdSUUpVdnBBU3JiSndZeFpPVlFx
OXpLcmhYSjJ1czI4QzFCZFdrcis3QWMKLS0tIFB5ZlQ4TytpNkEyUFJQUWs4VE4w
dzkzS0tMbjFzRitKbkpKVFg0a0owOXcKvSBlZmn4pocBHrc5QbUNA5W3p5kiRaYM
08eMw2rCn5f6hvB2uEoImiSaKQjThmgWLcCRL4kOB+itrto6b4wC0A==
-----END AGE ENCRYPTED FILE-----
lastmodified: "2024-11-20T06:54:32Z"
mac: ENC[AES256_GCM,data:BsAJJcQe2W0OObtlrJLmZZ1cgExRJ8Qpm/2p2oUXOQAHkK1K6Jc1ZeHTo7sTZaSQVxcMUFkdA/s9eRsTQ8dUcsI/rLLbzQXqlKjmre/ZDhjcNoevr2X4GTacso3koIcCrkdnO1X0mZC1q9C6myv5BQ4KbjCDjCO50FIrrFsoMW0=,iv:roIs3cREdk2tVX/Fatm1AC7JfyrSRORlmYM5muceQP4=,tag:pk+3z0ztaN+cHzd3cCQUMA==,type:str]
pgp: []
encrypted_regex: ^(secret_key|password)$
version: 3.9.1
Although this looks like the best setup, I don’t know if it works. Maybe only kubernetes secrets can be decrypted?
Yup, this indeed fails:
[moonpie@cachyos-x8664 authentik]$ flux get all
NAME REVISION SUSPENDED READY MESSAGE
gitrepository/flux-system main@sha1:bb1f1636 False True stored artifact for revision 'main@sha1:bb1f1636'
NAME REVISION SUSPENDED READY MESSAGE
kustomization/flux-system main@sha1:ca8f3c07 False False HelmRelease/default/authentik dry-run failed: failed to create typed patch object (default/authentik; helm.toolkit.fluxcd.io/v2, Kind=HelmRelease): .sops: field not declared in schema
I think I need to create a “kustomization” in order to automatically decrypt secrets.
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
name: my-secrets
namespace: flux-system
spec:
decryption:
provider: sops
secretRef:
name: sops-age
interval: 10m0s
path: ./
prune: true
sourceRef:
kind: "GitRepository"
name: "flux-system"
But, even after applying this, it still doesn’t work.
One relevant blog post suggests making a similar edit… to the autogenerated flux yaml files.
Another blog post creates a custom argocd dockerfile…
According to a GitHub issue, Flux does not, and will not, support sops encryption of files directly. In fact, the user there claims that only secrets support decryption.
I think I need to separate out my sensitive data into a secret, and then encrypt that.
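A sketch of what that split might look like, using the valuesFrom field of the HelmRelease. The secret name authentik-secret-values is hypothetical, and the passwords are the same placeholders as above. The Secret would be encrypted with sops (only its stringData) before being committed, and applied by a Kustomization that has decryption configured.

```yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: authentik-secret-values       # hypothetical name
  namespace: default
stringData:
  # One key holding a fragment of helm values; sops encrypts this whole string
  values.yaml: |
    authentik:
      secret_key: "PleaseGenerateASecureKey"
      postgresql:
        password: "ThisIsNotASecurePassword"
    postgresql:
      auth:
        password: "ThisIsNotASecurePassword"
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: authentik
  namespace: default
spec:
  interval: 1m0s
  chart:
    spec:
      chart: authentik
      sourceRef:
        kind: HelmRepository
        name: authentik
  valuesFrom:
    - kind: Secret
      name: authentik-secret-values
      valuesKey: values.yaml          # merged with the inline values below
  values:
    authentik:
      error_reporting:
        enabled: true
    server:
      ingress:
        enabled: true
        hosts:
          - sso.moonpiedumpl.ing
    postgresql:
      enabled: true
    redis:
      enabled: true
```

This keeps the HelmRelease itself in plain text, so it no longer trips the ".sops: field not declared in schema" error, while the sensitive values stay encrypted in git.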
Static Site
I want to move my blog over to my own domain name. I think the easiest way to do this is as follows. When quarto, the static site generator I use, “publishes”, what it actually does is push a copy of the compiled static site to another git branch, gh-pages. GitHub automatically reads from that branch and serves the site at *.github.io domains.
I think I can do something similar with:
- Flux git source that automatically pulls from that same branch every so often
- Web server that mounts the git repo as a volume and serves from it.
There is also this forgejo static site server, but I’m not going to be looking at that for now.
I am also searching for some premade static site deployment that I can pull in from a helm chart.
- Gimlet: this looks really complex, but looks like it does what I want.
- some nginx + curl solution, https://github.com/redhat-cop/helm-charts/tree/main/charts/static-site, artifacthub: I really like this solution, but it doesn’t seem to be maintained. Artifacthub reports security vulnerabilities in the older containers used.
Ah, I think I found something that works best: https://artifacthub.io/packages/helm/bitnami/nginx
It took some effort to figure this one out.
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
name: nginx
namespace: default
spec:
type: oci
interval: 5m0s
url: oci://docker.io/bitnamicharts
So apparently, when using OCI helm charts, the organization acts as the “repo”, where you get helm charts from.
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
name: nginx
namespace: default
spec:
chart:
spec:
chart: nginx
reconcileStrategy: ChartVersion
sourceRef:
kind: HelmRepository
name: nginx
version: "18.2.4"
interval: 1m0s
values:
cloneStaticSiteFromGit:
enabled: true
repository: https://github.com/moonpiedumplings/moonpiedumplings.github.io
branch: gh-pages
# 60 seconds is the default
interval: 60
ingress:
enabled: true
hostname: moonpiedumpl.ing
annotations:
cert-manager.io/issuer: "letsencrypt-staging"
acme.cert-manager.io/http01-edit-in-place: "true"
tls:
- hosts: [moonpiedumpl.ing]
secretName: blog-acme
And this works, although I’m unhappy that I can’t use a “latest” tag to automatically update to the latest version of the chart.
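There is no “latest” tag, but the version field of a Flux HelmRelease is documented as accepting a SemVer range, which gets close. A minimal sketch, assuming range resolution works for this OCI repository the same way it does for HTTP helm repositories; the range 18.x is just an illustration:

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: nginx
  namespace: default
spec:
  interval: 1m0s
  chart:
    spec:
      chart: nginx
      # SemVer range instead of a pinned version: any 18.x release.
      # Using "*" would track every new release, including major bumps.
      version: "18.x"
      sourceRef:
        kind: HelmRepository
        name: nginx
```

Staying within one major version is a compromise: patch and minor updates arrive automatically, but breaking chart changes still need a manual bump.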
Also, the ingress created does not seem to have ssl set up properly. Its certificate comes from cert-manager’s local source, rather than from letsencrypt-staging. But, the site is up on <moonpiedumpl.ing>. This is a lot simpler than some custom CI solution, and I suspect I can simply disable GitHub’s automatic static site generation from the branch, while still having quarto render my static site to the gh-pages branch.
The ssl problem is probably because authentik and the bitnami nginx helm chart use differing values for their ingress configuration.
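If that guess is right, the fix would be in how the TLS section is written for the bitnami chart. The fragment below is a sketch based on the common bitnami values layout, where ingress.tls is a boolean and extra TLS entries go under ingress.extraTls. Both of those are assumptions that should be checked against the chart's values.yaml for version 18.2.4.

```yaml
# Fragment of the HelmRelease: spec.values for the bitnami nginx chart
values:
  ingress:
    enabled: true
    hostname: moonpiedumpl.ing
    annotations:
      cert-manager.io/issuer: "letsencrypt-staging"
      acme.cert-manager.io/http01-edit-in-place: "true"
    # Assumption: in this chart "tls" is a boolean toggle, not a list,
    # and list-style TLS entries belong under "extraTls" instead.
    extraTls:
      - hosts:
          - moonpiedumpl.ing
        secretName: blog-acme
```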
Forgejo
Forgejo has a helm chart.
I don’t think I will be doing rootless, although that was my original plan.
Misc Notes for later on:
- https://github.com/stakater/Reloader — reload services in kubernetes when configmap or secrets change
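As a note on how Reloader is typically used: it watches for an annotation on the workload and restarts it when a referenced ConfigMap or Secret changes. A minimal sketch with a made-up Deployment; the names example-app and example-app-secret and the image tag are hypothetical.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app                   # hypothetical workload
  namespace: default
  annotations:
    # Ask Reloader to roll this Deployment when any ConfigMap/Secret it uses changes
    reloader.stakater.com/auto: "true"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app
          image: nginx:1.27           # placeholder image
          envFrom:
            - secretRef:
                name: example-app-secret   # hypothetical Secret being watched
```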