The objective of this engineering initiative is to transition the Squirrelworks lab ecosystem away from legacy virtualization models and establish a hardened, production-grade Multi-Node Kubernetes cluster. By leveraging the security and efficiency profiles of enterprise-tier Open Source utilities, this bare-metal platform is built to deliver highly stable, self-healing runtime behaviors under rigorous bare-metal constraints.
Deploying the Rancher Kubernetes Engine (RKE2) on minimal Rocky Linux 9 nodes to enforce absolute process-level isolation and maximize CPU/RAM efficiency metrics.
Bypassing legacy Linux iptables entirely by injecting custom eBPF microcode bytecode straight into the kernel via Cilium, establishing raw hardware-speed packet delivery.
This cluster setup functions as a technical playbook for predictable, scalable operations. Eliminating intermediate application abstractions and decoupling the underlying container runtime networking guarantees deterministic performance behaviors across the entire compute host layer.
To establish a resilient cloud-native control plane, the master node was deployed on a minimal Rocky Linux 9 virtual instance within Proxmox. Standard network plumbing was locked down via SSH key integration, providing a stable foundation for the enterprise-grade Rancher Kubernetes Engine (RKE2).
During initial orchestration, a custom manifest was staged at /etc/rancher/rke2/config.yaml to inject the directive cni: none. This intentional override completely bypassed the default Canal network plugin, preventing the cluster API from establishing standard routing mechanisms and placing the control plane into a deliberate network holding pattern.
By running kubectl get nodes post-initialization, the node safely defaults to a NotReady state. This is not an error—it is an expected architectural behavior proving the API server is up, running, and securely waiting for an advanced eBPF network fabric to claim its interface.
Expanding the multi-node infrastructure required staging the secondary machine, rocky-worker-01. By linking the cryptographic joint token from the primary supervisor log into the worker's configuration path at 192.168.0.198, an ironclad hardware-to-host handshake was verified.
Following a system-level interruption right after package deployment, a forensic review showed that while the rke2-agent.service was enabled, it had never actively executed. Resolving this discrepancy involved executing a live daemon state change on the worker host CLI:
sudo systemctl start rke2-agent.service
The node cleanly entered the runtime cluster matrix, aligning its age telemetry in the control plane database without throwing a single TLS handshake anomaly.
Because RKE2 strips internal runtime folders down to raw core mechanics, an investigation of the internal path structure revealed that standard utilities like helm were excluded from the binary profile layer. Attempts to establish quick symlinks failed due to the nonexistent target paths within the core orchestration directories.
To bypass the minimal OS constraints of Rocky Linux without cluttering environment path files, the pure vanilla upstream binary for Helm v3 was downloaded. Because the minimal OS profile lacked native archive decompression capabilities, the package manager was used to inject the toolset required to place and flag the execution rights:
sudo dnf install -y tar && tar -zxvf helm.tar.gz
| Lab Component | Allocated Network / Storage Specification |
|---|---|
| rocky-control-01 | 192.168.0.197 | Primary Control Plane Listener |
| rocky-worker-01 | 192.168.0.198 | Compute Host Agent Client |
| Package Tooling | Helm v3 Stable Binary | Exoclipped to /usr/local/bin |
With the physical binary safely bound into /usr/local/bin/helm, index tables updated flawlessly. Injecting the official Cilium stable repository securely hooks our local system straight into the eBPF staging yard....
Initial deployment logs indicated the Cilium initialization pods were locked in a persistent PodInitializing block, throwing an Init:CrashLoopBackOff flag. A deep forensic review via kubectl describe pod revealed an architectural endpoint conflict: Cilium was attempting to map the global cluster topology through the RKE2 Supervisor Port (9345) instead of the standard Kubernetes API Server Port (6443).
The supervisor port on 9345 handles internal node registration and certificate synchronization. Passing API requests to this daemon causes an immediate failure return: "the server could not find the requested resource (get namespaces kube-system)".
To redirect the eBPF control hooks onto the valid API endpoint registry, a hot Helm database upgrade was fired down on the control plane terminal, hard-coding the data pipeline straight to the true API gateway:
sudo /usr/local/bin/helm upgrade cilium cilium/cilium --version 1.15.4 \
--namespace kube-system \
--set operator.replicas=1 \
--set kubeProxyReplacement=true \
--set k8sServiceHost=192.168.0.197 \
--set k8sServicePort=6443 \
--kubeconfig /etc/rancher/rke2/rke2.yaml
Helm processed the parameter updates cleanly, outputting a successful REVISION: 2 tracking manifest to the cluster backend.
The immediate activation of native kubeProxyReplacement=true triggered a critical network outage, severing management access to the Proxmox hypervisor IP (192.168.0.105). Cilium’s high-performance eBPF engine attempted to hook directly into the virtual interface drivers to establish direct packet routing. However, this action clashed head-on with the active hypervisor-level Proxmox Hardware Firewall on the VM network devices.
The simultaneously active packet-filtering engines generated an intense Layer 2 MAC flapping loop/ARP storm, completely saturating the Proxmox virtual bridge (vmbr0) and dropping all external management traffic.
To break the data loop without risking a host kernel crash, access was established via the hardware terminal console of the physical Proxmox node, forcing an immediate teardown of the cluster virtual machines and a flush of the network interfaces:
# Local Hypervisor Emergency Restoration Terminal
qm stop 107 && qm stop 108
systemctl restart networking
To guarantee clean execution paths for the cluster datapath going forward, the Proxmox virtual hardware firewall was permanently un-checked across both rocky-control-01 and rocky-worker-01 interface properties. This architecture safely hands over full Layer 2–4 network policy management natively to Cilium's internal security engines.
| Interface Target | Hypervisor Firewall State | CNI Routing Strategy |
|---|---|---|
| VM 107 (Control) | DISABLED | eBPF Direct Host-Routing (ens18) |
| VM 108 (Worker) | DISABLED | eBPF Direct Host-Routing (ens18) |
| PVE Bridge (vmbr0) | ENABLED | Hypervisor Management Plane Protection |
Following host network restoration, both compute nodes negotiated successful secure cluster entry. Querying the integrated CNI environment via cilium status verified that the infrastructure paths had attained complete operational integrity.
The eBPF controller reported 2/2 cluster hosts reachable over direct routing lanes with zero memory allocation errors or route tracking dropouts.
With a highly efficient network fabric successfully claiming the underlying kernel interfaces, the core cluster DNS resources were automatically kicked out of their pending states. The deployment engine successfully bound local container IPs and scaled up active instances smoothly:
Executing a terminal-wide kubectl get nodes confirms both hosts are sitting comfortably at a permanent Ready status. The Squirrelworks lab multi-node environment is fully live, secure, and ready to receive production-tier container deployments.