CAPMOX Troubleshooting Insights

January 22, 2026 · 8 min read

DevOps Consulting Engineer at Cisco Solutions GmbH

Summary:

We continue where we left off with the CAPMOX deployment. In this post, we will go through common issues during deployment and the tools used to troubleshoot and resolve them.

title image reading "Troubleshooting Focus"

Scenario

The post is an attempt to help users troubleshoot common CAPMOX issues in their environment. Every environment is unique, and different issues could arise. Follow along, and let’s fix them!

Prerequisites

The post is based on the previous one. Ensure any prerequisites are met.

Common Issues

Image Builder - Broken Pipe Error

The first step working with CAPMOX is to ensure a valid base image is generated. That means we use the image-builder with the required tooling and dependencies to do so. However, to push the created template to our Proxmox server and, in general, interact with the Proxmox server, we need to authenticate. For that reason, the documentation advises creating a PVE Admin role or a restricted user. In any case, if incorrect permissions are set to the PROXMOX_USERNAME variable, broken pipe errors will be visible.

Error

proxmox-iso.ubuntu-2404: Post "https://PROXMOX IP:PORT/api2/json/nodes/pve/storage/local/upload": write tcp EXECUTION MACHINE:56542->PROXMOX IP:PORT: write: broken pipe

Solution

Go through the image builder documentation and ensure the right permissions are assigned to the used token.

capmox-controller-manager Pod Error

The issue is visible while performing the initialisation of the CAPI Proxmox provider to the management cluster.

clusterctl init --infrastructure proxmox --ipam in-cluster

Error

$ kubectl logs capmox-controller-manager-7898557fd8-rnpt6 -n capmox-system
I0117 15:59:44.625459       1 main.go:95] "starting capmox" logger="setup"
I0117 15:59:44.627001       1 main.go:138] "feature gates: ClusterTopology=false\n" logger="setup"
E0117 15:59:44.631453       1 main.go:145] "unable to setup proxmox API client" err="unable to initialize proxmox api client: 501 Method 'GET /api2/json/version' not implemented" logger="setup"

Solution

This issue most commonly occurs due to the incorrect definition of the PROXMOX_URL, PROXMOX_TOKEN, and PROXMOX_SECRET environmental variables. Ensure the following variables are exported and defined correctly.

$ export PROXMOX_URL="https://<PROXMOX_IP/HOSTNAME>:<LISTENING PORT>"
$ export PROXMOX_TOKEN='image-builder@pve!image-builder'
$ export PROXMOX_SECRET="5d44bf10-1234-5678-9101-ff7bdd8963fa"

Incorrect CONTROL_PLANE_ENDPOINT_IP Definition

The CONTROL_PLANE_ENDPOINT_IP variable definition is required during the clusterctl generate cluster action and had to be unique. Outside the DHCP range (if used), and in the same subnet/network as the NODE_IP_RANGES definition.

If the CONTROL_PLANE_ENDPOINT_IP is used by another machine within the network or defined in the NODE_IP_RANGES variable, the deployment will fail. The CONTROL_PLANE_ENDPOINT_IP is important because it is used as a stable, single access point for all interactions within a Kubernetes cluster. The kube-apiserver will be listening on this IP address.

If you try to get the kubeconfig of the controlplane node to authenticate with the cluster, you will not be able to reach the cluster, as the IP address defined is used in the range group.

Solution

Update the variable to something that does not conflict with or is not used by another device in the same subnet/network.

Missing `schedulerHints.memoryAdjustment` CAPI Configuration

As mentioned by Kyriakos in his post, we need to enable memory overcommit to allow Proxmox to create the virtual machines while allowing memory overcommit. If not defined, the virtual machines will simply hang.

Error

Message:               * Machine proxmox05-control-plane-4948h:
* InfrastructureReady: cannot reserve 4294967296B of memory on node pve: 0B available memory left

Solution

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: ProxmoxCluster
metadata:
  name: proxmox01
  namespace: default
spec:
  ...
  schedulerHints:
    memoryAdjustment: 0

Cloud Init Errors

There are instances when the virtual machines are created in Proxmox; however, the configuration is not getting applied. One possibility is that cloud-init is incomplete or exited with errors. There might be different issues that could stop the deployment, but we will focus on cloud-init issues.

An SSH key is already defined during the cluster creation. We can SSH to the virtual machines and investigate further.

cloud-init Checks

$ cloud-init status # Get the status of the cloud-init configuration

$ sudo cloud-init collect-logs # The command will generate the `cloud-init.tar.gz` file

$ tar -xvf cloud-init.tar.gz # To untar the file

Once the cloud-init logs are available, take a looks at directories that might indicate errors and/or failures. Once a potential error is identified, continue with the basic networking checks.

Network Checks

Ensure basic networking configuration is applied to the machine. IP address, Gateway, default route, and DNS.

$ ip link # Layer 2 MAC address validation

$ ip add # Layer 3 IP address validation

$ ip route # Check the routing table on the virtual machine and validate that a default route exists

$ cat /etc/resolv.conf # Ensure the correct DNS server for the subnet/network is defined

Once the above commands have been validated, continue with some basic ping commands.

$ ping <another machine in the same subnet>

$ ping <gateway IP address>

$ ping www.google.com

If any of the above fail and the network configuration is correct, check the firewall on the machine. Is there anything that might be dropping traffic? If the firewall on the machine seems to be correctly configured, check the home router firewall.

In my case, in the defined VLAN, there was a firewall rule blocking all traffic from that VLAN to the outside (forgotten from a previous deployment). As a result, cloud-init was not able to reach the Internet to finish the initial configuration.

Network Time Protocol (NTP) Drift

Once basic network issues are resolved, the controlplane and worker machines were created. Cloud-init was successful; however, the worker machines were not able to join the cluster.

Error

...
x509: certificate has expired or is not yet valid

Check ControlPlane

Ensure the controlplane node/s are in a "Ready" state. If no CNI (Container Network Interface) is deployed, go ahead and do so. On the controlplane node, we can check the kubeadm status and whether the required resources (manifests and files) are created under the /etc/kubernetes/manifests/ directory. If the controlplane node looks okay, go back to the management cluster and check the status of the CAPI cluster.

Cluster Check Management Cluster

$ export KUBECONFIG=/path/to/management/kubeconfig

$ kubectl describe cluster <name>

$ kubectl describe kubeadmcontrolplane <name>

Check for errors. They will give some hints on what went wrong.

In my case, the certificates created for the worker nodes to join or maintain communication with the controlplane were no longer valid. At first, I thought, "Okay, this is weird, why?"

Controlplane Checks

$ kubeadm certs check-expiration # Check the certificate status and expiration dates

Solution

Check the NTP server and date/time on Proxmox
Check the NTP server and date/time on the Kubernetes management cluster
Check the NTP server and date/time on the machines created

Indeed, there was a time drift between the created nodes and the underlying Kubernetes management cluster.

Correct the date/time on the Kubernetes management cluster
Delete the existing cluster creation
Re-apply the cluster configuration.

Sufficient Resources Management Cluster

There are instances when the underlying management cluster lacks sufficient resources to create additional nodes or handle the additional load of creating CAPI resources. No errors will be visible in the created resources; however, the cluster creation will stop in the control plane. This situation could be confusing, but ensure the underlying cluster has sufficient memory.

ControlPlane Nodes Even Number

During the deployment, if an even number of controlplane nodes is defined, the deployment will stop with an error. Based on best practices, an odd number has to be defined (e.g., 1, 3, 5, etc.).

Conclusion

CAPI to the moon! 🚀 I hope the guide gave you some ideas and hints on what might be wrong with your own deployments!

Resources

Cluster API Troubleshooting Guide

✉️ Contact

If you have any questions, feel free to get in touch! You can use the Discussions option found here or reach out to me on any of the social media platforms provided. 😊 We look forward to hearing from you!

Series Narigation

Part	Title
Part 1	Introduction to CAPMOX
Part 2	Troubleshooting CAPMOX deployments
Part 3	CAPMOX with Sveltos

Scenario​

Prerequisites​

Common Issues​

Image Builder - Broken Pipe Error​

Error​

Solution​

capmox-controller-manager Pod Error​

Error​

Solution​

Incorrect CONTROL_PLANE_ENDPOINT_IP Definition​

Solution​

Missing schedulerHints.memoryAdjustment CAPI Configuration​

Error​

Solution​

Cloud Init Errors​

cloud-init Checks​

Network Checks​

Network Time Protocol (NTP) Drift​

Error​

Check ControlPlane​

Cluster Check Management Cluster​

Controlplane Checks​

Solution​

Sufficient Resources Management Cluster​

ControlPlane Nodes Even Number​

Conclusion​

Resources​

✉️ Contact​

Series Narigation​

Scenario

Prerequisites

Common Issues

Image Builder - Broken Pipe Error

Error

Solution

capmox-controller-manager Pod Error

Error

Solution

Incorrect CONTROL_PLANE_ENDPOINT_IP Definition

Solution

Missing `schedulerHints.memoryAdjustment` CAPI Configuration

Error

Solution

Cloud Init Errors

cloud-init Checks

Network Checks

Network Time Protocol (NTP) Drift

Error

Check ControlPlane

Cluster Check Management Cluster

Controlplane Checks

Solution

Sufficient Resources Management Cluster

ControlPlane Nodes Even Number

Conclusion

Resources

✉️ Contact

Series Narigation