Troubleshooting¶
This guide helps diagnose and resolve common issues with the Cloudflare Tunnel Gateway Controller.
Quick Diagnostics¶
# Check controller status
kubectl get pods --namespace cloudflare-tunnel-system
# View controller logs
kubectl logs --namespace cloudflare-tunnel-system \
deployment/cloudflare-tunnel-gateway-controller
# Check Gateway status
kubectl get gateway --all-namespaces
# Check HTTPRoute status
kubectl get httproute --all-namespaces
Installation Issues¶
Schema Validation Errors¶
Problem: values don't meet the specifications of the schema
Common causes:
- Empty or invalid tunnelId. Solution: Provide a valid Tunnel ID from the Cloudflare Zero Trust Dashboard
- Invalid characters in tunnelId. Solution: Use only alphanumeric characters and hyphens
- Missing API credentials. Solution: Set either cloudflare.apiToken or cloudflare.apiTokenSecretName
Verification:
# Validate values before installation
helm lint charts/cloudflare-tunnel-gateway-controller --values my-values.yaml
# Dry-run installation
helm install --dry-run --debug my-release \
oci://ghcr.io/lexfrei/cloudflare-tunnel-gateway-controller/chart \
--values my-values.yaml
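A quick local pre-check can catch the first two causes before Helm reports a schema error. The sketch below is a hypothetical helper (the name check_tunnel_id is not part of the chart) that only enforces the rule stated above: the value must be non-empty and limited to alphanumeric characters and hyphens. The chart's actual schema may be stricter.

```shell
#!/bin/sh
# Hypothetical helper, not part of the chart.
# Returns 0 when the value is non-empty and contains only
# alphanumeric characters and hyphens, 1 otherwise.
check_tunnel_id() {
  case "$1" in
    "" | *[!A-Za-z0-9-]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Usage:
# check_tunnel_id "$TUNNEL_ID" || echo "tunnelId looks invalid" >&2
```

Passing this check does not prove that the tunnel exists in your Cloudflare account; it only rules out formatting mistakes.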
Pod Startup Problems¶
CrashLoopBackOff¶
Diagnosis:
kubectl get pods --namespace cloudflare-tunnel-system
kubectl logs --namespace cloudflare-tunnel-system \
deployment/cloudflare-tunnel-gateway-controller
kubectl describe pod --namespace cloudflare-tunnel-system \
--selector app.kubernetes.io/name=cloudflare-tunnel-gateway-controller
Common causes:
| Error | Cause | Solution |
|---|---|---|
| authentication failed | Invalid API token | Verify token scopes |
| secret not found | Missing secret | Create required secret |
| read-only file system | Security context issue | Check emptyDir volumes |
ImagePullBackOff¶
Diagnosis:
Solutions:
image:
repository: ghcr.io/lexfrei/cloudflare-tunnel-gateway-controller
tag: "" # Uses appVersion from Chart.yaml
pullPolicy: IfNotPresent
# For private registries
imagePullSecrets:
- name: ghcr-credentials
Authentication and API Issues¶
Invalid Cloudflare API Token¶
Symptoms:
- Pods crash with authentication errors
- Logs show 401 Unauthorized or 403 Forbidden
Diagnosis:
# Test API token manually
export CF_API_TOKEN="your-token"
curl --header "Authorization: Bearer $CF_API_TOKEN" \
https://api.cloudflare.com/client/v4/user/tokens/verify
Solution:
- Create a new API token with the required scopes:
  - Account.Cloudflare Tunnel:Edit
- Update the secret and restart the controller:
kubectl create secret generic cloudflare-credentials \
--from-literal=api-token="NEW_TOKEN" \
--namespace cloudflare-tunnel-system \
--dry-run=client --output yaml | kubectl apply --filename -
kubectl rollout restart deployment/cloudflare-tunnel-gateway-controller \
--namespace cloudflare-tunnel-system
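When scripting the token check from the Diagnosis step, it can help to turn the verify response into an exit code. The following is a minimal sketch under one assumption: that the JSON body returned by the verify endpoint contains "status": "active" for a usable token. The function name token_is_active is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper. Reads the JSON body of a /user/tokens/verify
# response on stdin and succeeds only if it reports an active token.
# Assumption: a usable token yields "status": "active" in the body.
token_is_active() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"active"'
}

# Usage (CF_API_TOKEN exported as in the Diagnosis step):
# curl --silent --header "Authorization: Bearer $CF_API_TOKEN" \
#   https://api.cloudflare.com/client/v4/user/tokens/verify | token_is_active \
#   && echo "token is active" || echo "token rejected or inactive"
```

An active token can still lack the required scope, so a 403 Forbidden in the controller logs remains possible even when this check passes.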
Network Connectivity¶
NetworkPolicy Blocking Traffic¶
Symptoms:
- Metrics not accessible from Prometheus
- Health checks failing
- Cannot communicate with Cloudflare API
Diagnosis:
# Check NetworkPolicy rules
kubectl get networkpolicy --namespace cloudflare-tunnel-system
# Test connectivity from debug pod
kubectl run debug --rm -it --image=nicolaka/netshoot \
--namespace cloudflare-tunnel-system -- bash
# Inside debug pod:
curl http://cloudflare-tunnel-gateway-controller:8080/metrics
curl http://cloudflare-tunnel-gateway-controller:8081/healthz
curl --head https://api.cloudflare.com
DNS Resolution Issues¶
Symptoms:
- Cannot resolve Cloudflare API endpoints
- Errors: no such host or dial tcp: lookup failed
Diagnosis:
kubectl exec --namespace cloudflare-tunnel-system POD_NAME -- \
cat /etc/resolv.conf
kubectl exec --namespace cloudflare-tunnel-system POD_NAME -- \
nslookup api.cloudflare.com
Solution: Configure custom DNS
dnsPolicy: "None"
dnsConfig:
nameservers:
- 1.1.1.1
- 8.8.8.8
searches:
- cloudflare-tunnel-system.svc.cluster.local
- svc.cluster.local
- cluster.local
Gateway API Resources¶
Gateway Not Ready¶
Diagnosis:
kubectl get gateway --all-namespaces
kubectl describe gateway my-gateway --namespace my-namespace
kubectl logs --namespace cloudflare-tunnel-system \
deployment/cloudflare-tunnel-gateway-controller | grep -i gateway
Common causes:
| Issue | Solution |
|---|---|
| GatewayClass not found | Set gatewayClass.create: true in Helm values |
| Wrong controller name | Check gatewayClassName matches chart configuration |
| Service not found | Verify backend Services exist |
HTTPRoute Not Attached¶
Diagnosis:
Common causes:
- Namespace mismatch (use ReferenceGrant for cross-namespace)
- Invalid hostname patterns
- Backend service not found
Status Not Updating¶
Problem: Gateway/HTTPRoute status conditions not updating
Diagnosis:
kubectl auth can-i update gateways/status \
--as=system:serviceaccount:cloudflare-tunnel-system:cloudflare-tunnel-gateway-controller
Solution: Ensure ClusterRole has status subresource permissions
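Since the problem covers both Gateway and HTTPRoute status, the same can-i check can be repeated for each status subresource. This sketch assumes the service account name used in the Diagnosis command above; the KUBECTL variable and the check_status_rbac function are hypothetical conveniences, not part of the controller.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example with a stub); defaults to kubectl.
KUBECTL="${KUBECTL:-kubectl}"
# Service account taken from the Diagnosis command; adjust if yours differs.
SA="system:serviceaccount:cloudflare-tunnel-system:cloudflare-tunnel-gateway-controller"

# Prints one line per status subresource with the can-i answer.
check_status_rbac() {
  for resource in gateways/status httproutes/status; do
    answer=$("$KUBECTL" auth can-i update "$resource" --as="$SA" 2>/dev/null)
    printf '%s: %s\n' "$resource" "${answer:-unknown}"
  done
}

# Usage:
# check_status_rbac
```

Any line that does not end in yes points at a missing rule for that subresource in the ClusterRole.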
AmneziaWG Sidecar Issues¶
AWG Interface Creation Failures¶
Symptoms:
- Container crashes with Operation not permitted
- Interface conflicts: device already exists
Diagnosis:
kubectl get deployment --namespace cloudflare-tunnel-system \
cloudflare-tunnel-gateway-controller --output yaml | grep -A 10 securityContext
kubectl logs --namespace cloudflare-tunnel-system POD_NAME --container amneziawg
Solutions:
- AWG requires the NET_ADMIN capability (chart handles this automatically)
- Use different interface prefixes for conflicts
AWG DNS Overwrites Cluster DNS¶
Problem: cloudflared cannot resolve internal Kubernetes services
Symptoms:
- /etc/resolv.conf shows VPN DNS instead of CoreDNS
- Logs show: no such host for internal service names
Solution: This is handled automatically in chart version 0.2.x+. For older versions, remove the DNS = ... line from the AWG config file.
Verification:
kubectl exec --namespace cloudflare-tunnel-system POD_NAME \
--container cloudflared -- cat /etc/resolv.conf
# Should show CoreDNS IP (e.g., 10.96.0.10), not 1.1.1.1
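The visual check above can be automated by piping the file through a small filter. This is a sketch with a hypothetical function name (has_nameserver); the 10.96.0.10 address is only the example value from the comment above, so substitute your cluster's DNS service IP.

```shell
#!/bin/sh
# Hypothetical helper. Reads resolv.conf content on stdin and succeeds
# if the given IP is listed as a nameserver.
has_nameserver() {
  awk -v ip="$1" '$1 == "nameserver" && $2 == ip { found = 1 } END { exit !found }'
}

# Usage (10.96.0.10 is an example; use your cluster DNS IP):
# kubectl exec --namespace cloudflare-tunnel-system POD_NAME \
#   --container cloudflared -- cat /etc/resolv.conf | has_nameserver 10.96.0.10 \
#   || echo "cluster DNS is not configured in the pod" >&2
```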
Performance Issues¶
High Memory Usage¶
Symptoms:
- Pods OOMKilled
- Memory usage growing over time
Diagnosis:
Solution:
Slow Reconciliation¶
Problem: Changes to Gateway/HTTPRoute take a long time to apply
Diagnosis:
kubectl logs --namespace cloudflare-tunnel-system \
deployment/cloudflare-tunnel-gateway-controller | grep -i reconcil
Solutions:
- Check Cloudflare API rate limits in logs
- Verify network latency to Cloudflare API
- Ensure sufficient resources (CPU not throttled)
Debug Logging¶
Enable debug logging for detailed diagnostics:
helm upgrade cloudflare-tunnel-gateway-controller \
oci://ghcr.io/lexfrei/cloudflare-tunnel-gateway-controller/chart \
--values values.yaml \
--namespace cloudflare-tunnel-system
kubectl logs --follow --namespace cloudflare-tunnel-system \
deployment/cloudflare-tunnel-gateway-controller
Collecting Diagnostic Information¶
# Pod status and events
kubectl get pods --namespace cloudflare-tunnel-system --output wide
kubectl describe pod --namespace cloudflare-tunnel-system POD_NAME
# Recent logs
kubectl logs --tail=100 --namespace cloudflare-tunnel-system \
deployment/cloudflare-tunnel-gateway-controller
# Resource usage
kubectl top pod --namespace cloudflare-tunnel-system
# Gateway API resources
kubectl get gatewayclasses,gateways,httproutes --all-namespaces
Reporting Issues¶
When reporting issues, include:
- Helm chart version: helm list --namespace cloudflare-tunnel-system
- Kubernetes version: kubectl version
- Cloud provider and CNI plugin
- Relevant pod logs (sanitize secrets!)
- Gateway/HTTPRoute manifests (sanitize sensitive data)
Report issues at: https://github.com/lexfrei/cloudflare-tunnel-gateway-controller/issues
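Sanitizing logs by hand is error-prone, so a filter can help as a first pass. The sketch below is a hypothetical helper (sanitize_logs) that masks bearer tokens and api-token values. The patterns are illustrative assumptions and will not catch every secret format, so review the output before posting it.

```shell
#!/bin/sh
# Hypothetical filter: masks bearer tokens and api-token values on stdin.
# The patterns are illustrative and do not cover every secret format.
sanitize_logs() {
  sed -E \
    -e 's/(Bearer )[A-Za-z0-9._-]+/\1REDACTED/g' \
    -e 's/(api-token[=:][[:space:]]*)[^[:space:]"]+/\1REDACTED/g'
}

# Usage:
# kubectl logs --tail=100 --namespace cloudflare-tunnel-system \
#   deployment/cloudflare-tunnel-gateway-controller | sanitize_logs > controller.log
```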