Here’s a list of 100 common problems in Kubernetes LoadBalancer implementations, organized by category (architecture and design, networking and connectivity, configuration and annotations, ingress controller integration, and cloud provider/infrastructure issues), each stated as a brief technical description:


🧩 1. Architectural and Design-Level Issues

  1. Misunderstanding Layer 4 vs. Layer 7 load balancing.
  2. Using LoadBalancer type on bare-metal clusters without MetalLB or similar.
  3. Multiple LoadBalancers per service causing excessive cloud resource usage.
  4. No external IP assigned because load balancer provisioning is stuck pending (see the diagnostic sketch after this list).
  5. Failure to expose internal services (wrong external/internal annotation).
  6. Inconsistent behavior across cloud providers (AWS vs GCP vs Azure).
  7. Exceeding the limit of allowed load balancers per cloud project.
  8. Misaligned CIDR ranges between cluster and external network.
  9. Overlapping service CIDRs causing routing conflicts.
  10. Using external load balancers without proper NAT handling.
  11. Ignoring idle connection timeouts in cloud LB (common in AWS ELB).
  12. Lack of HA strategy for single load balancer dependency.
  13. Not accounting for failover between multiple zones.
  14. LoadBalancer fronting another LoadBalancer (double LB hop).
  15. Insufficient throughput capacity for expected workloads.
  16. Using NodePort underneath without firewall rules for nodes.
  17. Load balancer not resilient to node restarts or scaling.
  18. Using wrong protocol type (TCP vs UDP vs HTTP).
  19. Exposing control plane components accidentally.
  20. Traffic not routed through kube-proxy (bypassing service rules).
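
For item 4 above, the first symptom is usually a Service that sits with no external address because no controller (cloud provider or MetalLB) ever assigned one. Below is a minimal client-go sketch, assuming a standard kubeconfig at the default location, that lists every LoadBalancer Service still waiting for an address:

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the local kubeconfig from its default location.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// List Services in all namespaces.
	svcs, err := clientset.CoreV1().Services("").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, svc := range svcs.Items {
		if svc.Spec.Type != corev1.ServiceTypeLoadBalancer {
			continue
		}
		// An empty status.loadBalancer.ingress means no controller has
		// provisioned or assigned an address for this Service yet.
		if len(svc.Status.LoadBalancer.Ingress) == 0 {
			fmt.Printf("PENDING: %s/%s has no external IP or hostname\n", svc.Namespace, svc.Name)
		}
	}
}
```

On bare metal, every LoadBalancer Service reporting an empty ingress list is a strong hint that no implementation such as MetalLB is installed at all (item 2).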

🌐 2. Networking and Connectivity Problems

  1. Misconfigured CNI plugin blocking external traffic.
  2. LoadBalancer not accessible due to missing external routes.
  3. NetworkPolicy blocking health check probes.
  4. Cloud firewall rules missing for the NodePort range (30000–32767); see the reachability probe after this list.
  5. Incorrect MTU leading to packet fragmentation/loss.
  6. Node IP not reachable from LB due to NAT misconfig.
  7. LoadBalancer health checks hitting wrong port or path.
  8. Client source IP not preserved (or preserved inconsistently), breaking IP-dependent backend logic.
  9. Reverse path filtering causing dropped packets.
  10. Connection tracking issues (conntrack table overflow).
  11. Node local routing bypassing kube-proxy IPVS tables.
  12. Multiple NICs confusing the load balancer routing.
  13. BGP peering instability (in MetalLB setups).
  14. ARP/NDP conflicts between MetalLB speakers.
  15. VXLAN overlay interfering with external routes.
  16. Routing table overflow (too many routes).
  17. SNAT masking client IPs (breaking access logs).
  18. Kubernetes IPVS not syncing with kernel conntrack.
  19. Proxy ARP disabled on nodes (MetalLB issue).
  20. Incorrect egress IP or masquerade setup.
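
For item 4 (missing firewall rules for the NodePort range), a quick way to separate cluster problems from infrastructure problems is to probe the NodePorts directly from a machine on the load balancer’s network. A small, self-contained Go probe; the port and node IPs are passed as arguments and nothing here is Kubernetes-specific:

```go
package main

import (
	"fmt"
	"net"
	"os"
	"time"
)

// Probes TCP reachability of a NodePort on a list of node IPs. If the port
// is unreachable from the load balancer's network, its health checks will
// mark every backend unhealthy even though the pods are fine.
func main() {
	if len(os.Args) < 3 {
		fmt.Println("usage: probe <nodePort> <nodeIP> [nodeIP...]")
		os.Exit(1)
	}
	port := os.Args[1]
	for _, ip := range os.Args[2:] {
		addr := net.JoinHostPort(ip, port)
		conn, err := net.DialTimeout("tcp", addr, 3*time.Second)
		if err != nil {
			fmt.Printf("UNREACHABLE %s: %v\n", addr, err)
			continue
		}
		conn.Close()
		fmt.Printf("OK          %s\n", addr)
	}
}
```

If the probe succeeds from inside the VPC but the load balancer still reports unhealthy backends, the problem is more likely a NetworkPolicy blocking probes (item 3) or a health check pointed at the wrong port or path (item 7).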

⚙️ 3. Configuration and Annotation Errors

  1. Missing cloud-specific annotations (e.g., AWS service.beta.kubernetes.io/aws-load-balancer-* annotations).
  2. Wrong load balancer class (loadBalancerClass field not set).
  3. Misconfigured health check path annotation.
  4. Backend protocol mismatch (HTTP vs HTTPS).
  5. Missing SSL certificate reference.
  6. Incorrect security group annotations.
  7. Service selector not matching any pods.
  8. Missing externalTrafficPolicy configuration.
  9. Misusing sessionAffinity settings.
  10. Wrong loadBalancerIP specified (not in pool).
  11. Missing loadBalancerSourceRanges.
  12. Disabled cross-zone load balancing by mistake.
  13. Using unsupported annotations in managed clusters.
  14. Forgetting to delete dangling LB when service is removed.
  15. externalTrafficPolicy: Local concentrating or blackholing traffic on nodes with no local endpoints (see the baseline manifest after this list).
  16. Conflicting annotations between multiple ingress controllers.
  17. Cloud provider ignoring unrecognized annotation.
  18. Unintentionally setting loadBalancerSourceRanges: 0.0.0.0/0.
  19. Auto-assigned IP not in allowed subnet range.
  20. Health probe ports mismatched with container ports.
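
Several of the items above (selector, target port, externalTrafficPolicy, source ranges, provider annotations) are easiest to check against one known-good baseline. Here is a sketch that builds such a Service with the official k8s.io/api types and prints it as a manifest; the name, namespace, label selector, CIDR, and the AWS internal-LB annotation are illustrative placeholders, not required values:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"sigs.k8s.io/yaml"
)

func main() {
	svc := corev1.Service{
		TypeMeta: metav1.TypeMeta{APIVersion: "v1", Kind: "Service"},
		ObjectMeta: metav1.ObjectMeta{
			Name:      "web",
			Namespace: "prod",
			Annotations: map[string]string{
				// Provider-specific example annotation requesting an internal LB;
				// check your provider's documentation for the exact key and value.
				"service.beta.kubernetes.io/aws-load-balancer-internal": "true",
			},
		},
		Spec: corev1.ServiceSpec{
			Type: corev1.ServiceTypeLoadBalancer,
			// Selector must match the pods' labels, or the LB has no endpoints.
			Selector: map[string]string{"app": "web"},
			Ports: []corev1.ServicePort{{
				Name:       "http",
				Protocol:   corev1.ProtocolTCP,
				Port:       80,
				TargetPort: intstr.FromInt(8080), // must match the container port
			}},
			// Local preserves client source IPs but only routes to nodes that
			// run an endpoint, so spread pods across nodes.
			ExternalTrafficPolicy: corev1.ServiceExternalTrafficPolicyTypeLocal,
			// Restrict who may reach the LB; avoid accidentally leaving 0.0.0.0/0.
			LoadBalancerSourceRanges: []string{"203.0.113.0/24"},
		},
	}

	out, err := yaml.Marshal(svc)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(out))
}
```

Printing the object through sigs.k8s.io/yaml yields the manifest the client would submit, which makes it easy to diff against what is actually deployed.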

🧱 4. Ingress Controller Integration Problems

  1. Ingress controller using same ports as LoadBalancer.
  2. Duplicate ingress rules sending traffic to wrong backend.
  3. Path rewrite rules conflicting with app routes.
  4. TLS secret not found by ingress controller.
  5. Default backend misconfigured or missing.
  6. Ingress not picking up annotations from Service.
  7. Conflicts between Traefik and NGINX ingress controllers.
  8. Cert-manager not updating ingress TLS cert.
  9. Hostname mismatch causing SSL handshake failure.
  10. Ingress controller pod crashlooping due to invalid config.
  11. Load balancer health checks failing due to HTTP 301/302 redirects.
  12. Misconfigured ingress class (ingressClassName not set; see the sketch after this list).
  13. Missing X-Forwarded-For header propagation.
  14. HTTP → HTTPS redirection loop.
  15. Wildcard hostnames not resolving properly.
  16. Static IP not associated with ingress LB.
  17. Overlapping host rules across namespaces.
  18. Backend timeout lower than LB idle timeout.
  19. Unsupported path type (Exact vs Prefix mismatch).
  20. Controller RBAC not allowing status updates.
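
Items 4, 12, and 19 come down to fields the ingress controller will not guess for you: the ingress class, the TLS secret, and the path type. Below is a sketch of an explicit Ingress built with k8s.io/api/networking/v1; the hostname, Service name, secret name, and the nginx class are placeholders:

```go
package main

import (
	"fmt"

	networkingv1 "k8s.io/api/networking/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	className := "nginx"                    // must match an existing IngressClass
	pathType := networkingv1.PathTypePrefix // be explicit: Exact and Prefix behave differently

	ing := networkingv1.Ingress{
		TypeMeta:   metav1.TypeMeta{APIVersion: "networking.k8s.io/v1", Kind: "Ingress"},
		ObjectMeta: metav1.ObjectMeta{Name: "web", Namespace: "prod"},
		Spec: networkingv1.IngressSpec{
			IngressClassName: &className,
			TLS: []networkingv1.IngressTLS{{
				Hosts:      []string{"app.example.com"},
				SecretName: "app-example-com-tls", // secret must exist in the same namespace
			}},
			Rules: []networkingv1.IngressRule{{
				Host: "app.example.com",
				IngressRuleValue: networkingv1.IngressRuleValue{
					HTTP: &networkingv1.HTTPIngressRuleValue{
						Paths: []networkingv1.HTTPIngressPath{{
							Path:     "/",
							PathType: &pathType,
							Backend: networkingv1.IngressBackend{
								Service: &networkingv1.IngressServiceBackend{
									Name: "web",
									Port: networkingv1.ServiceBackendPort{Number: 80},
								},
							},
						}},
					},
				},
			}},
		},
	}

	out, err := yaml.Marshal(ing)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(out))
}
```

Keeping the class, TLS secret, and path type explicit like this avoids depending on a default IngressClass existing in the cluster and removes ambiguity when more than one controller is installed (item 7).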

☁️ 5. Cloud Provider and Infrastructure Problems

  1. Cloud provider API quota exhausted (cannot create LB).
  2. Service stuck in “pending” due to missing IAM permissions (see the event-inspection sketch after this list).
  3. Firewall rules not auto-created by cloud controller.
  4. Cloud controller not running in cluster.
  5. Using private subnet for LoadBalancer IPs unintentionally.
  6. Cloud LB not supporting IPv6 while cluster does.
  7. Static IP reservation expired or released.
  8. Using custom network tags that block LB provisioning.
  9. Cloud load balancer name too long for provider limit.
  10. Cloud provider API latency causing update delays.
  11. Regional vs. zonal LB mismatch.
  12. Load balancer nodes not detected due to tag mismatch.
  13. Cloud controller manager version incompatible with cluster.
  14. IAM policy missing elasticloadbalancing:* permissions.
  15. Cloud load balancer doesn’t support UDP (e.g., AWS Classic ELB).
  16. Load balancer node pool scaled down automatically.
  17. Backend instance registration failing silently.
  18. Security group dependency cycles (common in AWS).
  19. Subnet exhaustion—no available IPs for new LBs.
  20. Provider rate limits hit due to frequent service updates.
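
Most of these provider-side failures are reported back into the cluster as Warning events on the Service (the in-tree service controller typically uses reasons such as SyncLoadBalancerFailed). A client-go sketch that pulls those events for a given Service, assuming a default kubeconfig; the prod/web names are placeholders:

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Placeholder Service to inspect; replace with the stuck one.
	namespace, name := "prod", "web"

	// Events attached to the Service usually carry the cloud controller's
	// actual error (quota exceeded, missing IAM permission, bad subnet, ...).
	events, err := clientset.CoreV1().Events(namespace).List(context.TODO(), metav1.ListOptions{
		FieldSelector: "involvedObject.kind=Service,involvedObject.name=" + name,
	})
	if err != nil {
		panic(err)
	}
	for _, ev := range events.Items {
		if ev.Type == corev1.EventTypeWarning {
			fmt.Printf("%s\t%s\t%s\n", ev.LastTimestamp, ev.Reason, ev.Message)
		}
	}
}
```

This is the programmatic equivalent of `kubectl describe service web -n prod`; the Warning messages usually point straight at the quota, IAM, subnet, or tagging problem listed above.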

🔍 Reference Source Chains

Many of these issues can be traced through: