Kubernetes pod stuck in Pending state
A Kubernetes pod stuck in Pending usually means the scheduler cannot place the pod on any node. The four most common causes are: insufficient CPU or memory on available nodes, taints and tolerations blocking scheduling, a PersistentVolumeClaim (PVC) that cannot bind, and node selector or affinity rules that no current node satisfies. Until the blocking condition is resolved, the pod remains Pending indefinitely.
Diagnose in 60 seconds
Start with kubectl describe pod:
kubectl describe pod <pod-name> -n <namespace>
Scroll to the Events section at the bottom of the output. The scheduler records its reason for not placing the pod there. Common messages and what they mean:
| Event message | Root cause |
|---|---|
| 0/3 nodes are available: 3 Insufficient memory | No node has enough allocatable memory |
| 0/3 nodes are available: 3 Insufficient cpu | No node has enough allocatable CPU |
| node(s) had taint that the pod didn't tolerate | Taint/toleration mismatch |
| persistentvolumeclaim "pvc-name" not found or is not bound | PVC cannot bind to a volume |
| node(s) didn't match node selector | nodeSelector or nodeAffinity has no matching node |
| node(s) had untolerated taint {node.kubernetes.io/not-ready} | Target node is not ready |
One describe command tells you which fix path to take.
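The lookup in the table can be sketched as a small shell function. This is a minimal sketch, not a complete parser: `classify_scheduler_event` and its category names are hypothetical, and the patterns are only the message fragments listed above, which may be worded differently in other Kubernetes versions.

```shell
# Map a scheduler event message to the fix section it points to.
# classify_scheduler_event is a hypothetical helper, not part of kubectl.
# Node-health taints are checked before the generic taint pattern because
# their messages also contain the word "taint".
classify_scheduler_event() {
  case "$1" in
    *"Insufficient cpu"*|*"Insufficient memory"*) echo "insufficient-resources" ;;
    *"not-ready"*|*"unreachable"*)                echo "node-health" ;;
    *"taint"*)                                    echo "taint-toleration" ;;
    *persistentvolumeclaim*)                      echo "pvc-not-bound" ;;
    *"node selector"*|*"node affinity"*)          echo "selector-affinity" ;;
    *)                                            echo "unknown" ;;
  esac
}

# Usage against a live cluster (placeholders as in the commands above):
#   classify_scheduler_event "$(kubectl get events -n <namespace> \
#     --field-selector involvedObject.name=<pod-name>,reason=FailedScheduling \
#     -o jsonpath='{.items[*].message}')"
```

An "unknown" result means the message did not match any row of the table, so read the full Events output rather than trusting the classification.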
Fix: insufficient resources
Check what each node has allocated versus its capacity:
kubectl describe nodes | grep -A 5 "Allocated resources"
Or for a specific node:
kubectl describe node <node-name> | grep -A 10 "Allocated resources"
You have three remediation paths:
Scale the node group. If you are on a managed cluster (EKS, GKE, AKS), increase the node group size or ensure the cluster autoscaler is enabled and has permission to add nodes.
Reduce resource requests. If the pod's resources.requests are higher than necessary, lower them to fit within available headroom. Be careful: requests below actual usage cause noisy-neighbor problems at runtime.
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
Evict lower-priority pods. Set a PriorityClass on critical workloads so the scheduler can preempt lower-priority pods to free space.
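A PriorityClass for the preemption path might look like the following sketch. The helper name `render_priority_class`, the class name `critical-workloads`, and the value `100000` are illustrative assumptions; pick a value that fits your own priority scheme.

```shell
# Print a PriorityClass manifest to stdout.
# render_priority_class and its two arguments (name, value) are hypothetical;
# the manifest fields themselves are the standard scheduling.k8s.io/v1 ones.
render_priority_class() {
  name="$1"
  value="$2"
  cat <<EOF
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: ${name}
value: ${value}
globalDefault: false
description: "Critical workloads that may preempt lower-priority pods."
EOF
}

# Usage (requires cluster access):
#   render_priority_class critical-workloads 100000 | kubectl apply -f -
```

Pods opt in by setting `priorityClassName: critical-workloads` in their spec. Pods without a priority class keep the default priority and become candidates for preemption.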
Fix: taint/toleration mismatch
Inspect the taints on your nodes:
kubectl describe node <node-name> | grep -A 5 Taints
A taint looks like key=value:effect, for example dedicated=gpu:NoSchedule. A pod scheduled onto that node must declare a matching toleration:
tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "gpu"
  effect: "NoSchedule"
If the taint was applied by mistake, remove it:
kubectl taint node <node-name> dedicated=gpu:NoSchedule-
Note the trailing -: that removes the taint rather than adding it.
Common system-applied taints that catch teams off guard: node.kubernetes.io/not-ready, node.kubernetes.io/unreachable, and node.kubernetes.io/disk-pressure. These signal real node health problems and should be investigated at the node level, not papered over with tolerations.
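For intentional taints such as dedicated=gpu:NoSchedule (not the system health taints above), the matching toleration can be derived mechanically from the taint string. The sketch below assumes the key=value:effect or key:effect form; `toleration_for_taint` is a hypothetical helper, not a kubectl subcommand.

```shell
# Print the toleration stanza that matches a taint written as
# key=value:effect, or key:effect when the taint has no value.
# toleration_for_taint is a hypothetical helper name.
toleration_for_taint() {
  taint="$1"
  effect="${taint##*:}"     # text after the last colon
  key_value="${taint%:*}"   # text before the last colon
  key="${key_value%%=*}"    # text before the first equals sign
  printf '%s\n' "tolerations:"
  printf '%s\n' "- key: \"$key\""
  case "$key_value" in
    *=*) printf '  operator: "Equal"\n  value: "%s"\n' "${key_value#*=}" ;;
    *)   printf '  operator: "Exists"\n' ;;
  esac
  printf '  effect: "%s"\n' "$effect"
}

# Usage:
#   toleration_for_taint dedicated=gpu:NoSchedule
```

A taint with no value gets `operator: "Exists"`, because an `Equal` toleration needs a value to compare against.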
Fix: PVC not bound
Check the PVC status:
kubectl describe pvc <pvc-name> -n <namespace>
Then check what storage classes are available in the cluster:
kubectl get storageclass
Common causes of a PVC stuck in Pending:
- No matching StorageClass. The PVC references a StorageClass that does not exist or is misspelled. Verify the storageClassName field in the PVC spec against kubectl get storageclass.
- No available PersistentVolume (static provisioning). If you are using static provisioning, a PV with matching access mode, capacity, and storage class must already exist.
- Dynamic provisioner not running. The StorageClass provisioner pod may be unhealthy. Check kubectl get pods -n kube-system for provisioner pods.
- Zone mismatch. A PVC bound to a volume in us-east-1a will not be usable by a pod scheduled to us-east-1b. Align topology constraints.
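The first check in the list, whether the StorageClass named in the PVC exists, can be scripted. This is a minimal sketch: `storage_class_exists` is a hypothetical helper that reads the output of `kubectl get storageclass -o name` on stdin and compares exact names, so a misspelling such as gp3 versus gp-3 is caught.

```shell
# Succeed if the StorageClass name given as $1 appears on stdin.
# Input is expected in the "-o name" form (resource/name per line); the
# resource prefix up to the last slash is stripped before an exact,
# fixed-string comparison.
# storage_class_exists is a hypothetical helper name.
storage_class_exists() {
  sed 's|.*/||' | grep -qxF "$1"
}

# Usage (requires cluster access):
#   kubectl get storageclass -o name | storage_class_exists gp3 \
#     || echo "StorageClass gp3 not found"
```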
Fix example: correct a StorageClass name in the PVC spec:
spec:
  storageClassName: "gp3" # must match exactly what kubectl get storageclass returns
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
Fix: node selector and affinity
List all nodes and their labels:
kubectl get nodes --show-labels
Compare against the pod's nodeSelector or nodeAffinity spec. At least one node must carry every label the pod requires.
Example: a pod with this nodeSelector will stay Pending indefinitely if no node has the label disktype=ssd:
nodeSelector:
  disktype: ssd
Apply the label to the correct node:
kubectl label node <node-name> disktype=ssd
For nodeAffinity, use requiredDuringSchedulingIgnoredDuringExecution when placement is hard-required, and preferredDuringSchedulingIgnoredDuringExecution when it is a soft preference. Hard affinity rules that no node satisfies produce the same Pending behavior as a nodeSelector mismatch.
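The nodeSelector rule, that a node qualifies only if it carries every required label, can be sketched in shell. `node_matches_selector` is a hypothetical helper. It takes a node's labels in the comma-separated form printed by --show-labels and the required labels in the same form. It covers plain nodeSelector equality only, not nodeAffinity expressions.

```shell
# Succeed if every required label (arg 2, comma-separated key=value pairs)
# appears in the node's label list (arg 1, same format).
# node_matches_selector is a hypothetical helper name.
node_matches_selector() {
  node_labels=",$1,"          # wrap in commas so each label can be matched whole
  old_ifs="$IFS"
  IFS=','
  for want in $2; do
    case "$node_labels" in
      *",$want,"*) ;;                         # label present, keep checking
      *) IFS="$old_ifs"; return 1 ;;          # one missing label disqualifies the node
    esac
  done
  IFS="$old_ifs"
  return 0
}

# Usage:
#   node_matches_selector "kubernetes.io/os=linux,disktype=ssd" "disktype=ssd"
# Against a live cluster, kubectl applies the same filter server-side:
#   kubectl get nodes -l disktype=ssd
```

If `kubectl get nodes -l disktype=ssd` returns no nodes, the pod in the example above has nowhere to go.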
Prevention
Monitor cluster resource headroom. Alert when requested CPU or memory across the node pool exceeds 80% of allocatable capacity. Pods that arrive during a headroom crunch stay Pending until a node is added, which takes minutes even with autoscaling.
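One way to read the 80% rule is to compare summed resource requests with allocatable capacity, since the scheduler places pods by requests. The sketch below assumes both numbers are already available as integers in the same unit (millicores or MiB); `headroom_exceeded` is a hypothetical helper, and the 80 default simply mirrors the figure above.

```shell
# headroom_exceeded REQUESTED ALLOCATABLE [THRESHOLD_PERCENT]
# Succeeds (exit 0) when requested/allocatable is above the threshold.
# Integer arithmetic only, so both inputs must use the same unit.
# headroom_exceeded is a hypothetical helper name.
headroom_exceeded() {
  requested="$1"
  allocatable="$2"
  threshold="${3:-80}"
  [ $((requested * 100)) -gt $((allocatable * threshold)) ]
}

# Usage: 3300m requested out of 4000m allocatable is 82.5%, above 80%.
#   headroom_exceeded 3300 4000 && echo "alert: CPU headroom below 20%"
```

In practice the inputs would come from your monitoring system or from the Allocated resources section of kubectl describe nodes.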
Use ResourceQuotas per namespace. A ResourceQuota prevents a single team or deployment from claiming all cluster capacity, which would leave every other namespace's pods stuck in Pending:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
Set PodDisruptionBudgets on critical workloads. PDBs protect availability during voluntary disruptions (node drains, cluster upgrades) by ensuring a minimum number of replicas stay running. This does not prevent Pending directly, but it prevents the conditions (underpopulated nodes after drain) that lead to scheduling pressure.
Define resource requests on every container. Pods whose containers set no requests or limits are classed as BestEffort and give the scheduler no information to work with. This leads to overcommit situations where nodes appear to have capacity until they do not.
AI-assisted SRE tooling can correlate scheduler events with cluster state to surface root causes faster than manual kubectl describe chains. NOFire AI surfaces these scheduling failures as part of its incident investigation workflow. See the AI SRE Benchmark to understand how AI root-cause accuracy is measured in practice.
Frequently asked questions
- How long should I wait before a Pending pod is a problem?
- If a pod has been Pending for more than 2-3 minutes after creation, investigate. Brief Pending during node scale-up is normal and expected.
- Can a pod be stuck Pending due to image pull issues?
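- Not in the sense used in this guide. Image pull failures show up as ContainerCreating, ErrImagePull, or ImagePullBackOff in the STATUS column of kubectl get pods, even though the pod's underlying phase is still Pending while images are pulled. A pod whose STATUS reads Pending has typically not been assigned to a node yet, so no image pull has been attempted.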
- What is a PodDisruptionBudget and does it cause Pending?
- A PodDisruptionBudget (PDB) limits voluntary disruptions to protect availability. PDBs can delay voluntary evictions but do not directly cause Pending. Pending is a scheduler placement failure, not an eviction constraint.
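To apply the 2-3 minute rule from the first question, it helps to list Pending pods oldest first. The wrapper below is a minimal sketch; `list_pending_pods` is a hypothetical name, while the kubectl flags are standard. The phase filter also returns pods that are still pulling images, because their phase is Pending too, so check the STATUS column to tell the two cases apart.

```shell
# List pods in the Pending phase across all namespaces, oldest first.
# list_pending_pods is a hypothetical wrapper name.
list_pending_pods() {
  kubectl get pods --all-namespaces \
    --field-selector=status.phase=Pending \
    --sort-by=.metadata.creationTimestamp
}

# Usage (requires cluster access):
#   list_pending_pods
```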