Kubernetes pod stuck in Pending state

A Kubernetes pod stuck in Pending means the scheduler cannot place the pod on any node. The four most common causes are: insufficient CPU or memory on available nodes, taints and tolerations blocking scheduling, a PersistentVolumeClaim (PVC) that cannot bind, and node selector or affinity rules that no current node satisfies. Until one of these conditions is resolved, the pod will remain Pending indefinitely.

Diagnose in 60 seconds

Start with kubectl describe pod:

kubectl describe pod <pod-name> -n <namespace>

Scroll to the Events section at the bottom of the output, where the scheduler records why it could not place the pod. Common messages and what they mean:

| Event message | Root cause |
| --- | --- |
| 0/3 nodes are available: 3 Insufficient memory | No node has enough allocatable memory |
| 0/3 nodes are available: 3 Insufficient cpu | No node has enough allocatable CPU |
| node(s) had taint that the pod didn't tolerate | Taint/toleration mismatch |
| persistentvolumeclaim "pvc-name" not found or is not bound | PVC cannot bind to a volume |
| node(s) didn't match node selector | nodeSelector or nodeAffinity has no matching node |
| node(s) had untolerated taint {node.kubernetes.io/not-ready} | Target node is not ready |

One describe command tells you which fix path to take.
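The mapping from event message to fix path can be sketched as a small shell function. This is a minimal illustration, not part of kubectl: the function name classify_pending_reason and the path labels it prints are hypothetical, and it reports only the first pattern that matches, whereas a real event can list several reasons at once.

```shell
# Map a FailedScheduling message to one of the fix paths in this guide.
# classify_pending_reason is a hypothetical helper, not a kubectl feature.
classify_pending_reason() {
  case "$1" in
    *"Insufficient cpu"*|*"Insufficient memory"*) echo "resources" ;;
    # Checked before the generic taint pattern: these are node health taints.
    *"not-ready"*|*"unreachable"*) echo "node-health" ;;
    *"taint"*) echo "taints" ;;
    *persistentvolumeclaim*|*PersistentVolumeClaim*) echo "pvc" ;;
    *"node selector"*|*"node affinity"*) echo "node-selection" ;;
    *) echo "unknown" ;;
  esac
}

classify_pending_reason "0/3 nodes are available: 3 Insufficient memory"
```

To feed it real messages, kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name>,reason=FailedScheduling lists only the scheduler's failure events for one pod.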

Fix: insufficient resources

Check what each node has allocated versus its capacity:

kubectl describe nodes | grep -A 5 "Allocated resources"

Or for a specific node:

kubectl describe node <node-name> | grep -A 10 "Allocated resources"
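The scheduler compares a pod's requests against each node's allocatable capacity minus what is already requested, not against live usage. The sketch below shows that arithmetic for CPU only. The helper names are hypothetical, and the three quantities are assumed to be read by hand from the describe output above.

```shell
# Convert a Kubernetes CPU quantity ("250m", "2", "1.5") to millicores.
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v c="$1" 'BEGIN { printf "%d\n", c * 1000 + 0.5 }' ;;
  esac
}

# Succeed if a request fits in allocatable minus already-requested CPU.
# Arguments: pod request, node allocatable, sum of existing requests.
cpu_request_fits() {
  req=$(cpu_to_millicores "$1")
  alloc=$(cpu_to_millicores "$2")
  used=$(cpu_to_millicores "$3")
  [ "$req" -le $((alloc - used)) ]
}

if cpu_request_fits 250m 2 1800m; then
  echo "request fits"
else
  echo "request does not fit"
fi
```

With 2 CPUs allocatable and 1800m already requested, only 200m remains, so a 250m request does not fit on that node even if its actual CPU usage is low.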

You have three remediation paths:

Scale the node group. If you are on a managed cluster (EKS, GKE, AKS), increase the node group size or ensure the cluster autoscaler is enabled and has permission to add nodes.

Reduce resource requests. If the pod's resources.requests are higher than necessary, lower them to fit within available headroom. Be careful: requests below actual usage cause noisy-neighbor problems at runtime.

resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"

Evict lower-priority pods. Set a PriorityClass on critical workloads so the scheduler can preempt lower-priority pods to free space.
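As a rough sketch of the preemption path, the function below prints a PriorityClass manifest. The class name critical-workloads and the value 100000 are illustrative assumptions, not values taken from any particular cluster.

```shell
# Print a PriorityClass manifest. The name and value passed in are
# illustrative; pick ones that fit the classes your cluster already has.
priorityclass_manifest() {
  cat <<EOF
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: $1
value: $2
globalDefault: false
description: "Workloads allowed to preempt lower-priority pods"
EOF
}

priorityclass_manifest critical-workloads 100000
```

Piping the output to kubectl apply -f - creates the class. A pod opts in by setting priorityClassName: critical-workloads in its spec, after which the scheduler may preempt pods with a lower priority value to make room for it.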

Fix: taint/toleration mismatch

Inspect the taints on your nodes:

kubectl describe node <node-name> | grep -A 5 Taints

A taint looks like key=value:effect, for example dedicated=gpu:NoSchedule. A pod scheduled onto that node must declare a matching toleration:

tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
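The correspondence between a taint string and its toleration is mechanical, so it can be sketched as a helper that takes the taint as printed by kubectl describe node and emits the matching YAML. The function name toleration_for_taint is hypothetical. It assumes the key=value:effect or key:effect form, and uses operator Exists when the taint has no value.

```shell
# Print the toleration that matches a taint written as key=value:effect
# or key:effect. toleration_for_taint is a hypothetical helper.
toleration_for_taint() {
  effect=${1##*:}   # text after the last colon
  keyval=${1%:*}    # text before the last colon
  case "$keyval" in
    *=*)
      printf 'tolerations:\n  - key: "%s"\n    operator: "Equal"\n    value: "%s"\n    effect: "%s"\n' \
        "${keyval%%=*}" "${keyval#*=}" "$effect" ;;
    *)
      printf 'tolerations:\n  - key: "%s"\n    operator: "Exists"\n    effect: "%s"\n' \
        "$keyval" "$effect" ;;
  esac
}

toleration_for_taint dedicated=gpu:NoSchedule
```

The output for dedicated=gpu:NoSchedule is the same toleration shown above. Every field has to agree with the taint: a toleration with the right key but a different effect does not match.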

If the taint was applied by mistake, remove it:

kubectl taint node <node-name> dedicated=gpu:NoSchedule-

Note the trailing -: that removes the taint rather than adding it.

Common system-applied taints that catch teams off guard: node.kubernetes.io/not-ready, node.kubernetes.io/unreachable, and node.kubernetes.io/disk-pressure. These signal real node health problems and should be investigated at the node level, not papered over with tolerations.

Fix: PVC not bound

Check the PVC status:

kubectl describe pvc <pvc-name> -n <namespace>

Then check what storage classes are available in the cluster:

kubectl get storageclass

Common causes of a PVC stuck in Pending:

  • No matching StorageClass. The PVC references a StorageClass that does not exist or is misspelled. Verify the storageClassName field in the PVC spec against kubectl get storageclass.
  • No available PersistentVolume (static provisioning). If you are using static provisioning, a PV with matching access mode, capacity, and storage class must already exist.
  • Dynamic provisioner not running. The StorageClass provisioner pod may be unhealthy. Check kubectl get pods -n kube-system for provisioner pods.
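  • Zone mismatch. A volume provisioned in us-east-1a cannot be attached to a pod scheduled in us-east-1b. Align topology constraints, for example by setting volumeBindingMode: WaitForFirstConsumer on the StorageClass so the volume is provisioned in the zone where the pod lands.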

Fix example: correct a StorageClass name in the PVC spec:

spec:
  storageClassName: "gp3"  # must match exactly what kubectl get storageclass returns
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
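A misspelled StorageClass is easy to check for mechanically. The sketch below assumes the list of classes arrives on stdin in the form printed by kubectl get storageclass -o name (one storageclass.storage.k8s.io/<name> entry per line). The function name storageclass_listed is hypothetical.

```shell
# Succeed if the named StorageClass appears in the list on stdin.
# Expects lines such as storageclass.storage.k8s.io/gp3; -F and -x make
# the comparison an exact, whole-line string match.
storageclass_listed() {
  grep -qxF "storageclass.storage.k8s.io/$1"
}

# Sample input standing in for real cluster output.
if printf 'storageclass.storage.k8s.io/gp2\n' | storageclass_listed gp3; then
  echo "gp3 exists"
else
  echo "gp3 is not defined in this cluster"
fi
```

Against a live cluster the input would come from kubectl get storageclass -o name, and the name to check from kubectl get pvc <pvc-name> -n <namespace> -o jsonpath='{.spec.storageClassName}'.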

Fix: node selector and affinity

List all nodes and their labels:

kubectl get nodes --show-labels

Compare against the pod's nodeSelector or nodeAffinity spec. At least one node must carry every label the pod requires.

Example: a pod with this nodeSelector will stay Pending indefinitely if no node has the label disktype=ssd:

nodeSelector:
  disktype: ssd

Apply the label to the correct node:

kubectl label node <node-name> disktype=ssd

For nodeAffinity, use requiredDuringSchedulingIgnoredDuringExecution when placement is hard-required, and preferredDuringSchedulingIgnoredDuringExecution when it is a soft preference. Hard affinity rules that no node satisfies produce the same Pending behavior as a nodeSelector mismatch.
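To make the two rule types concrete, the function below prints a pod-spec affinity fragment with one hard rule and one soft preference. It is a sketch only: the label keys and values are placeholders supplied as arguments, and the weight of 50 is an arbitrary choice.

```shell
# Print an affinity fragment for a pod spec.
# Arguments: hard-rule key, hard-rule value, soft-rule key, soft-rule value.
node_affinity_fragment() {
  cat <<EOF
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: $1
              operator: In
              values: ["$2"]
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        preference:
          matchExpressions:
            - key: $3
              operator: In
              values: ["$4"]
EOF
}

node_affinity_fragment disktype ssd topology.kubernetes.io/zone us-east-1a
```

Only the required block can leave a pod Pending. Running kubectl get nodes -l disktype=ssd shows whether any node satisfies the hard rule; an empty result means the pod cannot be placed until a node is labeled.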

Prevention

Monitor cluster resource headroom. Alert when the CPU or memory requested across the node pool exceeds 80% of allocatable capacity. Pods arriving during a headroom crunch will stay Pending until a node is added, which takes minutes even with autoscaling.
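The threshold check itself is simple arithmetic. A minimal sketch follows, assuming the requested and allocatable totals have already been collected elsewhere and are expressed in the same unit. The function names are hypothetical, and the default threshold of 80 mirrors the figure above.

```shell
# Integer percentage of allocatable capacity already claimed by requests.
# Both arguments must use the same unit (for example millicores or MiB).
requested_pct() {
  echo $(( $1 * 100 / $2 ))
}

# Succeed when the percentage is above the alert threshold (default 80).
# Arguments: requested total, allocatable total, optional threshold.
headroom_alert() {
  [ "$(requested_pct "$1" "$2")" -gt "${3:-80}" ]
}

if headroom_alert 6500 8000; then
  echo "headroom alert: $(requested_pct 6500 8000)% of allocatable CPU requested"
fi
```

In practice this comparison usually lives in a monitoring system rather than a script; the point is that the ratio to watch is requests over allocatable, since that is what the scheduler evaluates.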

Use ResourceQuotas per namespace. A ResourceQuota prevents a single team or deployment from claiming all cluster capacity, which would leave every other namespace's pods stuck in Pending:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi

Set PodDisruptionBudgets on critical workloads. PDBs protect availability during voluntary disruptions (node drains, cluster upgrades) by ensuring a minimum number of replicas stay running. This does not prevent Pending directly, but it limits how many pods are evicted at once during a drain, which reduces the burst of rescheduling that creates scheduling pressure.

Define resource requests on every container. Pods whose containers set no requests or limits are classed as BestEffort and give the scheduler no information to work with. This leads to overcommit situations where nodes appear to have capacity until they do not.

AI-assisted SRE tooling can correlate scheduler events with cluster state to surface root causes faster than manual kubectl describe chains. NOFire AI surfaces these scheduling failures as part of its incident investigation workflow. See the AI SRE Benchmark to understand how AI root-cause accuracy is measured in practice.

Frequently asked questions

How long should I wait before a Pending pod is a problem?
If a pod has been Pending for more than 2-3 minutes after creation, investigate. Brief Pending during node scale-up is normal and expected.

Can a pod be stuck Pending due to image pull issues?
Not in the way kubectl reports it. A pod that cannot pull its image has already been assigned to a node, and kubectl get pods shows ContainerCreating, ErrImagePull, or ImagePullBackOff in the STATUS column. The underlying pod phase is still Pending during the pull, but a STATUS of Pending means the pod has not been scheduled yet, so no image pull has been attempted.

What is a PodDisruptionBudget and does it cause Pending?
A PodDisruptionBudget (PDB) limits voluntary disruptions to protect availability. PDBs can delay voluntary evictions but do not directly cause Pending. Pending is a scheduler placement failure, not an eviction constraint.