When working with GPU-enabled nodes in OpenShift, it’s common to dedicate certain nodes exclusively to GPU workloads. To ensure that only GPU-specific pods are scheduled on those nodes, you can use taints and tolerations.
What Are Taints and Tolerations?
- Taint: Applied to a node to repel pods that do not explicitly tolerate it.
- Toleration: Added to a pod to allow it to be scheduled on a node with the matching taint.
By using taints and tolerations together, you can reserve GPU nodes for GPU workloads.
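As a quick sketch of how the two halves fit together, the taint lives on the Node object and the matching toleration lives in the pod spec. The key and value below are the ones used throughout this guide:

# Taint on the Node object (set with oc adm taint in Step 2)
spec:
  taints:
  - key: nvidia.com/gpu
    value: "true"
    effect: NoSchedule

# Matching toleration in the pod spec (added in Step 3)
tolerations:
- key: "nvidia.com/gpu"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"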
Step 1: Label GPU Nodes (Optional but Recommended)
It’s good practice to label GPU nodes first so you can easily target them later.
oc label node <gpu-node-name> accelerator=nvidia
Verify:
oc get nodes --show-labels | grep accelerator
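If you later need to remove the label (for example, when repurposing the node), the usual trailing-dash syntax works:

oc label node <gpu-node-name> accelerator-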
Step 2: Apply a Taint to GPU Nodes
Run the following command to taint the node:
oc adm taint nodes <gpu-node-name> nvidia.com/gpu=true:NoSchedule
This means pods without the matching toleration will not be scheduled on this node.
Check taints:
oc describe node <gpu-node-name> | grep Taints
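If you ever need to undo this, the same command with a trailing dash removes the taint:

oc adm taint nodes <gpu-node-name> nvidia.com/gpu=true:NoSchedule-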
Step 3: Add a Toleration to Your GPU Workload
Here’s an example of a pod spec that tolerates the GPU taint:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
  - name: gpu-container
    image: nvidia/cuda:12.3.2-base
    resources:
      limits:
        nvidia.com/gpu: 1
  tolerations:
  - key: "nvidia.com/gpu"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
The toleration allows the scheduler to place this pod on the tainted GPU node, and the nvidia.com/gpu resource limit ensures it only lands on a node that actually exposes GPUs.
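If you want to match only on the taint’s key and ignore its value, a toleration using operator: "Exists" is an equivalent alternative here:

tolerations:
- key: "nvidia.com/gpu"
  operator: "Exists"
  effect: "NoSchedule"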
Step 4: Combine with Node Selectors (Optional)
To further ensure pods land on GPU nodes, add a nodeSelector that matches the label you applied earlier:
nodeSelector:
  accelerator: nvidia
Full example:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
  - name: gpu-container
    image: nvidia/cuda:12.3.2-base
    resources:
      limits:
        nvidia.com/gpu: 1
  tolerations:
  - key: "nvidia.com/gpu"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
  nodeSelector:
    accelerator: nvidia
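If you prefer a more expressive alternative to nodeSelector, a node affinity rule targeting the same accelerator=nvidia label would look roughly like this (a sketch, placed under spec just like nodeSelector):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: accelerator
          operator: In
          values:
          - nvidia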
Step 5: Verify Scheduling
Deploy the pod and check where it’s running:
oc get pods -o wide
It should schedule only on the GPU node.
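To double-check the exact node, you can also read the node name straight from the pod (gpu-pod is the name used in the examples above):

oc get pod gpu-pod -o jsonpath='{.spec.nodeName}'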