Kubernetes taint - what is it and how to work with it?
Taints and affinity control scheduling from opposite directions: a taint marks a node so that pods are repelled from it, while affinity attracts pods to particular nodes. It's one of the great features of Kubernetes, but there is a catch. If you run a single-node cluster on your laptop (the way I like to do :)) you will often run into one common taint - the NoSchedule one. It is set on the master node to prevent regular workloads from being scheduled there, so if you try to deploy some pods to play with (like Helm's Tiller) you will probably hit this problem:
[root@phix ~]# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-86c58d9df4-nl4hq 1/1 Running 0 11h
kube-system coredns-86c58d9df4-wbg8x 1/1 Running 0 11h
kube-system etcd-phix 1/1 Running 0 11h
kube-system kube-apiserver-phix 1/1 Running 0 11h
kube-system kube-controller-manager-phix 1/1 Running 1 11h
kube-system kube-flannel-ds-amd64-jtkqn 1/1 Running 0 11h
kube-system kube-proxy-fqg5b 1/1 Running 0 11h
kube-system kube-scheduler-phix 1/1 Running 1 11h
kube-system kubernetes-dashboard-57df4db6b-cptdn 1/1 Running 0 11h
kube-system tiller-deploy-8485766469-pd9ss 0/1 Pending 0 89s
[root@phix ~]# kubectl -n kube-system describe pod tiller-deploy-8485766469-pd9ss
Name: tiller-deploy-8485766469-pd9ss
Namespace: kube-system
Priority: 0
PriorityClassName: <none>
Node: <none>
Labels: app=helm
name=tiller
pod-template-hash=8485766469
Annotations: <none>
Status: Pending
IP:
Controlled By: ReplicaSet/tiller-deploy-8485766469
Containers:
tiller:
Image: gcr.io/kubernetes-helm/tiller:v2.12.1
Ports: 44134/TCP, 44135/TCP
Host Ports: 0/TCP, 0/TCP
Liveness: http-get http://:44135/liveness delay=1s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:44135/readiness delay=1s timeout=1s period=10s #success=1 #failure=3
Environment:
TILLER_NAMESPACE: kube-system
TILLER_HISTORY_MAX: 0
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from tiller-token-b65qd (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
tiller-token-b65qd:
Type: Secret (a volume populated by a Secret)
SecretName: tiller-token-b65qd
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 104s (x2 over 104s) default-scheduler 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
[root@phix ~]#
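The FailedScheduling event tells us the node carries a taint that the tiller pod doesn't tolerate. A quick way to see which taint it is - assuming the node is named phix, as in this cluster - is kubectl describe; it should print something like:
[root@phix ~]# kubectl describe node phix | grep Taints
Taints: node-role.kubernetes.io/master:NoSchedule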
The simple solution would be to remove this taint.
[root@phix ~]# kubectl get nodes -o json | jq .items[].spec.taints
[
{
"effect": "NoSchedule",
"key": "node-role.kubernetes.io/master"
}
]
[root@phix ~]# kubectl taint nodes --all node-role.kubernetes.io/master-
node/phix untainted
[root@phix ~]# kubectl get nodes -o json | jq .items[].spec.taints
null
[root@phix ~]#
Notice the minus sign at the end of the taint removal command - that trailing - is what tells kubectl to remove the taint rather than add it.
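If you later want the master to repel workloads again (for example before joining worker nodes), you can put the taint back. A sketch of the command, reusing the same key and effect that were just removed - the empty value after = matches the original taint, which had no value:
[root@phix ~]# kubectl taint nodes phix node-role.kubernetes.io/master=:NoSchedule
node/phix tainted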
Note for production: removing the master taint is a bad idea on a real cluster. Normally a Kubernetes cluster has worker nodes in addition to the master, and in that case you want to keep the NoSchedule taint on the master so it keeps repelling pods that try to schedule there. By design, the worker nodes are the ones that should take the pods. If a specific pod truly has to run on the master, give it a toleration instead of untainting the node, as in the sketch below.
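A minimal sketch of such a toleration in the pod spec - the surrounding Deployment fields are omitted, and the key matches the master taint we saw above:
spec:
  tolerations:
  - key: "node-role.kubernetes.io/master"   # same key as the taint on the node
    operator: "Exists"                      # tolerate the taint regardless of its value
    effect: "NoSchedule"
With this in place the pod is allowed onto the tainted master, while every other pod is still kept off it - which is exactly the behaviour you want in production.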