How to deploy and distribute workloads in a multi-cloud Kubernetes environment

This article will guide you through best practices for deploying and distributing workloads in a Kubernetes environment on Scaleway's Kosmos.

⚠️ Warning reminder

This article alternates between concept explanations and operations or commands that need to be performed by the reader.

If this icon (🔥) is present before an image, a command, or a file, you are required to perform an action.

So remember, when 🔥 is on, so are you!

Redundancy

🔥 Labels

First, we are going to list our nodes and, more specifically, their associated labels. The kubectl get nodes --show-labels command will perform this action for us.

🔥 kubectl get nodes --show-labels --no-headers | awk '{print "NODE NAME: "$1","$6"\n"}' | tr "," "\n"

Output

NODE NAME: scw-kosmos-kosmos-scw-09371579edf54552b0187a95
beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=DEV1-M
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=nl-ams
failure-domain.beta.kubernetes.io/zone=nl-ams-1
k8s.scaleway.com/kapsule=b58ad1f6-2a4d-4c0b-8573-459fad62682f
k8s.scaleway.com/managed=true
k8s.scaleway.com/node=09371579-edf5-4552-b018-7a95e779b70e
k8s.scaleway.com/pool-name=kosmos-scw
k8s.scaleway.com/pool=313ccb19-0233-4dc9-b582-b1e687903b7a
k8s.scaleway.com/runtime=containerd
kubernetes.io/arch=amd64
kubernetes.io/hostname=scw-kosmos-kosmos-scw-09371579edf54552b0187a95
kubernetes.io/os=linux
node.kubernetes.io/instance-type=DEV1-M
topology.csi.scaleway.com/zone=nl-ams-1
topology.kubernetes.io/region=nl-ams
topology.kubernetes.io/zone=nl-ams-1

NODE NAME: scw-kosmos-worldwide-5ecdb6d02cf84d63937af45a6
beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
k8s.scw.cloud/disable-lifecycle=true
k8s.scw.cloud/node-public-ip=151.115.36.196
kubernetes.io/arch=amd64
kubernetes.io/hostname=scw-kosmos-worldwide-5ecdb6d02cf84d63937af45a6
kubernetes.io/os=linux
topology.kubernetes.io/region=scw-kosmos-worldwide-5ecdb6d02cf84d63937af45a6

NODE NAME: scw-kosmos-worldwide-b2db708b0c474decb7447e0d6
beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
k8s.scw.cloud/disable-lifecycle=true
k8s.scw.cloud/node-public-ip=65.21.146.191
kubernetes.io/arch=amd64
kubernetes.io/hostname=scw-kosmos-worldwide-b2db708b0c474decb7447e0d6
kubernetes.io/os=linux
topology.kubernetes.io/region=scw-kosmos-worldwide-b2db708b0c474decb7447e0d6

For each of our three nodes, we see many labels. The first node on the list has considerably more labels because it is managed by the Kubernetes Kosmos engine, which adds information about features and node management.

🔥 Adding labels to distinguish Cloud providers

As it might not be easy to remember which node comes from which provider, and as it can help us distribute our workload across providers, we are going to add a provider label to our nodes, with values such as scaleway or hetzner.

kubectl label nodes scw-kosmos-kosmos-scw-09371579edf54552b0187a95 provider=scaleway

kubectl label nodes scw-kosmos-worldwide-5ecdb6d02cf84d63937af45a6 provider=scaleway

kubectl label nodes scw-kosmos-worldwide-b2db708b0c474decb7447e0d6 provider=hetzner

In addition, we are also going to add a label to our unmanaged Scaleway node to specify that it is, in fact, not managed by the engine. For that, we use the same label used on the managed Scaleway node, but set to false: k8s.scaleway.com/managed=false.

kubectl label nodes scw-kosmos-worldwide-5ecdb6d02cf84d63937af45a6 k8s.scaleway.com/managed=false

🔥 Listing our labels

Let's list our labels to ensure that the provider label is correctly set on our three nodes.

🔥 kubectl get nodes --show-labels --no-headers | awk '{print "NODE NAME: "$1","$6"\n"}' | tr "," "\n"

Output

NODE NAME: scw-kosmos-kosmos-scw-09371579edf54552b0187a95
beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=DEV1-M
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=nl-ams
failure-domain.beta.kubernetes.io/zone=nl-ams-1
k8s.scaleway.com/kapsule=b58ad1f6-2a4d-4c0b-8573-459fad62682f
k8s.scaleway.com/managed=true
k8s.scaleway.com/node=09371579-edf5-4552-b018-7a95e779b70e
k8s.scaleway.com/pool-name=kosmos-scw
k8s.scaleway.com/pool=313ccb19-0233-4dc9-b582-b1e687903b7a
k8s.scaleway.com/runtime=containerd
kubernetes.io/arch=amd64
kubernetes.io/hostname=scw-kosmos-kosmos-scw-09371579edf54552b0187a95
kubernetes.io/os=linux
node.kubernetes.io/instance-type=DEV1-M
provider=scaleway
topology.csi.scaleway.com/zone=nl-ams-1
topology.kubernetes.io/region=nl-ams
topology.kubernetes.io/zone=nl-ams-1

NODE NAME: scw-kosmos-worldwide-5ecdb6d02cf84d63937af45a6
beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
k8s.scaleway.com/managed=false
k8s.scw.cloud/disable-lifecycle=true
k8s.scw.cloud/node-public-ip=151.115.36.196
kubernetes.io/arch=amd64
kubernetes.io/hostname=scw-kosmos-worldwide-5ecdb6d02cf84d63937af45a6
kubernetes.io/os=linux
provider=scaleway
topology.kubernetes.io/region=scw-kosmos-worldwide-5ecdb6d02cf84d63937af45a6

NODE NAME: scw-kosmos-worldwide-b2db708b0c474decb7447e0d6
beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
k8s.scw.cloud/disable-lifecycle=true
k8s.scw.cloud/node-public-ip=65.21.146.191
kubernetes.io/arch=amd64
kubernetes.io/hostname=scw-kosmos-worldwide-b2db708b0c474decb7447e0d6
kubernetes.io/os=linux
provider=hetzner
topology.kubernetes.io/region=scw-kosmos-worldwide-b2db708b0c474decb7447e0d6
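
As a side note, if you only want to check the provider label rather than the full label list, kubectl's -L flag prints it as a dedicated column. This is just a convenience and is not required for the rest of this Hands-On:

kubectl get nodes -L provider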

Deployment and observation: What happens in a Multi-Cloud cluster?

🔥 A first very simple deployment

To better understand the behavior of a Multi-Cloud Kubernetes cluster, we are going to create a very simple deployment using kubectl. This deployment will run three replicas of the busybox image, each of which will print the date every ten seconds.

kubectl create deploy first-deployment --replicas=3 --image=busybox -- /bin/sh -c "while true; do date; sleep 10; done"

Once the deployment has been created, we can observe what is actually happening on our cluster.

kubectl get all

Output

NAME                                    READY   STATUS    RESTARTS   AGE
pod/first-deployment-695f579bd4-cfg6l   1/1     Running   0          8s
pod/first-deployment-695f579bd4-jzft8   1/1     Running   0          8s
pod/first-deployment-695f579bd4-rt5jt   1/1     Running   0          8s

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.32.0.1    <none>        443/TCP   53m

NAME                               READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/first-deployment   3/3     3            3           8s

NAME                                          DESIRED   CURRENT   READY   AGE
replicaset.apps/first-deployment-695f579bd4   3         3         3       8s

Our first observation is that our deployment object is there, along with the three pods (replicas) we asked for. We can also see that an "unexpected" object was created: a replicaset. The replicaset is an intermediary object, created by the deployment, in charge of maintaining and monitoring the replicas.
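
If you want to see this ownership chain for yourself, you can list the replicasets and read a pod's ownerReferences field (the pod name below is taken from the output above; adapt it to your own cluster):

kubectl get replicasets

kubectl get pod first-deployment-695f579bd4-cfg6l -o jsonpath='{.metadata.ownerReferences[0].kind}{"/"}{.metadata.ownerReferences[0].name}{"\n"}'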

Now, let's have a quick look inside one of our pods to see if it performs normally.

🔥 kubectl logs pod/first-deployment-695f579bd4-cfg6l

Output

Mon Sep  6 08:41:01 UTC 2021
Mon Sep  6 08:41:11 UTC 2021
Mon Sep  6 08:41:21 UTC 2021
Mon Sep  6 08:41:31 UTC 2021

We can see that our pod is writing the date every ten seconds, which is exactly what we asked it to do.

Now, the real question is: where are these pods running? We can use the kubectl get pods command to give us the name of the node each pod runs on.

🔥 kubectl get pods -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName

Output

NAME                                NODE
first-deployment-695f579bd4-cfg6l   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-jzft8   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-rt5jt   scw-kosmos-kosmos-scw-0937

When listing our three pods and their location, it seems that they all run on the same managed Scaleway node (the one located in Amsterdam). That's unfortunate... Let's see if we can act on this behavior.

🔥 Scaling up

The first thing we can try is to scale up our deployment and see where all our new replicas will be scheduled.

🔥 kubectl scale deployment first-deployment --replicas=15

The scaling has been applied, so we can list our pods again.

🔥 kubectl get pods -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName

Output

NAME                                NODE
first-deployment-695f579bd4-5jq9q   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-5t6tw   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-5twcj   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-5xljr   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-8phq5   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-cfg6l   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-jzft8   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-nf9fg   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-nsxb6   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-ptlkp   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-rgdqj   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-rt5jt   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-vrl95   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-vwv7l   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-w9qqq   scw-kosmos-kosmos-scw-0937

And they are still all running on the same node.

To go further, we are going to play with more complex configurations. In order to do so without getting mixed up with our other configurations, deployments, and pods, it is best to clean our environment and delete our deployment.

🔥 kubectl delete deployment first-deployment

Output
deployment.apps "first-deployment" deleted

YAML files

Kubectl commands are nice, but when it comes to managing multiple Kubernetes objects, configuration files are a better and more reliable fit. In Kubernetes, configurations are written in YAML format, always following a pattern similar to the one below:

#example.yaml
---
apiVersion: v1  # version of the k8s api
kind: Pod       # type of the Kubernetes object we aim to describe
metadata:       # additional options such as the object name, labels, annotations
spec:           # parameters and options of the k8s object to create
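
Such a manifest is applied to (and removed from) the cluster with the commands below; we will use kubectl apply later in this Hands-On:

kubectl apply -f example.yaml

kubectl delete -f example.yaml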

Selecting where to run our pods

In Kubernetes, different options are available to distribute our workload across nodes, across namespaces, or between pods depending on affinities. Working in a multi-cloud Kubernetes environment makes their usage mandatory, and knowing them and their behavior can rapidly become crucial.

🔥 NodeSelector

A node selector is applied on a pod and matches labels that exist on the cluster nodes. The command below gives us all the information about a given node, including labels, annotations, running pods, etc.

🔥 kubectl describe node scw-kosmos-kosmos-scw-09371579edf54552b0187a95

Output

Name:               scw-kosmos-kosmos-scw-09371579edf54552b0187a95
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=DEV1-M
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=nl-ams
                    failure-domain.beta.kubernetes.io/zone=nl-ams-1
                    k8s.scaleway.com/kapsule=b58a[...]
                    k8s.scaleway.com/managed=true
                    k8s.scaleway.com/node=0937[...]
                    k8s.scaleway.com/pool=313c[...]
                    k8s.scaleway.com/pool-name=kosmos-scw
                    k8s.scaleway.com/runtime=containerd
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=scw-kosmos-kosmos-scw-0937[...]
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=DEV1-M
                    provider=scaleway
                    topology.csi.scaleway.com/zone=nl-ams-1
                    topology.kubernetes.io/region=nl-ams
                    topology.kubernetes.io/zone=nl-ams-1
Annotations:        csi.volume.kubernetes.io/nodeid: {"csi.scaleway.com":"[...]"}
                    kilo.squat.ai/discovered-endpoints: {}
                    kilo.squat.ai/endpoint: 51.15.123.156:51820
                    kilo.squat.ai/force-endpoint: 51.15.123.156:51820
                    kilo.squat.ai/granularity: location
                    kilo.squat.ai/internal-ip: 10.67.36.37/31
                    kilo.squat.ai/key: cSP2[...]
                    kilo.squat.ai/last-seen: 1630917821
                    kilo.squat.ai/wireguard-ip: 10.4.0.1/16
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Mon, 06 Sep 2021 09:50:02 +0200
Taints:             <none>
[...]

Parts in brackets [...] are truncated from the output for readability.

A node selector can be applied on a pod using an existing label, or a new label created by the Kubernetes user, such as the labels we previously added on our nodes.

This is a sample YAML file using a node selector in a pod:

#example.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    provider: scaleway

The node selector ensures that the defined pod will only be scheduled on a node matching the condition. In this example, the nginx pod will only be scheduled on nodes with the label provider=scaleway.
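
As a side note, a node selector can also be added to an existing deployment without writing a full manifest, by patching its pod template (a quick sketch; my-deploy is a placeholder name, not an object created in this Hands-On):

kubectl patch deployment my-deploy -p '{"spec":{"template":{"spec":{"nodeSelector":{"provider":"scaleway"}}}}}'

After the patch, the deployment rolls out new pods that can only be scheduled on nodes labeled provider=scaleway.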

NodeAffinity

Node affinity also matches labels existing on nodes, but provides more flexibility and more options in terms of the rules that are applied.

First of all, the node affinity accepts two different policies:

  • requiredDuringSchedulingIgnoredDuringExecution
  • preferredDuringSchedulingIgnoredDuringExecution

As their names suggest, the required policy is a hard constraint: the pod will only be scheduled on nodes matching the conditions. The preferred policy is a soft constraint: Kubernetes may still schedule the pod on nodes that do not match the conditions. This allows the definition of scheduling preferences instead of mandatory criteria.

The file below is an example of a configuration combining requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution.

#example.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: provider
            operator: In
            values:
            - scaleway
            - hetzner
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: topology.kubernetes.io/region
            operator: In
            values:
            - nl-ams
  containers:
  - name: nginx
    image: nginx

In this example, the pod is required to run on a node with a provider=scaleway or provider=hetzner label, and should preferably be scheduled on a node with the label topology.kubernetes.io/region=nl-ams.

PodAffinity

The pod affinity constraint is applied on pods based on other pods' labels. It benefits from the same two policies as node affinity:

  • requiredDuringSchedulingIgnoredDuringExecution
  • preferredDuringSchedulingIgnoredDuringExecution

The difference from node affinity is that instead of defining rules between pods and nodes, pod affinity defines rules between pods, such as "pod 1 should run on the same node as pod 2".

In the following sample file, we specify that an nginx pod must be scheduled in the same provider topology domain as a pod with the label app=one-per-provider, i.e. on a node whose provider label value matches that of a node already running such a pod.

#example.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - one-per-provider
        topologyKey: provider
  containers:
  - name: nginx
    image: nginx

PodAntiAffinity

In the same way that Kubernetes allows us to define pod affinities, we can also define pod anti-affinities, i.e. rules that prevent or discourage pods from cohabiting under certain conditions.

The same two policies are available:

  • requiredDuringSchedulingIgnoredDuringExecution
  • preferredDuringSchedulingIgnoredDuringExecution

In the following sample file, we define that the nginx pod should preferably not be scheduled in the same topology.kubernetes.io/zone as pods carrying the security=S1 label.

#example.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S1
          topologyKey: topology.kubernetes.io/zone
  containers:
  - name: nginx
    image: nginx

🔥 Deploy & see: Spread our deployment across different providers

We are going to try out deploying an application across our two providers: Scaleway and Hetzner.

🔥 Let's create this antiaffinity.yaml configuration file to define a deployment where each pod is scheduled on nodes with a different provider label value: a pod will not cohabit, within the same provider, with other pods carrying the app=one-per-provider label.

🔥

#antiaffinity.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: one-per-provider-deploy
spec:
  replicas: 2
  selector:
    matchLabels:
      app: one-per-provider
  template:
    metadata:
      labels:
        app: one-per-provider
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - one-per-provider
            topologyKey: provider
      containers:
      - name: busytime
        image: busybox
        command: ["/bin/sh","-c","while true; do date; sleep 10; done"]

🔥 Apply the deployment configuration on our cluster using the following command.

🔥 kubectl apply -f antiaffinity.yaml

Output
deployment.apps/one-per-provider-deploy created

Then, observe the pods that were created and the nodes they run on.

🔥 kubectl get pods -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName

Output

NAME                                       NODE
one-per-provider-deploy-75945bb589-6jb87   scw-kosmos-worldwide-b2db
one-per-provider-deploy-75945bb589-db25x   scw-kosmos-kosmos-scw-0937

By looking at the names of the different nodes, we can see the pool name in the middle, showing that our two pods are deployed on different pools, and thus on different nodes.

Since one of the nodes in our "worldwide" pool is also a Scaleway node, let's make sure that our two pods do not in fact both run on Scaleway. We can verify this by fetching the provider label we set on each node at the beginning of this Hands-On.

🔥 kubectl get nodes scw-kosmos-worldwide-b2db708b0c474decb7447e0d6 -o custom-columns=NAME:.metadata.name,PROVIDER:.metadata.labels.provider

Output

NAME                                             PROVIDER
scw-kosmos-worldwide-b2db708b0c474decb7447e0d6   hetzner

🔥 kubectl get nodes scw-kosmos-kosmos-scw-09371579edf54552b0187a95 -o custom-columns=NAME:.metadata.name,PROVIDER:.metadata.labels.provider

Output

NAME                                             PROVIDER
scw-kosmos-kosmos-scw-09371579edf54552b0187a95   scaleway

When we ask for the provider label of our two nodes, we can confirm that our two pods from the deployment (that have the same app=one-per-provider label) were scheduled on different providers.

🔥 Scaling up

Our previous deployment defined two replicas, where each generated pod is labeled app=one-per-provider. These pods should be scheduled on nodes with different values for their provider label (the topologyKey field).

As our cluster has only two different providers, and all our pods were created within the one-per-provider-deploy deployment, scaling up the deployment should not result in the scheduling of a third pod.

So let's try it by adding only one more replica.

🔥 kubectl scale deployment one-per-provider-deploy --replicas=3

Output
deployment.apps/one-per-provider-deploy scaled

Once the deployment has scaled up, we can list our pods and see what happened.

🔥 kubectl get pods

Output

NAME                                 READY   STATUS    RESTARTS   AGE
one-per-provider-deploy-7594-29wr7   0/1     Pending   0          7s
one-per-provider-deploy-7594-6jb87   1/1     Running   0          12m
one-per-provider-deploy-7594-db25x   1/1     Running   0          12m

The third pod is present in our cluster, as required by the deployment's replicaset, but we can see that it is stuck in a Pending state, meaning the pod could not find a node to be scheduled on.

The reason behind this behavior is that the pod is waiting for a node matching all of its anti-affinity constraints, and there is currently no node from a third Cloud provider in our cluster. Until such a node is added to the cluster, our third pod will remain unavailable, in a Pending state.
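
To confirm why the pod is stuck, we can inspect its events with kubectl describe (the pod name is the pending one from the output above; the exact event wording may differ between Kubernetes versions):

kubectl describe pod one-per-provider-deploy-7594-29wr7

The Events section should show a FailedScheduling message indicating that no node satisfied the pod anti-affinity rules.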

Taints

A taint is a Kubernetes concept used to block pods from running on certain nodes.

The principle is to define key/value pairs completed by an effect (i.e. a taint policy). There are three possibilities:

  • NoSchedule: forbids pod scheduling on the node but allows pods already present to keep running.
  • PreferNoSchedule: a softer version of NoSchedule; the scheduler avoids placing pods on the node, but may still do so if no other node is suitable.
  • NoExecute: forbids pod execution on the node, resulting in the eviction of unauthorized running pods.

Example
user@local:~$ kubectl taint nodes tainted-node key1=value1:NoSchedule

In this example, a taint is applied to the node named tainted-node, and set with the effect (i.e. constraint or policy) NoSchedule.

This means that no pod can be scheduled on tainted-node, except for pods with a specific authorization to do so. These authorizations are called tolerations and are covered in the next section of this article.

If a pod without the corresponding toleration is already running on a tainted-node at the time the taint is added (i.e. when the kubectl command above is executed), the pod will not be evicted and will keep running on this node.

However, if the constraint were set to NoExecute, pods without the corresponding toleration would not be allowed to run on tainted-node, resulting in their eviction.
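
To check which taints are currently applied to our nodes, one option is a custom-columns query on the nodes' spec (taints are printed as raw maps, which is enough for a quick check):

kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints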

Tolerations

As stated before, tolerations are applied on pods to allow exceptions on tainted nodes. Two operators are available:

  • Equal: matches a taint whose key and value both equal those of the toleration.
  • Exists: matches any taint with the given key, regardless of its value.

In this example, the pod named busybox is granted permission to be scheduled on a tainted node with two taints:

  • key1=value1:NoSchedule
  • key2 with NoSchedule effect regardless of the value attributed to the taint.
# example-equal.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: busybox
spec:
  containers:
  - name: busybox
    image: busybox
    command: ["/bin/sh","-c","sleep 3600"]
  tolerations:
  - key: key1
    operator: Equal
    value: "value1"
    effect: NoSchedule
  - key: key2
    operator: Exists
    effect: NoSchedule

Taints and tolerations can be combined to define very specific behaviors for the scheduling and execution of pods.

Forbidding execution

To experiment with taints, we are going to taint our managed Scaleway node with autoscale=true:NoSchedule.
As this node is part of a managed pool of our cluster, it benefits from the auto-scaling feature. We want to grant permission only to pods that are configured to run on an auto-scalable pool.
We also want to exclude (i.e. evict) from this node all running pods which do not have the corresponding toleration.

Let's have a look at our cluster status by listing our pods and the nodes they run on.

🔥 kubectl get pods -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName

Output

NAME                                 NODE
one-per-provider-deploy-7594-29wr7   <none>
one-per-provider-deploy-7594-6jb87   scw-kosmos-worldwide-b2db
one-per-provider-deploy-7594-db25x   scw-kosmos-kosmos-scw-0937

We still have the same three pods: two are running, and one is pending (as no node has been assigned to it).

Our managed Scaleway pool has auto-scaling activated, using the preset label autoscale=true. We are therefore going to use the same key/value pair as a taint to forbid scheduling on this specific node.

🔥 kubectl taint nodes scw-kosmos-kosmos-scw-09371579edf54552b0187a95 autoscale=true:NoSchedule

Output
node/scw-kosmos-kosmos-scw-09371579edf54552b0187a95 tainted

Let's see what happened on our cluster after applying the taint:

🔥 kubectl get pods -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,STATUS:.status.phase

Output

NAME                               NODE                         STATUS
one-per-provider-deploy-75-29wr7   <none>                       Pending
one-per-provider-deploy-75-6jb87   scw-kosmos-worldwide-b2db    Running
one-per-provider-deploy-75-db25x   scw-kosmos-kosmos-scw-0937   Running

The state of our cluster has not changed. This is because the taint we added concerns scheduling, and our pods were already scheduled on our nodes at the time the taint was added.

In other words, our taint forbids new scheduling, but it does not forbid the execution of pods that are already running.

Now, let's add a new taint to the same node. This time, however, we will set its effect to NoExecute.

🔥 kubectl taint nodes scw-kosmos-kosmos-scw-09371579edf54552b0187a95 autoscale=true:NoExecute

Output
node/scw-kosmos-kosmos-scw-09371579edf54552b0187a95 tainted

Once this new taint is applied, we want to observe the behavior of our pods.

🔥 kubectl get pods -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,STATUS:.status.phase

Output

NAME                               NODE                         STATUS
one-per-provider-deploy-75-29wr7   <none>                       Pending
one-per-provider-deploy-75-6jb87   scw-kosmos-worldwide-b2db    Running
one-per-provider-deploy-75-vjxkq   scw-kosmos-worldwide-5ecd    Running

If we look closely at the node column of this output, we can see that node scw-kosmos-kosmos-scw-0937 no longer has a pod running on it. This pod was evicted when the taint with NoExecute effect was applied.

To maintain the stability of our cluster, the replicaset of our one-per-provider-deploy deployment rescheduled a new pod on a node without the incompatible taints.

A new pod was created in a Pending state while the evicted pod was in a Terminating state. It was then scheduled on a node matching all the constraints: a node whose autoscale taint does not forbid execution, and which belongs to a different provider than the other running pod (the anti-affinity on the provider label).
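
If you re-run this experiment, you can watch the eviction and rescheduling live by keeping the pod listing running in watch mode in a second terminal:

kubectl get pods -o wide -w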

Moving on to tolerations, we will create a pod with a toleration allowing it to be scheduled on our managed Scaleway node, selected through its location label topology.kubernetes.io/region=nl-ams (this label was set directly by the Scaleway Kubernetes engine during the managed pool creation).

🔥

#toleration.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: busytolerant
spec:
  containers:
  - name: busytolerant
    image: busybox
    command: ["/bin/sh","-c","sleep 3600"]
  tolerations:
  - key: autoscale
    operator: Equal
    value: "true"
    effect: NoSchedule
  - key: autoscale
    operator: Equal
    value: "true"
    effect: NoExecute
  nodeSelector:
    topology.kubernetes.io/region: "nl-ams"

This YAML file defines a pod able to run on a node meeting the following conditions:

  • node has the taint autoscale=true:NoSchedule
  • node has the taint autoscale=true:NoExecute
  • node has the label topology.kubernetes.io/region=nl-ams

Let's apply this configuration to our cluster and observe what happens.

🔥 kubectl apply -f toleration.yaml

Output
pod/busytolerant created

🔥 kubectl get pods -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,STATUS:.status.phase

Output

NAME                               NODE                         STATUS
busytolerant                       scw-kosmos-kosmos-scw-0937   Running
one-per-provider-deploy-75-29wr7   <none>                       Pending
one-per-provider-deploy-75-6jb87   scw-kosmos-worldwide-b2db    Running
one-per-provider-deploy-75-vjxkq   scw-kosmos-worldwide-5ecd    Running

We can see that with the right tolerations, the pod named busytolerant was perfectly able to be scheduled and executed on the node we tainted previously.

The addition of the constraint on the region label is just a way to show how all the workload distribution features Kubernetes offers are cumulative.


🔥 Removing the taints before moving forward

To avoid scheduling issues while moving forward in this Hands-On, it is best to remove the taints applied to our node. The command to do so is the same as the one used to add the taint, with a - (dash) character appended to the taint declaration.

🔥 kubectl taint nodes scw-kosmos-kosmos-scw-09371579edf54552b0187a95 autoscale=true:NoSchedule-

Output
node/scw-kosmos-kosmos-scw-09371579edf54552b0187a95 untainted

🔥 kubectl taint nodes scw-kosmos-kosmos-scw-09371579edf54552b0187a95 autoscale=true:NoExecute-

Output
node/scw-kosmos-kosmos-scw-09371579edf54552b0187a95 untainted

We can observe that removing the taints had no effect on the pods running in our cluster. This is because tolerations are authorizations, not restrictions (as opposed to taints): a pod tolerating a taint remains free to run on nodes that do not carry it.

🔥 kubectl get pods -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,STATUS:.status.phase

Output

NAME                               NODE                         STATUS
busytolerant                       scw-kosmos-kosmos-scw-0937   Running
one-per-provider-deploy-75-29wr7   <none>                       Pending
one-per-provider-deploy-75-6jb87   scw-kosmos-worldwide-b2db    Running
one-per-provider-deploy-75-vjxkq   scw-kosmos-worldwide-5ecd    Running

Let's keep cleaning our environment and remove our busytolerant pod and our one-per-provider-deploy deployment one by one.

🔥 kubectl delete pods busytolerant

Output
pod "busytolerant" deleted

🔥 kubectl delete deployment one-per-provider-deploy

Output
deployment.apps "one-per-provider-deploy" deleted

🔥 kubectl get all

Output

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.32.0.1    <none>        443/TCP   107m

PodTopologySpread constraint

The pod topology spread constraint aims to evenly distribute pods across nodes based on specific rules and constraints.

It lets us set a maximum difference in the number of matching pods between topology domains (the maxSkew parameter) and determine the action to perform if the constraint cannot be met:

  • DoNotSchedule: hard constraint, the pod cannot be scheduled if the constraint is not met.
  • ScheduleAnyway: soft constraint, the pod can still be scheduled even if the constraint is not met.

The sample file below shows the type of configuration to apply a topologySpreadConstraint on pods created from a deployment.

# example.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busy-topologyspread
spec:
  replicas: 10
  selector:
    matchLabels:
      app: busybox-acrossproviders
  template:
    metadata:
      labels:
        app: busybox-acrossproviders
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: provider
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: busybox-acrossproviders
      containers:
      - name: busybox-everywhere
        image: busybox
        command: ["/bin/sh","-c","sleep 3600"]

🔥 Distributing our workload

The topology spread constraint is specifically useful to spread the workload of one or multiple applications evenly throughout a Kubernetes cluster.

🔥 We are going to define a spread.yaml file to set up a deployment with ten replicas, which should be scheduled evenly across nodes carrying the labels provider=scaleway and provider=hetzner.

We authorize a difference of at most one pod between our matching providers using the maxSkew parameter: with ten replicas and two providers, the only valid split is five pods on each.

🔥

#spread.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busyspread
spec:
  replicas: 10
  selector:
    matchLabels:
      app: busyspread-providers
  template:
    metadata:
      labels:
        app: busyspread-providers
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: provider
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: busyspread-providers
      containers:
      - name: busyspread
        image: busybox
        command: ["/bin/sh","-c","sleep 3600"]

🔥 Let's apply this deployment.

🔥 kubectl apply -f spread.yaml

Output
deployment.apps/busyspread created

To see the distribution of our pods across the nodes of our cluster, we are going to list our busyspread pods, the nodes they run on, and count the number of occurrences.

🔥 kubectl get pods -o wide --no-headers | grep busyspread | awk '{print $7}' | sort | uniq -c

Output
2 scw-kosmos-kosmos-scw-09371579edf54552b0187a95
3 scw-kosmos-worldwide-5ecdb6d02cf84d63937af45a6
5 scw-kosmos-worldwide-b2db708b0c474decb7447e0d6

Knowing that the first two nodes in this list have the label provider=scaleway and the third one has the label provider=hetzner, we have indeed an even distribution of our workload across our providers: 2 + 3 = 5 pods on Scaleway and 5 pods on Hetzner.

The next step of this Hands-On will be to set up Load Balancing and Storage management within a Multi-Cloud Kubernetes cluster.

🔥 In order to avoid getting mixed up in all our pods and deployments, we are going to clean our environment by deleting our busyspread deployment.

🔥 kubectl delete deployment busyspread

Output
deployment.apps "busyspread" deleted
