Best practices on service exposure and data persistence for a Multi-Cloud Kubernetes cluster
This article will guide you through the best practices to deploy and distribute the workload on a Kubernetes environment on Scaleway's Kosmos.
⚠️ Warning reminder
This article alternates between concept explanations and operations or commands that need to be performed by the reader.
If this icon (🔥) is present before an image, a command, or a file, you are required to perform an action.
So remember, when 🔥 is on, so are you!
First, we are going to list your nodes, and more specifically their associated labels. The kubectl get nodes --show-labels command will perform this action for us.
🔥 kubectl get nodes --show-labels --no-headers | awk '{print "NODE NAME: "$1","$6"\n"}' | tr "," "\n"
Output
NODE NAME: scw-kosmos-kosmos-scw-09371579edf54552b0187a95
beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=DEV1-M
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=nl-ams
failure-domain.beta.kubernetes.io/zone=nl-ams-1
k8s.scaleway.com/kapsule=b58ad1f6-2a4d-4c0b-8573-459fad62682f
k8s.scaleway.com/managed=true
k8s.scaleway.com/node=09371579-edf5-4552-b018-7a95e779b70e
k8s.scaleway.com/pool-name=kosmos-scw
k8s.scaleway.com/pool=313ccb19-0233-4dc9-b582-b1e687903b7a
k8s.scaleway.com/runtime=containerd
kubernetes.io/arch=amd64
kubernetes.io/hostname=scw-kosmos-kosmos-scw-09371579edf54552b0187a95
kubernetes.io/os=linux
node.kubernetes.io/instance-type=DEV1-M
topology.csi.scaleway.com/zone=nl-ams-1
topology.kubernetes.io/region=nl-ams
topology.kubernetes.io/zone=nl-ams-1

NODE NAME: scw-kosmos-worldwide-5ecdb6d02cf84d63937af45a6
beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
k8s.scw.cloud/disable-lifecycle=true
k8s.scw.cloud/node-public-ip=151.115.36.196
kubernetes.io/arch=amd64
kubernetes.io/hostname=scw-kosmos-worldwide-5ecdb6d02cf84d63937af45a6
kubernetes.io/os=linux
topology.kubernetes.io/region=scw-kosmos-worldwide-5ecdb6d02cf84d63937af45a6

NODE NAME: scw-kosmos-worldwide-b2db708b0c474decb7447e0d6
beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
k8s.scw.cloud/disable-lifecycle=true
k8s.scw.cloud/node-public-ip=65.21.146.191
kubernetes.io/arch=amd64
kubernetes.io/hostname=scw-kosmos-worldwide-b2db708b0c474decb7447e0d6
kubernetes.io/os=linux
topology.kubernetes.io/region=scw-kosmos-worldwide-b2db708b0c474decb7447e0d6
For each of our three nodes, we see many labels. The first node on the list has considerably more labels, as it is managed by the Kubernetes Kosmos engine, which adds extra information about node features and management.
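If you only want to list the nodes managed by the engine, a label selector on the k8s.scaleway.com/managed label seen in the output above does the job (optional check):
kubectl get nodes -l k8s.scaleway.com/managed=true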
As it might not be easy to remember which node comes from which provider, and as it can help us distribute our workload across providers, we are going to label our nodes with a label called provider
with values such as scaleway
or hetzner
.
kubectl label nodes scw-kosmos-kosmos-scw-09371579edf54552b0187a95 provider=scaleway
kubectl label nodes scw-kosmos-worldwide-5ecdb6d02cf84d63937af45a6 provider=scaleway
kubectl label nodes scw-kosmos-worldwide-b2db708b0c474decb7447e0d6 provider=hetzner
In addition, we are also going to add a label to our unmanaged Scaleway node to specify that it is, in fact, not managed by the engine. For that, we use the same label as on the managed Scaleway node, but set to false: k8s.scaleway.com/managed=false
.
kubectl label nodes scw-kosmos-worldwide-5ecdb6d02cf84d63937af45a6 k8s.scaleway.com/managed=false
Let's list our labels to ensure that the provider
label is well set on our three nodes.
🔥 kubectl get nodes --show-labels --no-headers | awk '{print "NODE NAME: "$1","$6"\n"}' | tr "," "\n"
Output
NODE NAME: scw-kosmos-kosmos-scw-09371579edf54552b0187a95
beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=DEV1-M
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=nl-ams
failure-domain.beta.kubernetes.io/zone=nl-ams-1
k8s.scaleway.com/kapsule=b58ad1f6-2a4d-4c0b-8573-459fad62682f
k8s.scaleway.com/managed=true
k8s.scaleway.com/node=09371579-edf5-4552-b018-7a95e779b70e
k8s.scaleway.com/pool-name=kosmos-scw
k8s.scaleway.com/pool=313ccb19-0233-4dc9-b582-b1e687903b7a
k8s.scaleway.com/runtime=containerd
kubernetes.io/arch=amd64
kubernetes.io/hostname=scw-kosmos-kosmos-scw-09371579edf54552b0187a95
kubernetes.io/os=linux
node.kubernetes.io/instance-type=DEV1-M
provider=scaleway
topology.csi.scaleway.com/zone=nl-ams-1
topology.kubernetes.io/region=nl-ams
topology.kubernetes.io/zone=nl-ams-1

NODE NAME: scw-kosmos-worldwide-5ecdb6d02cf84d63937af45a6
beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
k8s.scaleway.com/managed=false
k8s.scw.cloud/disable-lifecycle=true
k8s.scw.cloud/node-public-ip=151.115.36.196
kubernetes.io/arch=amd64
kubernetes.io/hostname=scw-kosmos-worldwide-5ecdb6d02cf84d63937af45a6
kubernetes.io/os=linux
provider=scaleway
topology.kubernetes.io/region=scw-kosmos-worldwide-5ecdb6d02cf84d63937af45a6

NODE NAME: scw-kosmos-worldwide-b2db708b0c474decb7447e0d6
beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
k8s.scw.cloud/disable-lifecycle=true
k8s.scw.cloud/node-public-ip=65.21.146.191
kubernetes.io/arch=amd64
kubernetes.io/hostname=scw-kosmos-worldwide-b2db708b0c474decb7447e0d6
kubernetes.io/os=linux
provider=hetzner
topology.kubernetes.io/region=scw-kosmos-worldwide-b2db708b0c474decb7447e0d6
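For a more compact check, kubectl can also display a single label as its own column with the -L flag, for example:
kubectl get nodes -L provider
This prints the usual node listing with an extra PROVIDER column, which is often enough to verify the label without dumping every label of every node.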
To better understand the behavior of a Multi-Cloud Kubernetes cluster, we are going to create a very simple deployment using kubectl
. This deployment will run three replicas of the busybox
image, each of which will print the date every ten seconds.
kubectl create deploy first-deployment --replicas=3 --image=busybox -- /bin/sh -c "while true; do date; sleep 10; done"
Once the deployment has been created, we can observe what is actually happening on our cluster.
kubectl get all
Output
NAME                                    READY   STATUS    RESTARTS   AGE
pod/first-deployment-695f579bd4-cfg6l   1/1     Running   0          8s
pod/first-deployment-695f579bd4-jzft8   1/1     Running   0          8s
pod/first-deployment-695f579bd4-rt5jt   1/1     Running   0          8s

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.32.0.1    <none>        443/TCP   53m

NAME                               READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/first-deployment   3/3     3            3           8s

NAME                                          DESIRED   CURRENT   READY   AGE
replicaset.apps/first-deployment-695f579bd4   3         3         3       8s
Our first observation is that our deployment
object is here, along with the three pods
(replicas) we asked for. We can also observe that another "unexpected" object was created, a replicaset
. The replicaset
is an intermediary object created by the deployment
in charge of maintaining and monitoring the replicas.
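If you want to inspect this intermediary object, you can describe it directly (the hash suffix below comes from the sample output above and will differ on your cluster):
kubectl describe replicaset first-deployment-695f579bd4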
Now, let's have a quick look inside one of our pods
to see if it performs normally.
🔥 kubectl logs pod/first-deployment-695f579bd4-cfg6l
Output
Mon Sep  6 08:41:01 UTC 2021
Mon Sep  6 08:41:11 UTC 2021
Mon Sep  6 08:41:21 UTC 2021
Mon Sep  6 08:41:31 UTC 2021
We can see that our pod is writing the date every ten seconds, which is exactly what we asked it to do.
Now, the real question is, where are these pods
running? We can use kubectl get pods with custom columns to display the name of the node where each of them actually runs.
🔥 kubectl get pods -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName
Output
NAME                                NODE
first-deployment-695f579bd4-cfg6l   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-jzft8   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-rt5jt   scw-kosmos-kosmos-scw-0937
When listing our three pods and their location, it seems that they all run on the same managed Scaleway node (the one located in Amsterdam). That's unfortunate... Let's see if we can act on this behavior.
The first thing we can try is to scale up our deployment and see where all our new replicas will be scheduled.
🔥 kubectl scale deployment first-deployment --replicas=15
The scaling has been applied; we can list our pods again.
🔥 kubectl get pods -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName
Output
NAME                                NODE
first-deployment-695f579bd4-5jq9q   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-5t6tw   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-5twcj   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-5xljr   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-8phq5   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-cfg6l   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-jzft8   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-nf9fg   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-nsxb6   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-ptlkp   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-rgdqj   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-rt5jt   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-vrl95   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-vwv7l   scw-kosmos-kosmos-scw-0937
first-deployment-695f579bd4-w9qqq   scw-kosmos-kosmos-scw-0937
And they are still all running on the same node.
To go further, we are going to play with more complex configurations. In order to do so without getting mixed up with our other configurations, deployments, and pods, it is best to clean our environment and delete our deployment.
🔥 kubectl delete deployment first-deployment
Output
deployment.apps "first-deployment" deleted
Kubectl commands are nice, but when it comes to managing multiple Kubernetes objects, configuration files are a better and more reliable fit. In Kubernetes, configurations are made in yaml
format, always following a pattern similar to the one below:
#example.yaml
---
apiVersion: v1   # version of the k8s api
kind: Pod        # type of the Kubernetes object we aim to describe
metadata:
  # additional options such as the object name, labels, annotations
  ...
spec:
  # parameters and options of the k8s object to create
  ...
In Kubernetes, there are different options available to distribute our workload across nodes, namespaces, or depending on affinity, between pods
. Working in a Multi-Cloud Kubernetes environment makes their usage practically mandatory, and knowing them and their behavior rapidly becomes crucial.
A node selector is applied on a pod and will match labels that exist on the cluster nodes. The command below gives us all the information about a given node, including labels, annotations, running pods, etc.
🔥 kubectl describe node scw-kosmos-kosmos-scw-09371579edf54552b0187a95
Output
Name:               scw-kosmos-kosmos-scw-09371579edf54552b0187a95
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=DEV1-M
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=nl-ams
                    failure-domain.beta.kubernetes.io/zone=nl-ams-1
                    k8s.scaleway.com/kapsule=b58a[...]
                    k8s.scaleway.com/managed=true
                    k8s.scaleway.com/node=0937[...]
                    k8s.scaleway.com/pool=313c[...]
                    k8s.scaleway.com/pool-name=kosmos-scw
                    k8s.scaleway.com/runtime=containerd
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=scw-kosmos-kosmos-scw-0937[...]
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=DEV1-M
                    provider=scaleway
                    topology.csi.scaleway.com/zone=nl-ams-1
                    topology.kubernetes.io/region=nl-ams
                    topology.kubernetes.io/zone=nl-ams-1
Annotations:        csi.volume.kubernetes.io/nodeid: {"csi.scaleway.com":"[...]"}
                    kilo.squat.ai/discovered-endpoints: {}
                    kilo.squat.ai/endpoint: 51.15.123.156:51820
                    kilo.squat.ai/force-endpoint: 51.15.123.156:51820
                    kilo.squat.ai/granularity: location
                    kilo.squat.ai/internal-ip: 10.67.36.37/31
                    kilo.squat.ai/key: cSP2[...]
                    kilo.squat.ai/last-seen: 1630917821
                    kilo.squat.ai/wireguard-ip: 10.4.0.1/16
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Mon, 06 Sep 2021 09:50:02 +0200
Taints:             <none>
[...]
Parts in brackets [...] are truncated from the output for better readability
A node selector
can be applied on a pod using an existing label
, or a new label
created by the Kubernetes user, such as the labels we previously added on our nodes.
This is a sample yaml
file using a node selector
in a pod
:
#example.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    provider: scaleway
The node selector
ensures that the defined pod will only be scheduled on a node matching the condition. In this example, the nginx
pod will only be scheduled on nodes with the label provider=scaleway
.
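Once such a pod is applied, a quick way to verify where it landed is to read its nodeName field (assuming the pod is named nginx as in the sample above):
kubectl get pod nginx -o jsonpath='{.spec.nodeName}'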
Node affinity also matches labels existing on nodes, but provides more flexibility and options in terms of the rules that are applied.
First of all, node affinity accepts two different policies: requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution.
As their names are self-explanatory, we can easily understand that if a condition is not matched, Kubernetes might still be able to schedule pods on nodes that do not match the conditions. It allows the definition of preferences for pod scheduling instead of mandatory criteria.
The file here is an example of requiredDuringSchedulingIgnoredDuringExecution
and preferredDuringSchedulingIgnoredDuringExecution
configuration.
#example.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: provider
            operator: In
            values:
            - scaleway
            - hetzner
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: topology.kubernetes.io/region
            operator: In
            values:
            - nl-ams
  containers:
  - name: nginx
    image: nginx
In this example, the pod
is required to run on a node with provider=scaleway
or provider=hetzner
label, and should preferably be scheduled on a node with a label topology.kubernetes.io/region=nl-ams
.
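If you are unsure about the exact spelling or structure of these affinity fields, kubectl can print the API documentation for them directly from the terminal, for example:
kubectl explain pod.spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution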
The pod affinity constraint is applied on pods based on other pods' labels. It benefits from the same two policies as node affinity: requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution.
The difference with node affinity
is that instead of defining rules for the cohabitation of pods on nodes, the pod affinity
defines rules between pods, such as "pod 1
should run on the same node as pod 2
".
In the following sample file, we specify that an nginx
pod must be scheduled on any nodes containing a pod with the label app=one-per-provider
.
#example.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - one-per-provider
        topologyKey: provider
  containers:
  - name: nginx
    image: nginx
The same way Kubernetes allows us to define pod affinities
, we can also define pod anti-affinities
, thus defining preferences for pods
to not cohabitate together under some conditions.
The same two policies, requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution, are available.
In the following sample file, we define that nginx pods should preferably not be scheduled in the same topology.kubernetes.io/zone as pods carrying the security=S1 label.
#example.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S1
          topologyKey: topology.kubernetes.io/zone
  containers:
  - name: nginx
    image: nginx
We are going to try out deploying an application across our two providers: Scaleway and Hetzner.
🔥 Let's create this antiaffinity.yaml
configuration file to create a deployment
where each pod
will be deployed on nodes with a different provider
label, and will not cohabitate with pods
that have the app=one-per-provider
label.
🔥
#antiaffinity.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: one-per-provider-deploy
spec:
  replicas: 2
  selector:
    matchLabels:
      app: one-per-provider
  template:
    metadata:
      labels:
        app: one-per-provider
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - one-per-provider
            topologyKey: provider
      containers:
      - name: busytime
        image: busybox
        command: ["/bin/sh","-c","while true; do date; sleep 10; done"]
🔥 Apply the deployment
configuration on our cluster using the following command.
🔥 kubectl apply -f antiaffinity.yaml
Output
deployment.apps/one-per-provider-deploy created
And observe pods
that were generated and the nodes
they run on.
🔥 kubectl get pods -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName
Output
NAME                                       NODE
one-per-provider-deploy-75945bb589-6jb87   scw-kosmos-worldwide-b2d
one-per-provider-deploy-75945bb589-db25x   scw-kosmos-kosmos-scw-0937
By looking at the names of the different nodes, we can see the pool name in the middle, informing us that our two pods are deployed on different pools, and thus on different nodes.
However, one of the nodes in the "worldwide" pool is also a Scaleway node, so we want to make sure that our two pods actually run on different providers, not just on different pools. We can verify this by fetching the provider label we set on each node at the beginning of the workshop.
🔥 kubectl get nodes scw-kosmos-worldwide-b2db708b0c474decb7447e0d6 -o custom-columns=NAME:.metadata.name,PROVIDER:.metadata.labels.provider
Output
NAME PROVIDERscw-kosmos-worldwide-b2db708b0c474decb7447e0d6 hetzner
🔥 kubectl get nodes scw-kosmos-kosmos-scw-09371579edf54552b0187a95 -o custom-columns=NAME:.metadata.name,PROVIDER:.metadata.labels.provider
Output
NAME PROVIDERscw-kosmos-kosmos-scw-09371579edf54552b0187a95 scaleway
When we ask for the provider
label
of our two nodes, we can confirm that our two pods
from the deployment (that have the same app=one-per-provider
label
) were scheduled on different providers.
Our previous deployment
defined two replicas
where each generated pod
is labeled app=one-per-provider
. These pods
should be scheduled on nodes
with different values for their provider label
( topologyKey
field).
As our cluster has only two different providers, and all our pods
were created within the one-per-provider-deploy deployment
, scaling up the deployment
should not result in the scheduling of a third pod
.
So let's try it by adding only one more replica
.
🔥 kubectl scale deployment one-per-provider-deploy --replicas=3
Output
deployment.apps/one-per-provider-deploy scaled
Once the deployment
has scaled up, we can list our pods
and see what happened.
🔥 kubectl get pods
Output
NAME                                 READY   STATUS    RESTARTS   AGE
one-per-provider-deploy-7594-29wr7   0/1     Pending   0          7s
one-per-provider-deploy-7594-6jb87   1/1     Running   0          12m
one-per-provider-deploy-7594-db25x   1/1     Running   0          12m
The third pod is present in our cluster, as it was required by the deployment's replicaset, but we can see that it is stuck in a Pending state, meaning the pod could not find a node to be scheduled on.
The reason behind this behavior is that the pod is waiting for a node that matches all of its pod anti-affinity constraints, and there is currently no node from a third cloud provider in our cluster. Until a node meeting this requirement is added to the cluster, our third pod will remain unavailable and in a Pending state.
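To see the scheduler's own explanation, you can describe the Pending pod (use the pod name from your own output; the one below comes from the sample output above). Its events should contain a FailedScheduling message mentioning that no node satisfied the pod's affinity/anti-affinity rules:
kubectl describe pod one-per-provider-deploy-7594-29wr7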
A taint
is a Kubernetes concept used to block pods
from running on certain nodes.
The principle is to define key/value pairs completed by an effect
(i.e. a taint policy). There are three possibilities:
NoSchedule: forbids pod scheduling on the node but allows the execution of pods already running on it.
PreferNoSchedule: avoids pod scheduling on the node, except if no other node can match this policy.
NoExecute: forbids pod execution on the node, resulting in the eviction of unauthorized running pods.
Example
user@local:~$ kubectl taint nodes tainted-node key1=value1:NoSchedule
In this example, a taint
is applied to the node named tainted-node
, and set with the effect
(i.e. constraint or policy) NoSchedule
.
This means that no pod
has permission to be scheduled on a tainted-node
, except for pods
with a specific authorization to do so. These authorizations are called tolerations
and are covered in the next section of this article.
If a pod
without the corresponding toleration
is already running on a tainted-node
at the time the taint
is added (i.e. when the kubectl
command above is executed), the pod
will not be evicted and will keep running on this node.
However, if the constraint was set to NoExecute
, any pods
without the corresponding toleration
would not be allowed to run on the tainted-node
, resulting in their eviction.
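To check which taints a node currently carries, the node spec can be queried directly, in the same custom-columns style used earlier:
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints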
As stated before, tolerations
are applied on pods
to allow exceptions on tainted nodes. Two matching policies are available:
Equal: matches a node taint whose key, value, and effect match exactly.
Exists: matches any node taint with the given key, regardless of its value.
In this example, the pod named busybox is granted permission to be scheduled on a tainted node with two taints:
a taint key1=value1:NoSchedule
any taint with key key2 and the NoSchedule effect, regardless of the value attributed to the taint.
# example-equal.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: busybox
spec:
  containers:
  - name: busybox
    image: busybox
    command: ["/bin/sh","-c","sleep 3600"]
  tolerations:
  - key: key1
    operator: Equal
    value: "value1"
    effect: NoSchedule
  - key: key2
    operator: Exists
    effect: NoSchedule
Taints
and tolerations
can converge to define very specific behaviors for the scheduling and execution of pods
.
To experiment with taints
, we are going to taint our managed Scaleway node with autoscale=true:NoSchedule
.
As this node is part of a managed pool of our cluster, it benefits from the auto-scaling feature. We want to grant permission only to pods
that are configured to run on an auto-scalable pool.
We also want to exclude (i.e. evict) all running pods which do not have the toleration from this node.
Let's have a look at our cluster status by listing our pods
and the nodes
they run on.
🔥 kubectl get pods -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName
Output
NAME                                 NODE
one-per-provider-deploy-7594-29wr7   <none>
one-per-provider-deploy-7594-6jb87   scw-kosmos-worldwide-b2d
one-per-provider-deploy-7594-db25x   scw-kosmos-kosmos-scw-0937
We still have the same three pods
, two of which are running
, and one pending
(as no node is attributed to it).
Our managed Scaleway pool
has auto-scaling activated, using the preset label
autoscale=true
. We are, therefore, going to use the same label to forbid scheduling on this specific node.
🔥 kubectl taint nodes scw-kosmos-kosmos-scw-09371579edf54552b0187a95 autoscale=true:NoSchedule
Output
node/scw-kosmos-kosmos-scw-09371579edf54552b0187a95 tainted
See what happened on our cluster after applying the taint
, below:
🔥 kubectl get pods -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,STATUS:.status.phase
Output
NAME                               NODE                         STATUS
one-per-provider-deploy-75-29wr7   <none>                       Pending
one-per-provider-deploy-75-6jb87   scw-kosmos-worldwide-b2db    Running
one-per-provider-deploy-75-db25x   scw-kosmos-kosmos-scw-0937   Running
The state of our cluster has not changed. The reason for that is that the taint
we added concerned scheduling, and our pods
were already scheduled on our nodes
at the time the taint
was added.
Also, our taint forbids scheduling, but it does not forbid the execution of pods that are already running on the node.
Now, let's add a new taint to the same node. This time, however, we will set its effect to NoExecute
.
🔥 kubectl taint nodes scw-kosmos-kosmos-scw-09371579edf54552b0187a95 autoscale=true:NoExecute
Output
node/scw-kosmos-kosmos-scw-09371579edf54552b0187a95 tainted
Once this new taint
is applied, we want to observe the behavior of our pods
.
🔥 kubectl get pods -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,STATUS:.status.phase
Output
NAME                               NODE                        STATUS
one-per-provider-deploy-75-29wr7   <none>                      Pending
one-per-provider-deploy-75-6jb87   scw-kosmos-worldwide-b2db   Running
one-per-provider-deploy-75-vjxkq   scw-kosmos-worldwide-5ecd   Running
If we look closely at the node column of this output, we can see that node scw-kosmos-kosmos-scw-0937
no longer has a pod
running on it. This pod
was evicted when the taint
with NoExecute
effect was applied.
To maintain the stability of our cluster, the replicaset
of our one-per-provider-deploy deployment
rescheduled a new pod
on a node without the incompatible taints
.
A new pod was created in a Pending state while the evicted pod was in a Terminating status. The new pod was then scheduled on a node matching all constraints: it avoids the node carrying the autoscale taint with the NoExecute effect, and, because of the anti-affinity rule on the provider label, it lands on a provider different from the one hosting the other running pod.
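If you want to trace this eviction and rescheduling sequence yourself, the cluster events show it; sorting by creation time keeps the most recent entries at the bottom:
kubectl get events --sort-by=.metadata.creationTimestamp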
Moving on to tolerations
, we will create a pod
with a toleration
which allows it to be scheduled on our managed Scaleway node, selected through its location label topology.kubernetes.io/region=nl-ams
(this label was set directly by the Scaleway Kubernetes engine during the managed pool creation).
🔥
#toleration.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: busytolerant
spec:
  containers:
  - name: busytolerant
    image: busybox
    command: ["/bin/sh","-c","sleep 3600"]
  tolerations:
  - key: autoscale
    operator: Equal
    value: "true"
    effect: NoSchedule
  - key: autoscale
    operator: Equal
    value: "true"
    effect: NoExecute
  nodeSelector:
    topology.kubernetes.io/region: "nl-ams"
This yaml
file defines a pod
able to run on a node under the following conditions:
it tolerates the taint autoscale=true:NoSchedule
it tolerates the taint autoscale=true:NoExecute
it must be scheduled on a node carrying the label topology.kubernetes.io/region=nl-ams (node selector)
Let's apply this configuration to our cluster and observe what happens.
🔥 kubectl apply -f toleration.yaml
Output
pod/busytolerant created
🔥 kubectl get pods -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,STATUS:.status.phase
Output
NAME                               NODE                         STATUS
busytolerant                       scw-kosmos-kosmos-scw-0937   Running
one-per-provider-deploy-75-29wr7   <none>                       Pending
one-per-provider-deploy-75-6jb87   scw-kosmos-worldwide-b2db    Running
one-per-provider-deploy-75-vjxkq   scw-kosmos-worldwide-5ecd    Running
We can see that with the right tolerations
, the pod
named busytolerant
was perfectly able to be scheduled and executed on the node we tainted previously.
The addition of the constraint on the region label
is just a way to show how all the workload distribution features Kubernetes offers are cumulative.
To avoid scheduling issues while moving forward in this Hands-On, it is best to remove the taints
applied on our node. The command to do so is the same as the one to add the taint
, with just the addition of the -
(dash) character at the end of the taint
declaration.
🔥 kubectl taint nodes scw-kosmos-kosmos-scw-09371579edf54552b0187a95 autoscale=true:NoSchedule-
Output
node/scw-kosmos-kosmos-scw-09371579edf54552b0187a95 untainted
🔥 kubectl taint nodes scw-kosmos-kosmos-scw-09371579edf54552b0187a95 autoscale=true:NoExecute-
Output
node/scw-kosmos-kosmos-scw-09371579edf54552b0187a95 untainted
We can observe that removing the taints
did not have any effect on the pods running in our cluster. This happens because tolerations
are authorization rules, not forbidding instructions (as opposed to taints).
🔥 kubectl get pods -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,STATUS:.status.phase
Output
NAME                               NODE                         STATUS
busytolerant                       scw-kosmos-kosmos-scw-0937   Running
one-per-provider-deploy-75-29wr7   <none>                       Pending
one-per-provider-deploy-75-6jb87   scw-kosmos-worldwide-b2db    Running
one-per-provider-deploy-75-vjxkq   scw-kosmos-worldwide-5ecd    Running
Let's keep cleaning our environment and remove our busytolerant pod
and our one-per-provider-deploy deployment
one by one.
🔥 kubectl delete pods busytolerant
Output
pod "busytolerant" deleted
🔥 kubectl delete deployment one-per-provider-deploy
Output
deployment.apps "one-per-provider-deploy" deleted
🔥 kubectl get all
Output
NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.32.0.1    <none>        443/TCP   107m
The pod topology spread
constraint aims to evenly distribute pods
across nodes
based on specific rules and constraints.
It allows us to set a maximum difference in the number of similar pods between the nodes (the maxSkew parameter) and to determine the action that should be performed if the constraint cannot be met:
DoNotSchedule: the pod cannot be scheduled.
ScheduleAnyway: the pod can still be scheduled even if the conditions are not matched.
on pods
created from a deployment
.
# example.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busy-topologyspread
spec:
  replicas: 10
  selector:
    matchLabels:
      app: busybox-acrossproviders
  template:
    metadata:
      labels:
        app: busybox-acrossproviders
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: provider
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: busybox-acrossproviders
      containers:
      - name: busybox-everywhere
        image: busybox
        command: ["/bin/sh","-c","sleep 3600"]
The topology spread constraint
is specifically useful to spread the workload of one or multiple applications evenly throughout a Kubernetes cluster.
🔥 We are going to define a spread.yaml
file to set up a deployment with ten replicas, which should be scheduled evenly between nodes with the following labels
: provider=scaleway
and provider=hetzner
.
We authorize a difference of only one pod
between our matching nodes using the maxSkew
parameter:
🔥
#spread.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busyspread
spec:
  replicas: 10
  selector:
    matchLabels:
      app: busyspread-providers
  template:
    metadata:
      labels:
        app: busyspread-providers
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: provider
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: busyspread-providers
      containers:
      - name: busyspread
        image: busybox
        command: ["/bin/sh","-c","sleep 3600"]
🔥 Let's apply this deployment
.
🔥 kubectl apply -f spread.yaml
Output
deployment.apps/busyspread created
To see the distribution of our pods across the nodes of our cluster, we are going to list our busyspread pods
, the nodes
they run on, and count the number of occurrences.
🔥 kubectl get pods -o wide --no-headers | grep busyspread | awk '{print $7}' | sort | uniq -c
Output
2 scw-kosmos-kosmos-scw-09371579edf54552b0187a95
3 scw-kosmos-worldwide-5ecdb6d02cf84d63937af45a6
5 scw-kosmos-worldwide-b2db708b0c474decb7447e0d6
Knowing that the first two nodes in this list have the label provider=scaleway
and the third one has a label provider=hetzner
, we have indeed an even distribution of our workload across our providers with five pods for each of them.
The next step of this Hands-On will be to set up Load Balancing and Storage management within a Multi-Cloud Kubernetes cluster.
🔥 In order to avoid getting mixed up in all our pods and deployments, we are going to clean our environment by deleting our busyspread deployment
.
🔥 kubectl delete deployment busyspread
Output
deployment.apps "busyspread" deleted