It is known that clusters with many nodes and at least one GPU node may accumulate a large number of NodeFeature objects polluting their etcd space. We are working on a long-term fix, but for now, manually deleting these objects and downsizing the cluster (or upgrading to a dedicated offer with a larger etcd quota) is the best solution.
Kubernetes Kapsule clusters have quotas on the space they can occupy in their etcd database. See Kapsule cluster types for details on each offer.
You can check your cluster's current space consumption at any time in your cluster's Grafana dashboard (Etcd disk usage panel).
This guide helps you to free up space on your database to avoid reaching this limit.
Before you start
To complete the actions presented below, you must have:
- Created a Kubernetes Kapsule cluster
- Downloaded the Kubeconfig
- Look for unused resources: deleting Secrets and large ConfigMaps that are no longer used in your cluster is a good starting point.
> kubectl -n $namespace delete configmap $ConfigMapName
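Finding which ConfigMaps are actually unused can be tedious by hand. The sketch below is an assumption, not an official procedure (`namespace` is a placeholder, and it only checks ConfigMap volumes and `envFrom` references, so review the candidates before deleting anything):

```shell
# Hedged sketch: flag ConfigMaps that no pod in the namespace references
# via a configMap volume or envFrom. It does NOT cover every reference
# style (env valueFrom, projected volumes, CSI drivers...), so
# double-check each candidate before deleting it.
namespace=default   # placeholder namespace

# Collect ConfigMap names referenced by running pods.
used=$(kubectl -n "$namespace" get pods -o jsonpath='{range .items[*]}{.spec.volumes[*].configMap.name}{" "}{.spec.containers[*].envFrom[*].configMapRef.name}{" "}{end}')

for cm in $(kubectl -n "$namespace" get configmaps -o jsonpath='{.items[*].metadata.name}'); do
  case " $used " in
    *" $cm "*) ;;                          # referenced by at least one pod
    *) echo "possibly unused: $cm" ;;      # candidate for review/deletion
  esac
done
```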
- Keep an eye on Helm charts that deploy a lot of custom resources (CRDs); they tend to fill up etcd space. You can find them by listing resource kinds:
> kubectl api-resources
NAME          SHORTNAMES   APIVERSION   NAMESPACED   KIND
configmaps    cm           v1           true         ConfigMap
endpoints     ep           v1           true         Endpoints
events        ev           v1           true         Event
cronjobs      cj           batch/v1     true         CronJob
jobs                       batch/v1     true         Job
[...]
Look for resources with an external apiVersion (not v1, apps/v1, storage.k8s.io/v1, or batch/v1, for example).
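To see which of those external kinds is actually accumulating objects, you could count instances per resource. A rough sketch, assuming `kubectl api-resources -o name` output (non-core resources carry their API group as a dotted suffix); the group filter below is an approximation, not an exhaustive list of built-in groups:

```shell
# Hedged sketch: count instances of each resource living outside the
# common built-in API groups, to spot CRDs filling etcd. In
# `kubectl api-resources -o name`, grouped resources look like
# `nodefeatures.nfd.k8s-sigs.io`, while core ones have no dot.
for r in $(kubectl api-resources --namespaced=true --verbs=list -o name \
           | grep '\.' \
           | grep -vE '\.(apps|batch|autoscaling|policy)$|\.k8s\.io$'); do
  count=$(kubectl get "$r" --all-namespaces --no-headers 2>/dev/null | wc -l)
  [ "$count" -gt 0 ] && echo "$count $r"
done | sort -rn | head
```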
- If you are in doubt about the space taken up by a resource, you can dump it to get its size:
> kubectl get nodefeature -n kube-system $node-feature-name -o yaml | wc -c
305545 // ~300KiB, big object
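When a kind has many instances, it can help to rank them by serialized size instead of dumping them one by one. A minimal sketch reusing the nodefeature example above (it makes one kubectl call per object, so it is slow on large clusters):

```shell
# Hedged sketch: print each nodefeature object's YAML size in bytes,
# biggest first, to find the main etcd consumers.
for name in $(kubectl -n kube-system get nodefeature -o jsonpath='{.items[*].metadata.name}'); do
  size=$(kubectl -n kube-system get nodefeature "$name" -o yaml | wc -c)
  echo "$size $name"
done | sort -rn | head
```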