It is known that clusters with many nodes and at least one GPU node may accumulate a large number of NodeFeature objects polluting their etcd space. We are working on a long-term fix, but for now, manually deleting these objects and downsizing the cluster (or upgrading to a dedicated offer with a larger etcd quota) is the best solution.
Kubernetes Kapsule clusters have quotas on the space they can occupy in their etcd database. See Kapsule cluster types for details on each offer.
You can check your cluster's current space consumption at any time in your cluster's Grafana dashboard (Etcd disk usage panel).
This guide helps you to free up space on your database to avoid reaching this limit.
Before you start
To complete the actions presented below, you must have:
- Created a Kubernetes Kapsule cluster
- Downloaded the Kubeconfig
- Look for unused resources: deleting Secrets and large ConfigMaps that are no longer used in your cluster is a good starting point.
> kubectl -n $namespace delete configmap $ConfigMapName
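Finding which ConfigMaps are actually unused can be tedious by hand. The sketch below is an assumption, not an official procedure (`namespace` is a placeholder, and it only checks ConfigMap volumes and `envFrom` references, so review the candidates before deleting anything):

```shell
# Hedged sketch: flag ConfigMaps that no pod in the namespace references
# via a configMap volume or envFrom. It does NOT cover every reference
# style (env valueFrom, projected volumes, CSI drivers...), so
# double-check each candidate before deleting it.
namespace=default   # placeholder namespace

# Collect ConfigMap names referenced by running pods.
used=$(kubectl -n "$namespace" get pods -o jsonpath='{range .items[*]}{.spec.volumes[*].configMap.name}{" "}{.spec.containers[*].envFrom[*].configMapRef.name}{" "}{end}')

for cm in $(kubectl -n "$namespace" get configmaps -o jsonpath='{.items[*].metadata.name}'); do
  case " $used " in
    *" $cm "*) ;;                          # referenced by at least one pod
    *) echo "possibly unused: $cm" ;;      # candidate for review/deletion
  esac
done
```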
- Keep an eye on Helm charts that deploy a lot of custom resources (CRDs); they tend to fill up etcd space. You can find them by listing resource kinds:
> kubectl api-resources
NAME          SHORTNAMES   APIVERSION   NAMESPACED   KIND
configmaps    cm           v1           true         ConfigMap
endpoints     ep           v1           true         Endpoints
events        ev           v1           true         Event
cronjobs      cj           batch/v1     true         CronJob
jobs                       batch/v1     true         Job
[...]
Look for resources with an external apiVersion (not v1, apps/v1, storage.k8s.io/v1, or batch/v1, for example).
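To see which of those external kinds is actually accumulating objects, you could count instances per resource. A rough sketch, assuming `kubectl api-resources -o name` output (non-core resources carry their API group as a dotted suffix); the group filter below is an approximation, not an exhaustive list of built-in groups:

```shell
# Hedged sketch: count instances of each resource living outside the
# common built-in API groups, to spot CRDs filling etcd. In
# `kubectl api-resources -o name`, grouped resources look like
# `nodefeatures.nfd.k8s-sigs.io`, while core ones have no dot.
for r in $(kubectl api-resources --namespaced=true --verbs=list -o name \
           | grep '\.' \
           | grep -vE '\.(apps|batch|autoscaling|policy)$|\.k8s\.io$'); do
  count=$(kubectl get "$r" --all-namespaces --no-headers 2>/dev/null | wc -l)
  [ "$count" -gt 0 ] && echo "$count $r"
done | sort -rn | head
```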
- If you are in doubt about the space taken up by a resource, you can dump it to get its size:
> kubectl get nodefeature -n kube-system $node-feature-name -o yaml | wc -c
305545 // ~300KiB, big object
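When a kind has many instances, it can help to rank them by serialized size instead of dumping them one by one. A minimal sketch reusing the nodefeature example above (it makes one kubectl call per object, so it is slow on large clusters):

```shell
# Hedged sketch: print each nodefeature object's YAML size in bytes,
# biggest first, to find the main etcd consumers.
for name in $(kubectl -n kube-system get nodefeature -o jsonpath='{.items[*].metadata.name}'); do
  size=$(kubectl -n kube-system get nodefeature "$name" -o yaml | wc -c)
  echo "$size $name"
done | sort -rn | head
```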