
How to use the NVIDIA GPU operator on Kapsule and Kosmos with GPU Instances

Reviewed on 03 December 2024
Published on 18 July 2023

Kubernetes Kapsule and Kosmos support NVIDIA’s official Kubernetes operator for all GPU pools. This operator is compatible with the RENDER-S, GPU-3070-S, H100 PCIe, L40S, and L4 offers.

The GPU operator is set up for all GPU pools created in Kubernetes Kapsule and Kosmos. It automates the installation of all required software on GPU worker nodes, such as the device plugin, the container toolkit, and the GPU drivers. For more information, refer to the GPU operator overview.

Before you start

To complete the actions presented below, you must have:

  • A Scaleway account logged into the console
  • Owner status or IAM permissions allowing you to perform actions in the intended Organization
  • Created a Kubernetes Kapsule or Kosmos cluster

How to get the GPU operator for a new pool

Scaleway uses Helm to automate the deployment of the GPU operator in your GPU node pools. The operator is installed by default on every GPU pool.

  1. Click Kubernetes in the Containers section of the side menu. The Kubernetes creation page displays.
  2. Select the cluster you want to add a pool to.
  3. Click the Pools tab.
  4. Click the + Add pool button. The pool creation wizard displays.
  5. If you are using a Kosmos cluster, you can optionally choose a pool type. Select a Scaleway Kubernetes Kapsule pool.
  6. Choose the zone in which your pool will be deployed.
  7. Click the GPU tab and select the GPU Instance you want to add.
  8. Configure the pool options for your pool.
  9. Click Add pool to deploy the pool. The GPU operator displays in the Easy Deploy tab of your pool and your kube-system namespace.

How to activate the GPU operator on existing node pools

Replace the existing nodes of your pool to deploy the GPU operator on your existing pools.

Important

The GPU operator installs the drivers shortly after a node is created.

If a workload is scheduled on the node before this installation completes, it will be missing essential components. To avoid this, add a node selector to your workload:

spec:
  nodeSelector:
    # Label values must be strings, so the value is quoted
    nvidia.com/gpu.present: "true"

Alternatively, declare the GPU resources that your container requires:

spec:
  containers:
    - name: gpu-workload
      image: "rg.fr-par.scw.cloud/my-namespace/gpu-image:v1.0"
      resources:
        limits:
          nvidia.com/gpu: 1
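
The two snippets above can also be combined in a single manifest. The following is a minimal sketch of a one-shot Pod that runs nvidia-smi to check that a GPU is visible from inside a container. The Pod name, the container name, and the image tag are illustrative assumptions and do not come from the Scaleway documentation; replace them with values that suit your environment.

```yaml
# Hypothetical smoke-test Pod: the names and the image tag are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  # Run once and do not restart after nvidia-smi exits
  restartPolicy: Never
  # Only schedule on nodes labeled as having a GPU
  nodeSelector:
    nvidia.com/gpu.present: "true"
  containers:
    - name: cuda
      # Assumed public CUDA base image; any image suited to your workload can be used
      image: "nvidia/cuda:12.4.1-base-ubuntu22.04"
      command: ["nvidia-smi"]
      resources:
        limits:
          # Request one GPU from the device plugin
          nvidia.com/gpu: 1
```

Once applied with kubectl apply -f, the output of kubectl logs gpu-smoke-test should list the GPU if the operator components are ready on the node. If the Pod stays in the Pending state, the node may not yet advertise the nvidia.com/gpu resource.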

How to edit the configuration of the GPU operator

You can configure the GPU operator on your Scaleway node pools either through the Easy Deploy feature in the Scaleway console or by using Helm. To edit the configuration from the console:

  1. Click Kubernetes in the Containers section of the side menu. The Kubernetes creation page displays.
  2. Select the cluster you want to configure.
  3. Click the Easy Deploy tab.
  4. Click the «See more» icon, then click Edit next to the GPU operator deployment. A pop-up displays.
  5. Edit the YAML configuration of the deployment to match your desired configuration.
    Tip

    Refer to the official NVIDIA documentation for a list of available Helm configuration options.

  6. Click Update and deploy to update and deploy the configuration of the GPU operator.
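
As an illustration of the kind of YAML edited in step 5, here is a minimal sketch of Helm values that toggle individual operator components. The keys shown follow the upstream NVIDIA GPU operator chart, but the values chosen here are assumptions for the example only. The defaults that Scaleway applies, and the keys available in your deployed chart version, may differ, so check them against the NVIDIA documentation before deploying.

```yaml
# Illustrative values only: verify each key against the chart version in use.
driver:
  enabled: true        # let the operator install the GPU drivers
toolkit:
  enabled: true        # install the NVIDIA container toolkit
devicePlugin:
  enabled: true        # expose GPUs as the nvidia.com/gpu resource
dcgmExporter:
  enabled: false       # assumed example: turn off the DCGM metrics exporter
```

Disabling a component that your workloads depend on, such as the device plugin, would prevent Pods that request nvidia.com/gpu from being scheduled, so change one setting at a time and confirm the result.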

See also

  • How to upgrade the Kubernetes version on a Kapsule cluster
  • How to use the scratch storage on H100 GPU Instances with Kapsule