
How to create an Apache Spark™ cluster

Clusters for Apache Spark™ is a product designed to assist data scientists and data engineers in performing calculations on a remotely managed Apache Spark™ infrastructure.

Before you start

To complete the actions presented below, you must have:

  - A Scaleway account logged into the console
  - Owner status or IAM permissions allowing you to perform actions in the intended Organization
  1. Click Apache Spark™ under Data & Analytics on the side menu. The Clusters for Apache Spark™ page displays.

  2. Click Create cluster. The creation wizard displays.

  3. Choose an Apache Spark™ version from the drop-down menu.

  4. Choose a main node type. If you plan to add a notebook to your cluster, select the DATALAB-SHARED-4C-8G configuration to provision sufficient resources for it.

  5. Choose a worker node type based on your hardware requirements. CPUs are suitable for most workloads, while GPUs are best suited for machine learning and AI model training.

  6. Enter the desired number of worker nodes.

  7. Add a persistent volume if required, then enter a volume size according to your needs.

    Note

    Persistent volume usage depends on your workload, and only actual usage is billed, up to the limit you define. For example, if you set a 100 GB limit but your workload writes only 20 GB, you are billed for 20 GB. A minimum of 1 GB is required to run the notebook.

  8. Add a notebook if you want to use an integrated notebook environment to interact with your cluster. Adding a notebook requires 1 GB of billable storage.

  9. Select a Private Network from the drop-down menu to attach to your cluster, or create a new one. Apache Spark™ clusters cannot be used without a Private Network.

  10. Enter a name for your cluster, and add optional tags.

  11. Verify the estimated cost.

  12. Click Create cluster to finish. You are directed to the cluster overview page.
