
Clusters for Apache Spark™ - Quickstart


Discover the Clusters for Apache Spark™ interface on the Scaleway console (Clusters for Apache Spark™ was formerly known as Data Lab for Apache Spark™).

Clusters for Apache Spark™ is a product designed to assist data engineers and data scientists in performing data processing on a remotely managed Apache Spark™ infrastructure.

This documentation explains how to quickly create an Apache Spark™ cluster, access its notebook environment, run the included demo file, and delete your cluster.

Before you start

To complete the actions presented below, you must have:

  • A Scaleway account logged into the console
  • A valid API key (the secret key is used to log into the notebook)

How to create an Apache Spark™ cluster

  1. Under Data & Analytics on the side menu, click Apache Spark™.

  2. Click Create cluster. The creation wizard displays.

  3. Complete the following steps in the wizard:

    • Select a region for your cluster.
    • Choose an Apache Spark™ version from the drop-down menu.
    • Select a main node type.
    • Select a CPU worker node configuration.
    • Enter the desired number of worker nodes.
    • Add an optional notebook (JupyterLab).
    • Select an existing Private Network, or create a new one.
    • Enter a name and optional tags for your cluster.
    • Verify the estimated cost.
  4. Click Create cluster to finish.

Once the cluster is created, you are directed to its Overview page.

Refer to the dedicated documentation for detailed information on how to create a cluster.

How to connect to your cluster's notebook

  1. Click Apache Spark™ under Data & Analytics on the side menu. The Clusters for Apache Spark™ page displays.

  2. Click the name of the cluster you want to connect to. The cluster Overview page displays.

  3. In the Notebook section, click Open notebook. You are directed to the notebook login page.

  4. Enter your API secret key when prompted for a password, then click Log in.

You are directed to the notebook home screen.

How to run the demo file

Each Cluster for Apache Spark™ comes with a default DatalabDemo.ipynb demonstration file for testing purposes. This file contains a preconfigured notebook environment that requires no modification to run.

Execute the cells in order to perform predetermined operations on a dummy data set, representative of real-life use cases and workloads, and assess the performance of your cluster.

Tip

The demo file also contains a set of examples to configure and extend your Apache Spark™ configuration.

How to delete an Apache Spark™ cluster

Important

This action is irreversible and will permanently delete this Apache Spark™ cluster and all its associated data.

  1. From the Overview tab of your cluster, click the Settings tab, then select Delete cluster. Alternatively, click Actions in the top right corner, then select Delete cluster.

  2. Enter DELETE in the confirmation pop-up to confirm your action.

  3. Click Delete cluster.
