Clusters for Apache Spark™ - Quickstart
Console overview
Discover the Clusters for Apache Spark™ interface on the Scaleway console (Clusters for Apache Spark™ was formerly known as Data Lab for Apache Spark™).
Clusters for Apache Spark™ is a product designed to assist data engineers and data scientists in performing data processing on a remotely managed Apache Spark™ infrastructure.
This documentation explains how to quickly create an Apache Spark™ cluster, access its notebook environment, run the included demo file, and delete your cluster.
Before you start
To complete the actions presented below, you must have:
- A Scaleway account logged into the console
- Owner status or IAM permissions allowing you to perform actions in the intended Organization
- Created a Private Network
- Created an IAM API key
How to create an Apache Spark™ cluster
1. Under Data & Analytics on the side menu, click Apache Spark™.
2. Click Create cluster. The creation wizard displays.
3. Complete the following steps in the wizard:
   - Select a region for your cluster.
   - Choose an Apache Spark™ version from the drop-down menu.
   - Select a main node type.
   - Select a CPU worker node configuration.
   - Enter the desired number of worker nodes.
   - Add an optional notebook (JupyterLab).
   - Select an existing Private Network, or create a new one.
   - Enter a name and optional tags for your cluster.
   - Verify the estimated cost.
4. Click Create cluster to finish.
Once the cluster is created, you are directed to its Overview page.
Refer to the dedicated documentation for detailed information on how to create a cluster.
How to connect to your cluster's notebook
1. Click Apache Spark™ under Data & Analytics on the side menu. The Clusters for Apache Spark™ page displays.
2. Click the name of the cluster you want to connect to. The cluster Overview page displays.
3. In the Notebook section, click Open notebook. You are directed to the notebook login page.
4. Enter your API secret key when prompted for a password, then click Log in. You are directed to the notebook home screen.
How to run the demo file
Each Cluster for Apache Spark™ comes with a default DatalabDemo.ipynb demonstration file for testing purposes. This file contains a preconfigured notebook environment that requires no modification to run.
Execute the cells in order to run predetermined operations on a dummy dataset representative of real-life use cases and workloads, and to assess the performance of your cluster.
How to delete an Apache Spark™ cluster
1. From the Overview tab of your cluster, click the Settings tab, then select Delete cluster. Alternatively, click Actions in the top right corner, then select Delete cluster.
2. Enter DELETE in the confirmation pop-up to confirm your action.
3. Click Delete cluster.