How to create a Distributed Data Lab
Reviewed on 31 July 2024 • Published on 31 July 2024
Distributed Data Lab is a product designed to assist data scientists and data engineers in performing calculations on remotely managed Apache Spark infrastructure.
Before you start
To complete the actions presented below, you must have:
- A Scaleway account logged into the console
- Owner status or IAM permissions allowing you to perform actions in the intended Organization
- Signed up to the private beta and received a confirmation email
- Optionally, an Object Storage bucket
- A valid API key
1. Click Data Lab under Managed Services on the side menu. The Distributed Data Lab page displays.
2. Click Create Data Lab cluster. The creation wizard displays.
3. Complete the following steps in the wizard:
- Choose an Apache Spark version from the drop-down menu.
- Select a worker node configuration.
- Enter the desired number of worker nodes.
Note: Provisioning zero worker nodes lets you retain and access your cluster and notebook configurations, but will not allow you to run calculations.
- Optionally, choose an Object Storage bucket in the desired region to store the data source and results.
- Enter a name for your Data Lab.
- Optionally, add a description and/or tags for your Data Lab.
- Verify the estimated cost.
4. Click Create Data Lab cluster to finish. You are directed to the Data Lab cluster overview page.
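As a rough illustration of the cost-verification step above, the sketch below estimates an hourly cluster price from a worker configuration and node count. The node types and all prices are hypothetical placeholders for illustration only, not Scaleway's actual pricing; check the estimate shown in the creation wizard for real figures.

```python
# Rough cluster cost estimator. All node types and prices below are
# hypothetical placeholders, NOT actual Scaleway pricing.
HOURLY_PRICE_PER_WORKER = {
    "small": 0.10,   # hypothetical worker configurations
    "medium": 0.40,
    "large": 1.60,
}

def estimate_hourly_cost(node_type: str, worker_count: int,
                         main_node_price: float = 0.20) -> float:
    """Estimate hourly cost: one main node plus the chosen workers.

    A cluster with zero workers still incurs the main-node cost, which
    matches the behavior described in the note above: configurations
    are retained, but no calculations can run.
    """
    if worker_count < 0:
        raise ValueError("worker_count must be >= 0")
    return main_node_price + HOURLY_PRICE_PER_WORKER[node_type] * worker_count

# Example: four hypothetical "medium" workers -> 0.20 + 4 * 0.40
print(round(estimate_hourly_cost("medium", 4), 2))
```

Multiplying the estimate by the expected hours of use gives a rough monthly figure to compare against the wizard's display.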