
How to create a Distributed Data Lab

Reviewed on 31 July 2024 · Published on 31 July 2024

Distributed Data Lab is a product designed to assist data scientists and data engineers in performing calculations on a remotely managed Apache Spark infrastructure.

Before you start

To complete the actions presented below, you must have:

  • A Scaleway account logged into the console
  • Owner status or IAM permissions allowing you to perform actions in the intended Organization
  • Signed up to the private beta and received a confirmation email
  • Optionally, an Object Storage bucket
  • A valid API key

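If you later call the Scaleway API or CLI from scripts, the API key is conventionally supplied through the `SCW_ACCESS_KEY` and `SCW_SECRET_KEY` environment variables. The helper below is a minimal sketch (not part of Scaleway tooling) showing how a script might read them:

```python
import os

def load_scw_credentials():
    """Read a Scaleway API key from the environment.

    SCW_ACCESS_KEY / SCW_SECRET_KEY are the variable names
    conventionally used by Scaleway tooling; adjust if your
    setup stores the key elsewhere.
    """
    access_key = os.environ.get("SCW_ACCESS_KEY")
    secret_key = os.environ.get("SCW_SECRET_KEY")
    if not access_key or not secret_key:
        raise RuntimeError("Set SCW_ACCESS_KEY and SCW_SECRET_KEY first")
    return access_key, secret_key
```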
  1. Click Data Lab under Managed Services on the side menu. The Distributed Data Lab page displays.

  2. Click Create Data Lab cluster. The creation wizard displays.

  3. Complete the following steps in the wizard:

    • Choose an Apache Spark version from the drop-down menu.
    • Select a worker node configuration.
    • Enter the desired number of worker nodes.
      Note

      Provisioning zero worker nodes lets you retain and access your cluster and notebook configurations, but you will not be able to run calculations.

    • Optionally, choose an Object Storage bucket in the desired region to store the data source and results.
    • Enter a name for your Data Lab.
    • Optionally, add a description and/or tags for your Data Lab.
    • Verify the estimated cost.
  4. Click Create Data Lab cluster to finish. You are directed to the Data Lab cluster overview page.
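If you attached an Object Storage bucket at step 3, Spark reaches it through the S3-compatible endpoint of Scaleway Object Storage (`https://s3.<region>.scw.cloud`). The sketch below builds the Hadoop `s3a` properties a notebook session might use; the helper name, default region, and exact property set are illustrative assumptions, not an official configuration:

```python
def s3a_options(access_key: str, secret_key: str, region: str = "fr-par") -> dict:
    """Hadoop s3a settings for a Scaleway Object Storage bucket.

    The region default and the property list are assumptions for
    this sketch; adapt them to your cluster's configuration.
    """
    return {
        "spark.hadoop.fs.s3a.endpoint": f"https://s3.{region}.scw.cloud",
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }
```

Each key/value pair would then be applied with `SparkSession.builder.config(key, value)` before reading `s3a://<bucket>/...` paths from the notebook.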

See also
How to connect to a Data Lab