Data Lab for Apache Spark™
Speed up the processing of very large data volumes with a managed Apache Spark™ solution.

Big Data is slowing you down
Datasets are getting bigger and slower to process
Existing infrastructures aren't designed to process large volumes of data, impacting operational efficiency.
Taking time away from your data teams
Managing the infrastructure gets increasingly complex and time-consuming, with high dependency on engineering teams.
Leaving them little time to derive insights
Accessing and analyzing data becomes cumbersome with ever-growing datasets.
Get the most out of your data
Reduce time-to-insights and accelerate decision-making by empowering data scientists, data engineers, and data analysts to maintain reliable data pipelines without extensive monitoring or manual intervention, all thanks to Scaleway's fully managed Apache Spark™ solution.
Accelerate time-to-insights with high-speed processing
Process and analyze large datasets quickly, reducing time-to-insights and enhancing decision-making.
Lower your total cost of ownership
Reduce the operational burden on your teams and the related costs with a fully managed Apache Spark™ solution designed to simplify big data management.
Develop ML projects swiftly and drive value
Query your data quickly by combining the power of Data Lab and GPUs, and stay on top of your AI ambitions.
Use cases
Advanced analytics
Explore and process large datasets autonomously, unlocking deeper insights with minimal effort. The intuitive JupyterLab environment allows for enhanced collaboration, code execution, and data visualization, all within a single workspace.
Machine Learning
Accelerate the pre-processing of your training data and start training without the hassle of infrastructure management. Powered by Apache Spark™ and supporting Python, Data Lab offers GPU capabilities and fast training in an intuitive JupyterLab environment, tailored to meet the most complex ML needs.
Key features and capabilities
JupyterLab with MLlib
Use the popular MLlib library, which provides tools for classification, regression, clustering, and more.
User-friendly interface
Access an intuitive and straightforward platform for maximized productivity.
Apache Spark™ cluster
Create and deploy Apache Spark™ clusters fully compatible with Amazon S3 data storage and JupyterLab notebooks.
Clear and transparent pricing
Pricing includes the architecture, cluster, and attached volumes in a single package.
Apache Spark™ cluster powered by GPU
Benefit from GPU capabilities with clusters enabled for the NVIDIA RAPIDS framework.
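On a self-managed cluster, the RAPIDS Accelerator for Apache Spark™ is switched on through ordinary Spark configuration. The sketch below gathers the commonly documented settings into a dictionary. Whether Data Lab GPU clusters need any of these set by hand, or ship with them already applied, is an assumption to verify against the product documentation, and the helper name `rapids_conf` is hypothetical.

```python
def rapids_conf(gpus_per_executor: int = 1, tasks_per_gpu: int = 1) -> dict[str, str]:
    """Return Spark settings that enable the RAPIDS Accelerator (hypothetical helper).

    Assumes the RAPIDS Accelerator jar is already on the cluster classpath; some
    cluster managers also need a GPU discovery script, which is not set here.
    """
    if gpus_per_executor < 1 or tasks_per_gpu < 1:
        raise ValueError("gpus_per_executor and tasks_per_gpu must be >= 1")
    return {
        # Load the RAPIDS SQL plugin so supported operators run on the GPU.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        "spark.rapids.sql.enabled": "true",
        # Standard Spark resource scheduling: whole GPUs per executor...
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        # ...and the fraction of a GPU each task claims, so that
        # `tasks_per_gpu` tasks can share one GPU concurrently.
        "spark.task.resource.gpu.amount": str(1 / tasks_per_gpu),
    }


if __name__ == "__main__":
    # Print the settings in key=value form, e.g. for a spark-defaults.conf file.
    for key, value in rapids_conf(tasks_per_gpu=4).items():
        print(f"{key}={value}")
```

Returning a plain dictionary keeps the settings testable without a cluster; with PySpark installed, each pair can be passed to `SparkSession.builder.config(key, value)`.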
Why Scaleway?
24/7 support
Our technical assistance is available 24/7 to answer all your questions and assist you.
Enriched experience
We offer a new experience with API access, Linux distributions, an intuitive console, and Terraform.
Easy-to-use console
Our user interface was designed with developers in mind, to make managing your cloud projects as smooth and enjoyable as possible.
True cloud ecosystem
Our cloud products are designed and built to work together, offering you a seamless, world-class cloud experience.
Frequently asked questions
What is Data Lab for Apache Spark™?
Data Lab for Apache Spark™ is a solution designed for data scientists and engineers to process large datasets using a fully managed Apache Spark™ cluster. It includes:
- JupyterLab notebook connected to Apache Spark™
- Native integration with Object Storage for seamless data access
- CPU and GPU worker nodes
Users can quickly provision Apache Spark™ clusters to perform complex analytics, machine learning tasks, or basic operations on large datasets, with results saved directly to their S3-compatible Object Storage buckets.
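A typical notebook job reads a dataset from a bucket, aggregates it with Spark, and writes the result back. The sketch below assumes PySpark is available in the notebook, that the session is already connected to Object Storage, and that objects are addressed through Hadoop's `s3a://` scheme. The bucket layout, the CSV columns `region` and `amount`, and the helper names `s3a_path` and `revenue_by_region` are all hypothetical.

```python
def s3a_path(bucket: str, key: str) -> str:
    """Build an s3a:// URI for a bucket and object prefix (hypothetical helper)."""
    if not bucket or "/" in bucket:
        raise ValueError("bucket must be a non-empty name without '/'")
    return f"s3a://{bucket}/{key.lstrip('/')}"


def revenue_by_region(spark, bucket: str) -> None:
    """Read sales CSVs, sum `amount` per `region`, and write Parquet back to the bucket.

    `spark` is an existing pyspark.sql.SparkSession; the prefixes and column
    names are illustrative assumptions.
    """
    from pyspark.sql import functions as F  # imported lazily: only this function needs PySpark

    sales = spark.read.csv(s3a_path(bucket, "raw/sales/"), header=True, inferSchema=True)
    totals = sales.groupBy("region").agg(F.sum("amount").alias("total_amount"))
    # Overwrite the previous run's output so the job can be re-run safely.
    totals.write.mode("overwrite").parquet(s3a_path(bucket, "curated/revenue_by_region/"))
```

In a notebook cell, `revenue_by_region(spark, "my-bucket")` would run the job against the session the notebook already provides. Writing Parquet rather than CSV keeps the column types and compresses well for later queries.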
What is a managed Apache Spark™ cluster?
A managed Apache Spark™ cluster is one where Scaleway takes care of installation, configuration, and maintenance to ensure optimal performance, including providing all the necessary computing power. Your team can focus solely on extracting value from your data without worrying about infrastructure complexity.
What workloads is Data Lab for Apache Spark™ suited for?
Data Lab for Apache Spark™ supports a wide range of workloads, including:
- Complex analytics
- Machine learning tasks
- High-speed operations on large datasets
It offers scalable CPU and GPU instances with flexible node limits and robust support for Apache Spark™ libraries.
How can I access this service?
Data Lab for Apache Spark™ is currently available in public beta via the Scaleway Console or through the Scaleway API.
Is Data Lab for Apache Spark™ connected to other Scaleway products?
Yes, it integrates with:
- Object Storage (compatible with Amazon S3): pre-configured connection; only authorization is needed
- Cockpit (coming January 2025): monitor usage and logs
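Within Data Lab, the Object Storage connection is pre-configured and only authorization is needed. For comparison, a self-managed Spark session usually reaches an S3-compatible store through Hadoop's S3A connector, as sketched below. The endpoint pattern `https://s3.<region>.scw.cloud`, the default region `fr-par`, and the helper names `s3a_conf` and `build_session` are assumptions for illustration, not taken from the Data Lab documentation.

```python
def s3a_conf(access_key: str, secret_key: str, region: str = "fr-par") -> dict[str, str]:
    """Spark settings for Hadoop's S3A connector against an S3-compatible endpoint.

    The endpoint pattern is an assumption; check it against the Object Storage
    documentation for your region. The hadoop-aws jar must be on the classpath.
    """
    if not access_key or not secret_key:
        raise ValueError("both access_key and secret_key are required")
    prefix = "spark.hadoop.fs.s3a."  # Spark forwards spark.hadoop.* keys to the Hadoop config
    return {
        prefix + "endpoint": f"https://s3.{region}.scw.cloud",
        prefix + "access.key": access_key,
        prefix + "secret.key": secret_key,
        # Path-style URLs (endpoint/bucket/key) avoid relying on per-bucket DNS names.
        prefix + "path.style.access": "true",
    }


def build_session(conf: dict[str, str], app_name: str = "data-lab-sketch"):
    """Create or reuse a SparkSession with the given settings (requires PySpark)."""
    from pyspark.sql import SparkSession  # imported lazily so s3a_conf works without PySpark

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Keeping the credentials as function arguments, rather than hard-coding them, makes it easy to load them from a secret store or environment variables of your choice.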