To retrieve information about the different models available for deployment on Scaleway Inference, check out our model documentation.
Managed Inference API
Introduction
Scaleway Managed Inference allows you to deploy and run machine learning models on Scaleway's infrastructure. This service provides scalable and efficient endpoints for your model inference needs. The Scaleway Inference API enables you to manage these endpoints and perform inference operations.
Concepts
Refer to our dedicated concepts page to find definitions of all concepts and terminology related to Managed Inference.
Quickstart

Requirements:

- You have a Scaleway account
- You have created an API key, and the key has sufficient IAM permissions to perform the actions described on this page
- You have installed curl
- Configure your environment variables:

  Note: This is an optional step that simplifies your usage of the Inference API. You can find your Project ID in the Scaleway console.

  export SCW_SECRET_KEY="<API secret key>"
  export SCW_DEFAULT_REGION="fr-par"
  export SCW_PROJECT_ID="<Scaleway Project ID>"
- Create a model deployment: Run the following command to create a deployment. Customize the details in the payload (name, model, description, tags, etc.) to your needs:

  curl -X POST https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/deployments \
    -H "Content-Type: application/json" \
    -H "X-Auth-Token: $SCW_SECRET_KEY" \
    -d '{
      "project_id": "'"$SCW_PROJECT_ID"'",
      "name": "my-inference-deployment",
      "model_name": "meta/llama-3-8b-instruct:bf16",
      "node_type": "L4",
      "min_size": 1,
      "max_size": 3,
      "accept_eula": true,
      "endpoints": [{"public": {}}]
    }'

| Parameter | Description | Valid values |
|---|---|---|
| project_id | The Project in which the deployment should be created (string) | Any valid Scaleway Project ID, e.g. "b4bd99e0-b389-11ed-afa1-0242ac120002" |
| name | A name of your choice for the deployment (string) | Any string containing only alphanumeric characters, dots, spaces, and dashes, e.g. "my-inference-deployment" |
| model_name | The model to deploy (string) | Any valid model name in the format vendor/model-name:version, e.g. "meta/llama-3-8b-instruct:bf16" |
| node_type | The type of node to use for the deployment (string) | Example: "L4" |
| min_size | Minimum number of replicas for the deployment (integer) | Any integer, e.g. 1 |
| max_size | Maximum number of replicas for the deployment (integer) | Any integer, e.g. 3 |
| accept_eula | Indicates acceptance of the End User License Agreement (boolean) | true |
| endpoints | Defines the endpoints for the deployment (array) | At least one endpoint, e.g. [ { "public": {} } ] |
- Create a model endpoint: Run the following command to create an inference endpoint for the deployment. Customize the details in the payload to your needs:

  Example for creating a public endpoint:

  curl -X POST https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/endpoints \
    -H "Content-Type: application/json" \
    -H "X-Auth-Token: $SCW_SECRET_KEY" \
    -d '{
      "project_id": "'"$SCW_PROJECT_ID"'",
      "deployment_id": "your-deployment-id",
      "endpoint": {
        "disable_auth": false,
        "public": {}
      }
    }'

  Example for creating a private endpoint:

  curl -X POST https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/endpoints \
    -H "Content-Type: application/json" \
    -H "X-Auth-Token: $SCW_SECRET_KEY" \
    -d '{
      "project_id": "'"$SCW_PROJECT_ID"'",
      "deployment_id": "your-deployment-id",
      "endpoint": {
        "disable_auth": false,
        "private_network": {
          "private_network_id": "your-private-network-id"
        }
      }
    }'

| Parameter | Description | Valid values |
|---|---|---|
| project_id | The Project in which the endpoint should be created (string) | Any valid Scaleway Project ID, e.g. "b4bd99e0-b389-11ed-afa1-0242ac120002" |
| deployment_id | The deployment ID to which the endpoint will be associated (string) | Any valid deployment ID, e.g. "bcb0976d-98d6-49c1-b6b5-17804941c0b7" |
| disable_auth | Specifies whether to disable authentication (boolean) | true or false |
| public | Public endpoint configuration (object) | {} for a public endpoint |
| private_network | Private endpoint configuration including the Private Network ID (object) | { "private_network_id": "private-network-id" } |
- List your deployments: Run the following command to get a list of all the deployments in your account, with their details:

  curl -X GET \
    -H "Content-Type: application/json" \
    -H "X-Auth-Token: $SCW_SECRET_KEY" \
    "https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/deployments"
- List your endpoints: Run the following command to get a list of all the inference endpoints in your account, with their details:

  curl -X GET \
    -H "Content-Type: application/json" \
    -H "X-Auth-Token: $SCW_SECRET_KEY" \
    "https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/endpoints"
- Delete an endpoint: Run the following command to delete an inference endpoint, specified by its endpoint ID:

  curl -X DELETE \
    -H "X-Auth-Token: $SCW_SECRET_KEY" \
    -H "Content-Type: application/json" \
    "https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/endpoints/<endpoint-ID>"

  The expected successful response is empty.
Important: Managed Inference deployments must have at least one endpoint, either public or private.
Technical information
Region
Managed Inference endpoints are available in the following region:
| Name | API ID |
|---|---|
| Paris | fr-par |
Pagination
Most listing requests receive a paginated response. Requests against paginated endpoints accept two query arguments:

- page, a positive integer used to choose which page to return.
- per_page, a positive integer lower than or equal to 100 that sets the number of items to return per page. The default value is 50.

Paginated endpoints usually also accept filters to search and sort results. These filters are documented alongside each endpoint.

The X-Total-Count header contains the total number of items returned.
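As an illustration, the following request (a minimal sketch reusing the environment variables from the quickstart) fetches the second page of deployments, 25 items at a time; the -i flag prints the response headers so you can inspect X-Total-Count:

curl -i -X GET \
  -H "X-Auth-Token: $SCW_SECRET_KEY" \
  "https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/deployments?page=2&per_page=25"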
Creating a deployment: the model object
When creating a deployment, the model_id parameter is required: it specifies the model to deploy. Use the List models endpoint to retrieve available model IDs.

This information is designed to help you correctly configure the model_id parameter when using the Create a deployment method.
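For example, the following request (a sketch reusing the quickstart environment variables) lists the models available in the region; each returned model should include the ID to use:

curl -X GET \
  -H "X-Auth-Token: $SCW_SECRET_KEY" \
  "https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/models"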
Going further
For more help using Scaleway Inference, check out the following resources:
- Our main documentation
- The #inference-beta channel on our Slack Community: do not hesitate to reach out with any questions or issues you encounter during the public beta; our teams will help you through this dedicated channel.
Models
A model represents a pre-trained machine learning model that can be deployed on the Managed Inference service.
Models are used to define the inference model, its source, and its compatibility with the available nodes. Some models are available in multiple quantization levels, which affect both the performance and the accuracy of the model.
GET
/inference/v1beta1/regions/{region}/models
GET
/inference/v1beta1/regions/{region}/models/{model_id}
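As a sketch, a single model can be retrieved by its ID (the <model-ID> placeholder below stands for an ID obtained from the listing route above):

curl -X GET \
  -H "X-Auth-Token: $SCW_SECRET_KEY" \
  "https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/models/<model-ID>"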
Deployments
A deployment is a scalable pool of resources used to run inference models.
GET
/inference/v1beta1/regions/{region}/deployments
POST
/inference/v1beta1/regions/{region}/deployments
GET
/inference/v1beta1/regions/{region}/deployments/{deployment_id}
PATCH
/inference/v1beta1/regions/{region}/deployments/{deployment_id}
DELETE
/inference/v1beta1/regions/{region}/deployments/{deployment_id}
GET
/inference/v1beta1/regions/{region}/deployments/{deployment_id}/certificate
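For instance, an existing deployment can be modified through the PATCH route. The following is a hedged sketch only: it assumes min_size and max_size are accepted in the update payload, mirroring the fields of the create payload from the quickstart, and <deployment-ID> is a placeholder:

curl -X PATCH \
  -H "Content-Type: application/json" \
  -H "X-Auth-Token: $SCW_SECRET_KEY" \
  -d '{"min_size": 1, "max_size": 5}' \
  "https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/deployments/<deployment-ID>"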
Node types
Nodes are the compute units that make up your inference deployments.
GET
/inference/v1beta1/regions/{region}/node-types
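To check which node types (such as the L4 used in the quickstart) are currently offered in your region, call the listing route:

curl -X GET \
  -H "X-Auth-Token: $SCW_SECRET_KEY" \
  "https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/node-types"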
Access Control List
Network Access Control Lists (ACLs) allow you to manage inbound network traffic by setting up ACL rules.
DELETE
/inference/v1beta1/regions/{region}/acls/{acl_id}
GET
/inference/v1beta1/regions/{region}/deployments/{deployment_id}/acls
POST
/inference/v1beta1/regions/{region}/deployments/{deployment_id}/acls
PUT
/inference/v1beta1/regions/{region}/deployments/{deployment_id}/acls
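For example, the ACL rules currently attached to a deployment can be listed as follows (a minimal sketch; <deployment-ID> is a placeholder):

curl -X GET \
  -H "X-Auth-Token: $SCW_SECRET_KEY" \
  "https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/deployments/<deployment-ID>/acls"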
Endpoints
An endpoint is the URL where the inference model can be accessed.
Endpoints can be public or private, and can be protected by an IAM authentication token.
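Once a deployment is running, inference requests are sent to the endpoint URL itself rather than to this management API. The following is a hedged sketch only: it assumes authentication is enabled, that your IAM secret key is passed as a bearer token, and that the deployed model serves an OpenAI-compatible chat completions route; <your-endpoint-url> is a placeholder for the URL returned when the endpoint was created:

curl -X POST \
  -H "Authorization: Bearer $SCW_SECRET_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta/llama-3-8b-instruct:bf16",
    "messages": [{"role": "user", "content": "Hello!"}]
  }' \
  "https://<your-endpoint-url>/v1/chat/completions"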
POST
/inference/v1beta1/regions/{region}/endpoints
PATCH
/inference/v1beta1/regions/{region}/endpoints/{endpoint_id}
DELETE
/inference/v1beta1/regions/{region}/endpoints/{endpoint_id}