To retrieve information about the different models available for deployment on Scaleway Inference, check out our model documentation.
Managed Inference API
Introduction
Scaleway Managed Inference allows you to deploy and run machine learning models on Scaleway's infrastructure. This service provides scalable and efficient endpoints for your model inference needs. The Scaleway Inference API enables you to manage these endpoints and perform inference operations.
Concepts
Refer to our dedicated concepts page to find definitions of all concepts and terminology related to Managed Inference.
Quickstart

Requirements:

- You have a Scaleway account
- You have created an API key, and the key has sufficient IAM permissions to perform the actions described on this page
- You have installed curl
- Configure your environment variables:

  Note: This is an optional step that simplifies your usage of the Inference API. You can find your Project ID in the Scaleway console.

  export SCW_SECRET_KEY="<API secret key>"
  export SCW_DEFAULT_REGION="fr-par"
  export SCW_PROJECT_ID="<Scaleway Project ID>"
- Create a model deployment: Run the following command to create a deployment. Customize the details in the payload (name, model, description, tags, etc.) to your needs:

  curl -X POST https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/deployments \
    -H "Content-Type: application/json" \
    -H "X-Auth-Token: $SCW_SECRET_KEY" \
    -d '{
      "project_id": "'"$SCW_PROJECT_ID"'",
      "name": "my-inference-deployment",
      "model_name": "meta/llama-3-8b-instruct:bf16",
      "node_type": "L4",
      "min_size": 1,
      "max_size": 3,
      "accept_eula": true,
      "endpoints": [{"public": {}}]
    }'

| Parameter | Description | Valid values |
|---|---|---|
| project_id | The Project in which the deployment should be created (string) | Any valid Scaleway Project ID, e.g. "b4bd99e0-b389-11ed-afa1-0242ac120002" |
| name | A name of your choice for the deployment (string) | Any string containing only alphanumeric characters, dots, spaces, and dashes, e.g. "my-inference-deployment" |
| model_name | The model to deploy (string) | Any valid model name in the format vendor/model-name:version, e.g. "meta/llama-3-8b-instruct:bf16" |
| node_type | The type of node to use for the deployment (string) | Example: "L4" |
| min_size | Minimum number of replicas for the deployment (integer) | Any integer, e.g. 1 |
| max_size | Maximum number of replicas for the deployment (integer) | Any integer, e.g. 3 |
| accept_eula | Indicates acceptance of the End User License Agreement (boolean) | true |
| endpoints | Defines the endpoints for the deployment (array) | At least one endpoint, e.g. [ { "public": {} } ] |
- Create a model endpoint: Run the following command to create an inference endpoint for the deployment. Customize the details in the payload to your needs:

  Example for creating a public endpoint:

  curl -X POST https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/endpoints \
    -H "Content-Type: application/json" \
    -H "X-Auth-Token: $SCW_SECRET_KEY" \
    -d '{
      "project_id": "'"$SCW_PROJECT_ID"'",
      "deployment_id": "your-deployment-id",
      "endpoint": {
        "disable_auth": false,
        "public": {}
      }
    }'

  Example for creating a private endpoint:

  curl -X POST https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/endpoints \
    -H "Content-Type: application/json" \
    -H "X-Auth-Token: $SCW_SECRET_KEY" \
    -d '{
      "project_id": "'"$SCW_PROJECT_ID"'",
      "deployment_id": "your-deployment-id",
      "endpoint": {
        "disable_auth": false,
        "private_network": {
          "private_network_id": "your-private-network-id"
        }
      }
    }'

| Parameter | Description | Valid values |
|---|---|---|
| project_id | The Project in which the endpoint should be created (string) | Any valid Scaleway Project ID, e.g. "b4bd99e0-b389-11ed-afa1-0242ac120002" |
| deployment_id | The deployment ID to which the endpoint will be associated (string) | Any valid deployment ID, e.g. "bcb0976d-98d6-49c1-b6b5-17804941c0b7" |
| disable_auth | Specifies whether to disable authentication (boolean) | true or false |
| public | Public endpoint configuration (object) | {} for a public endpoint |
| private_network | Private endpoint configuration including the Private Network ID (object) | { "private_network_id": "private-network-id" } |
- List your deployments: Run the following command to get a list of all the deployments in your account, with their details:

  curl -X GET \
    -H "Content-Type: application/json" \
    -H "X-Auth-Token: $SCW_SECRET_KEY" \
    "https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/deployments"
- List your endpoints: Run the following command to get a list of all the inference endpoints in your account, with their details:

  curl -X GET \
    -H "Content-Type: application/json" \
    -H "X-Auth-Token: $SCW_SECRET_KEY" \
    "https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/endpoints"
- Delete an endpoint: Run the following command to delete an inference endpoint, specified by its endpoint ID:

  curl -X DELETE \
    -H "X-Auth-Token: $SCW_SECRET_KEY" \
    -H "Content-Type: application/json" \
    "https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/endpoints/<endpoint-ID>"

  The expected successful response is empty.
Important: Managed Inference deployments must have at least one endpoint, either public or private.
Technical information
Region
Managed Inference endpoints are available in the following region:
| Name | API ID |
|---|---|
| Paris | fr-par |
Pagination
Most listing requests receive a paginated response. Requests against paginated endpoints accept two query arguments:

- page, a positive integer used to choose which page to return.
- per_page, a positive integer lower than or equal to 100 that sets the number of items to return per page. The default value is 50.

Paginated endpoints usually also accept filters to search and sort results. These filters are documented alongside each endpoint.

The X-Total-Count header contains the total number of items returned.
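As an illustration, the following request (a minimal sketch reusing the environment variables from the quickstart) fetches the second page of deployments, 25 items at a time; the -i flag prints the response headers so you can inspect X-Total-Count:

curl -i -X GET \
  -H "X-Auth-Token: $SCW_SECRET_KEY" \
  "https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/deployments?page=2&per_page=25"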
Creating a deployment: the model object
When creating a deployment, the model_id parameter is required: it specifies the model to deploy. Use the List models endpoint to retrieve available model IDs.

This information is designed to help you correctly configure the model_id parameter when using the Create a deployment method.
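For example, the following request (a sketch reusing the quickstart environment variables) lists the models available in the region; each returned model should include the ID to use:

curl -X GET \
  -H "X-Auth-Token: $SCW_SECRET_KEY" \
  "https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/models"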
Going further
For more help using Scaleway Inference, check out the following resources:
- Our main documentation
- The #inference-beta channel on our Slack Community: do not hesitate to reach out with any questions or issues you encounter during the public beta; our teams will help you through this dedicated channel.
Models
A model represents a pre-trained machine learning model that can be deployed on the Managed Inference service.
Models are used to define the inference model, its source, and its compatibility with the available nodes. Some models are available in multiple quantization levels, which affect both the performance and the accuracy of the model.
GET
/inference/v1beta1/regions/{region}/models
GET
/inference/v1beta1/regions/{region}/models/{model_id}
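As a sketch, a single model can be retrieved by its ID (the <model-ID> placeholder below stands for an ID obtained from the listing route above):

curl -X GET \
  -H "X-Auth-Token: $SCW_SECRET_KEY" \
  "https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/models/<model-ID>"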
Deployments
A deployment is a scalable pool of resources used to run inference models.
GET
/inference/v1beta1/regions/{region}/deployments
POST
/inference/v1beta1/regions/{region}/deployments
GET
/inference/v1beta1/regions/{region}/deployments/{deployment_id}
PATCH
/inference/v1beta1/regions/{region}/deployments/{deployment_id}
DELETE
/inference/v1beta1/regions/{region}/deployments/{deployment_id}
GET
/inference/v1beta1/regions/{region}/deployments/{deployment_id}/certificate
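For instance, an existing deployment can be modified through the PATCH route. The following is a hedged sketch only: it assumes min_size and max_size are accepted in the update payload, mirroring the fields of the create payload from the quickstart, and <deployment-ID> is a placeholder:

curl -X PATCH \
  -H "Content-Type: application/json" \
  -H "X-Auth-Token: $SCW_SECRET_KEY" \
  -d '{"min_size": 1, "max_size": 5}' \
  "https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/deployments/<deployment-ID>"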
Node types
Nodes are the compute units that make up your inference deployments.
GET
/inference/v1beta1/regions/{region}/node-types
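To check which node types (such as the L4 used in the quickstart) are currently offered in your region, call the listing route:

curl -X GET \
  -H "X-Auth-Token: $SCW_SECRET_KEY" \
  "https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/node-types"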
Access Control List
Network Access Control Lists (ACLs) allow you to manage inbound network traffic by setting up ACL rules.
DELETE
/inference/v1beta1/regions/{region}/acls/{acl_id}
GET
/inference/v1beta1/regions/{region}/deployments/{deployment_id}/acls
POST
/inference/v1beta1/regions/{region}/deployments/{deployment_id}/acls
PUT
/inference/v1beta1/regions/{region}/deployments/{deployment_id}/acls
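For example, the ACL rules currently attached to a deployment can be listed as follows (a minimal sketch; <deployment-ID> is a placeholder):

curl -X GET \
  -H "X-Auth-Token: $SCW_SECRET_KEY" \
  "https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/deployments/<deployment-ID>/acls"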
Endpoints
An endpoint is the URL where the inference model can be accessed.
Endpoints can be public or private, and can be protected by an IAM authentication token.
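Once a deployment is running, inference requests are sent to the endpoint URL itself rather than to this management API. The following is a hedged sketch only: it assumes authentication is enabled, that your IAM secret key is passed as a bearer token, and that the deployed model serves an OpenAI-compatible chat completions route; <your-endpoint-url> is a placeholder for the URL returned when the endpoint was created:

curl -X POST \
  -H "Authorization: Bearer $SCW_SECRET_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta/llama-3-8b-instruct:bf16",
    "messages": [{"role": "user", "content": "Hello!"}]
  }' \
  "https://<your-endpoint-url>/v1/chat/completions"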
POST
/inference/v1beta1/regions/{region}/endpoints
PATCH
/inference/v1beta1/regions/{region}/endpoints/{endpoint_id}
DELETE
/inference/v1beta1/regions/{region}/endpoints/{endpoint_id}