Managed Inference API

Introduction

Scaleway Managed Inference allows you to deploy and run machine learning models on Scaleway's infrastructure. This service provides scalable and efficient endpoints for your model inference needs. The Scaleway Inference API enables you to manage these endpoints and perform inference operations.

Tip

To retrieve information about the different models available for deployment on Scaleway Inference, check out our model documentation.

Concepts

Refer to our dedicated concepts page to find definitions of all concepts and terminology related to Managed Inference.

Quickstart

  1. Configure your environment variables

    Note

    This step is optional, but it simplifies the commands that follow. You can find your Project ID in the Scaleway console.

    export SCW_SECRET_KEY="<API secret key>"
    export SCW_DEFAULT_REGION="fr-par"
    export SCW_PROJECT_ID="<Scaleway Project ID>"
  2. Create a model deployment: Run the following command to create a deployment. Customize the details in the payload (name, model, description, tags, etc.) to your needs:

    curl -X POST "https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/deployments" \
      -H "Content-Type: application/json" \
      -H "X-Auth-Token: $SCW_SECRET_KEY" \
      -d '{
        "project_id": "'"$SCW_PROJECT_ID"'",
        "name": "my-inference-deployment",
        "model_name": "meta/llama-3-8b-instruct:bf16",
        "node_type": "L4",
        "min_size": 1,
        "max_size": 3,
        "accept_eula": true,
        "endpoints": [
          {
            "public": {}
          }
        ]
      }'

    | Parameter | Description | Valid values |
    |-----------|-------------|--------------|
    | project_id | The Project in which the deployment should be created (string) | Any valid Scaleway Project ID, e.g., "b4bd99e0-b389-11ed-afa1-0242ac120002" |
    | name | A name of your choice for the deployment (string) | Any string containing only alphanumeric characters, dots, spaces, and dashes, e.g., "my-inference-deployment" |
    | model_name | The model to deploy (string) | Any valid model name in the format vendor/model-name:version, e.g., "meta/llama-3-8b-instruct:bf16" |
    | node_type | The type of node to use for the deployment (string) | Example: "L4" |
    | min_size | Minimum number of replicas for the deployment (integer) | Any integer, e.g., 1 |
    | max_size | Maximum number of replicas for the deployment (integer) | Any integer, e.g., 3 |
    | accept_eula | Indicates acceptance of the End User License Agreement (boolean) | true |
    | endpoints | Defines the endpoints for the deployment (array) | At least one endpoint, e.g., [ { "public": {} } ] |

  3. Create a model endpoint: Run the following command to create an inference endpoint for the deployment. Customize the details in the payload to your needs:

    Example for creating a public endpoint

    curl -X POST "https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/endpoints" \
      -H "Content-Type: application/json" \
      -H "X-Auth-Token: $SCW_SECRET_KEY" \
      -d '{
        "project_id": "'"$SCW_PROJECT_ID"'",
        "deployment_id": "your-deployment-id",
        "endpoint": {
          "disable_auth": false,
          "public": {}
        }
      }'

    Example for creating a private endpoint

    curl -X POST "https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/endpoints" \
      -H "Content-Type: application/json" \
      -H "X-Auth-Token: $SCW_SECRET_KEY" \
      -d '{
        "project_id": "'"$SCW_PROJECT_ID"'",
        "deployment_id": "your-deployment-id",
        "endpoint": {
          "disable_auth": false,
          "private_network": {
            "private_network_id": "your-private-network-id"
          }
        }
      }'

    | Parameter | Description | Valid values |
    |-----------|-------------|--------------|
    | project_id | The Project in which the endpoint should be created (string) | Any valid Scaleway Project ID, e.g., "b4bd99e0-b389-11ed-afa1-0242ac120002" |
    | deployment_id | The deployment ID to which the endpoint will be associated (string) | Any valid deployment ID, e.g., "bcb0976d-98d6-49c1-b6b5-17804941c0b7" |
    | disable_auth | Specifies whether to disable authentication (boolean) | true or false |
    | public | Public endpoint configuration (object) | {} for public endpoint |
    | private_network | Private endpoint configuration including the private network ID (object) | { "private_network_id": "private-network-id" } |

  4. List your deployments: Run the following command to get a list of all the deployments in your account, with their details:

    curl -X GET \
    -H "Content-Type: application/json" \
    -H "X-Auth-Token: $SCW_SECRET_KEY" \
    "https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/deployments"
  5. List your endpoints: Run the following command to get a list of all the inference endpoints in your account, with their details:

    curl -X GET \
    -H "Content-Type: application/json" \
    -H "X-Auth-Token: $SCW_SECRET_KEY" \
    "https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/endpoints"
  6. Delete an endpoint: Run the following command to delete an inference endpoint, specified by its endpoint ID:

    curl -X DELETE \
    -H "X-Auth-Token: $SCW_SECRET_KEY" \
    -H "Content-Type: application/json" \
    "https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/endpoints/<endpoint-ID>"

    The expected successful response is empty.

    Important

    Managed Inference deployments must have at least one endpoint, either public or private.
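
Every command in the quickstart above reads the three variables exported in step 1, and an unset variable produces a malformed URL or payload rather than a clear error. The following is a minimal sketch of a pre-flight check; the function name check_scw_env is hypothetical and not part of any Scaleway tooling.

```shell
#!/bin/sh
# Hypothetical helper: report any quickstart variable that is unset or empty.
# Returns 0 when all three are present, 1 otherwise.
check_scw_env() {
  missing=0
  for name in SCW_SECRET_KEY SCW_DEFAULT_REGION SCW_PROJECT_ID; do
    # Indirect lookup: read the value of the variable whose name is in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_scw_env || exit 1` at the top of a script stops it before any request is sent with incomplete credentials.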

Requirements
  • You have a Scaleway account
  • You have created an API key with sufficient IAM permissions to perform the actions described on this page
  • You have installed curl

Technical information

Region

Managed Inference endpoints are available in the following region:

| Name | API ID |
|------|--------|
| Paris | fr-par |

Pagination

Most listing requests receive a paginated response. Requests against paginated endpoints accept two query arguments:

  • page, a positive integer to choose which page to return.
  • per_page, a positive integer lower than or equal to 100 to select the number of items to return per page. The default value is 50.

Paginated endpoints usually also accept filters to search and sort results. These filters are documented in each endpoint's documentation.

The X-Total-Count header contains the total number of items returned.
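
As an illustration of the two query arguments, the sketch below builds a paginated listing URL and works out how many pages a result set spans. The helper names list_url and page_count are hypothetical, and the page calculation assumes X-Total-Count is read as the size of the full result set rather than of the current page.

```shell
#!/bin/sh
# Base URL pattern taken from the quickstart commands in this page.
API_BASE="https://api.scaleway.com/inference/v1beta1/regions"

# Hypothetical helper: build a paginated listing URL.
# Usage: list_url <region> <resource> <page> <per_page>
list_url() {
  region=$1; resource=$2; page=$3; per_page=$4
  # Both arguments must be plain non-negative integers before comparing them.
  for n in "$page" "$per_page"; do
    case $n in
      ''|*[!0-9]*) echo "page and per_page must be integers" >&2; return 1 ;;
    esac
  done
  if [ "$page" -lt 1 ]; then
    echo "page must be a positive integer" >&2
    return 1
  fi
  if [ "$per_page" -lt 1 ] || [ "$per_page" -gt 100 ]; then
    echo "per_page must be between 1 and 100" >&2
    return 1
  fi
  echo "$API_BASE/$region/$resource?page=$page&per_page=$per_page"
}

# Hypothetical helper: pages needed for <total> items at <per_page> per page
# (integer ceiling division).
page_count() {
  total=$1; per_page=$2
  echo $(( (total + per_page - 1) / per_page ))
}

# Example call (needs network access and a valid key); -i prints the response
# headers, including X-Total-Count:
# curl -s -i -H "X-Auth-Token: $SCW_SECRET_KEY" \
#   "$(list_url "$SCW_DEFAULT_REGION" deployments 1 20)"
```

Validating per_page locally keeps an out-of-range value from reaching the API, where the behavior for such values is not described on this page.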

Creating a deployment: the model object

When creating a deployment, the model_id parameter is required. This specifies the model to deploy. Use the List Models endpoint to retrieve available model IDs.

Note

This information is designed to help you correctly configure the model_id parameter when using the Create a deployment method.
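
To look up model IDs before creating a deployment, the Models paths listed later in this reference can be called with the same headers as the quickstart commands. The sketch below is one way to wrap them; models_url and get_models are hypothetical helper names, and the shape of the response body is not shown here because this page does not describe it.

```shell
#!/bin/sh
# Paths follow the Models endpoints listed in this reference:
#   GET /inference/v1beta1/regions/{region}/models
#   GET /inference/v1beta1/regions/{region}/models/{model_id}

# Hypothetical helper: URL for the model list, or for one model when an ID is given.
# Usage: models_url <region> [model_id]
models_url() {
  region=$1
  model_id=${2:-}
  url="https://api.scaleway.com/inference/v1beta1/regions/$region/models"
  if [ -n "$model_id" ]; then
    url="$url/$model_id"
  fi
  echo "$url"
}

# Hypothetical helper: send the request.
# Relies on SCW_SECRET_KEY and SCW_DEFAULT_REGION from the quickstart.
# Usage: get_models [model_id]
get_models() {
  curl -s -H "X-Auth-Token: $SCW_SECRET_KEY" \
    "$(models_url "$SCW_DEFAULT_REGION" "${1:-}")"
}
```

Calling `get_models` with no argument requests the full list; passing an ID requests a single model.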

Going further

For more help using Scaleway Inference, check out the following resources:

  • Our main documentation
  • The #inference-beta channel on our Slack Community. Do not hesitate to reach out there with any questions or issues you encounter during the public beta; our teams will help you.

Models

A model represents a pre-trained machine learning model that can be deployed on the Managed Inference service.

Models define the inference model, its source, and its compatibility with the available nodes. Some models may be available in multiple quantization levels, which affect the performance and accuracy of the model.

GET /inference/v1beta1/regions/{region}/models
GET /inference/v1beta1/regions/{region}/models/{model_id}

Deployments

A deployment is a scalable pool of resources used to run inference models.

GET /inference/v1beta1/regions/{region}/deployments
POST /inference/v1beta1/regions/{region}/deployments
GET /inference/v1beta1/regions/{region}/deployments/{deployment_id}
PATCH /inference/v1beta1/regions/{region}/deployments/{deployment_id}
DELETE /inference/v1beta1/regions/{region}/deployments/{deployment_id}
GET /inference/v1beta1/regions/{region}/deployments/{deployment_id}/certificate

Node types

Nodes are the compute units that make up your inference deployments.

GET /inference/v1beta1/regions/{region}/node-types

Access Control List

Network Access Control Lists (ACLs) allow you to manage inbound network traffic by setting up ACL rules.

DELETE /inference/v1beta1/regions/{region}/acls/{acl_id}
GET /inference/v1beta1/regions/{region}/deployments/{deployment_id}/acls
POST /inference/v1beta1/regions/{region}/deployments/{deployment_id}/acls
PUT /inference/v1beta1/regions/{region}/deployments/{deployment_id}/acls

Endpoints

An endpoint is the URL where the inference model can be accessed.

Endpoints can be public or private, and can be protected by an IAM authentication token.

POST /inference/v1beta1/regions/{region}/endpoints
PATCH /inference/v1beta1/regions/{region}/endpoints/{endpoint_id}
DELETE /inference/v1beta1/regions/{region}/endpoints/{endpoint_id}
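
Because DELETE is destructive, it can help to reject an obviously wrong ID, such as a placeholder left in a script, before the request is sent. The sketch below assumes endpoint IDs are UUIDs like the example Project and deployment IDs on this page; that assumption is not confirmed here, and is_uuid and delete_endpoint are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the argument looks like a UUID
# (8-4-4-4-12 hexadecimal digits), the shape of the example IDs on this page.
is_uuid() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Hypothetical helper: delete an endpoint only if its ID passes the check.
# Relies on SCW_SECRET_KEY and SCW_DEFAULT_REGION from the quickstart.
# Usage: delete_endpoint <endpoint_id>
delete_endpoint() {
  endpoint_id=$1
  if ! is_uuid "$endpoint_id"; then
    echo "refusing to call the API: '$endpoint_id' is not a UUID" >&2
    return 1
  fi
  curl -s -X DELETE -H "X-Auth-Token: $SCW_SECRET_KEY" \
    "https://api.scaleway.com/inference/v1beta1/regions/$SCW_DEFAULT_REGION/endpoints/$endpoint_id"
}
```

With this guard, `delete_endpoint "<endpoint-ID>"` fails locally instead of sending a request with the unreplaced placeholder.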
© 2023-2024 – Scaleway