
LLM Inference - Quickstart

Scaleway LLM Inference is the first European LLM inference platform on the market: a scalable and secure inference engine for Large Language Models (LLMs).

Scaleway LLM Inference is a fully managed service that allows you to serve generative AI models in a production environment. With Scaleway LLM Inference, you can easily deploy, manage, and scale LLMs without worrying about the underlying infrastructure.

Here are some of the key features of Scaleway LLM Inference:

  • Easy deployment: Deploy state-of-the-art open-weight LLMs with just a few clicks. Scaleway LLM Inference provides a simple and intuitive interface for generating dedicated endpoints.
  • Security: Scaleway provides a secure environment for running your models, built on a hardened architecture with state-of-the-art cloud security practices.
  • Complete data privacy: Your data (prompts and responses) is never stored and never shared with third parties, ensuring it remains exclusively yours.
  • Auto-scaling (coming soon): Scaleway LLM Inference automatically scales your instances based on demand, ensuring that your models are always available and responsive.
Important

This service is in private beta. Specific terms and conditions apply.

How to create an LLM Inference deployment

  1. Navigate to the AI & Data section of the Scaleway console, and select LLM Inference from the side menu to access the LLM Inference dashboard.
  2. Click Create deployment to launch the deployment creation wizard.
  3. Provide the necessary information:
    • Select the desired model and the quantization to use for your deployment from the available options.
      Note

      Some models may require acceptance of an end-user license agreement. If prompted, review the terms and conditions and accept the license accordingly.

    • Choose the geographical region for the deployment.
    • Specify the GPU Instance type to be used with your deployment.
  4. Enter a name for the deployment, along with optional tags to aid in organization.
  5. Configure the network settings for the deployment:
    • Enable Private Network for secure communication and restricted availability within Private Networks. Choose an existing Private Network from the drop-down list, if applicable.
    • Enable Public Network to access resources via the public Internet. Token protection is enabled by default.
    Important
    • It is not possible to change network settings through the Scaleway console after deployment creation.
    • Enabling both private and public networks will result in two distinct endpoints (public and private) for your deployment.
    • Deployments must have at least one endpoint, either public or private.
  6. Click Create deployment to launch the deployment process. Once the deployment is ready, it will be listed among your deployments.
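
The console steps above can also be scripted against the Scaleway API. The sketch below is a minimal example only: the `llm-inference` API path and version, the model identifier, the node type, and the request body fields are assumptions for the private beta and may differ from the actual API reference, so verify them before use.

```python
import os
import requests

# Assumption: the API path, version, and body fields below are illustrative
# placeholders for the private-beta LLM Inference API; check the Scaleway
# API reference for the authoritative schema.
API_URL = "https://api.scaleway.com/llm-inference/v1beta1/regions/fr-par/deployments"

payload = {
    "name": "my-llm-deployment",           # deployment name (step 4 above)
    "project_id": os.environ["SCW_DEFAULT_PROJECT_ID"],
    "model_name": "meta/llama-2-7b-chat",  # hypothetical model identifier
    "node_type": "L4",                     # hypothetical GPU Instance type
}

response = requests.post(
    API_URL,
    json=payload,
    # Scaleway APIs authenticate with an IAM secret key in X-Auth-Token.
    headers={"X-Auth-Token": os.environ["SCW_SECRET_KEY"]},
)
response.raise_for_status()
print(response.json())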

How to access an LLM Inference deployment

LLM Inference deployments use dynamic tokens generated with Scaleway’s Identity and Access Management service (IAM) for authentication.

  1. Click LLM Inference in the AI & Data section of the side menu. The LLM Inference dashboard displays.
  2. Click the «See more» icon next to the deployment you want to manage. The deployment dashboard displays.
  3. Click Create token in the Deployment connection section of the dashboard. The token creation wizard displays.
  4. Fill in the required information for token creation and click Generate API key.
Tip

You have full control over authentication from the Security tab of your deployment. Authentication is enabled by default.
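
Once generated, the token is sent as a standard Bearer token with every request to your deployment endpoint. Below is a minimal connectivity check, assuming the deployment exposes an OpenAI-compatible /v1/models route; the endpoint URL and environment variable name are placeholders to replace with your own values.

```python
import os
import requests

# Placeholder: copy the real endpoint URL from your deployment dashboard.
ENDPOINT = "https://<deployment-id>.ifr.fr-par.scaleway.com"

response = requests.get(
    f"{ENDPOINT}/v1/models",
    # The IAM token generated above is passed as a standard Bearer token.
    headers={"Authorization": f"Bearer {os.environ['SCW_INFERENCE_TOKEN']}"},
)
response.raise_for_status()
print(response.json())
```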

How to interact with an LLM

  1. Click LLM Inference in the AI & Data section of the side menu. The LLM Inference dashboard displays.
  2. Click the «See more» icon next to the deployment you want to interact with. The deployment dashboard displays.
  3. Click the Inference tab. Code examples for various environments display. Copy and paste them into your code or terminal.
Note

Prompt structure may vary from one model to another. Refer to the model-specific usage instructions in our dedicated documentation.
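
As an illustration of what those code samples look like, here is a minimal chat request sketch, assuming the deployment exposes an OpenAI-compatible /v1/chat/completions route that applies the model's prompt template server-side; the endpoint URL and model name are placeholders to replace with the values shown in your Inference tab.

```python
import os
import requests

# Placeholder: copy the real endpoint URL from your deployment dashboard.
ENDPOINT = "https://<deployment-id>.ifr.fr-par.scaleway.com"

payload = {
    # Hypothetical model identifier; use the one shown in the Inference tab.
    "model": "meta/llama-2-7b-chat",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is LLM inference?"},
    ],
    "max_tokens": 256,
}

response = requests.post(
    f"{ENDPOINT}/v1/chat/completions",
    json=payload,
    headers={"Authorization": f"Bearer {os.environ['SCW_INFERENCE_TOKEN']}"},
)
response.raise_for_status()
# Assumes an OpenAI-style response schema with a "choices" list.
print(response.json()["choices"][0]["message"]["content"])
```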

How to delete a deployment

  1. Click LLM Inference in the AI & Data section of the Scaleway console side menu. A list of your deployments displays.
  2. Choose a deployment either by clicking its name or by selecting More info from the «See more» drop-down menu to access the deployment dashboard.
  3. Click the Settings tab of your deployment to display additional settings.
  4. Click Delete deployment.
  5. Type DELETE to confirm and click Delete deployment to delete your deployment.
Important

Deleting a deployment is a permanent action that erases all of its associated configuration and resources.
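
For completeness, deletion can also be scripted against the same hypothetical API used in the creation sketch above; the path, version, and deployment ID are placeholders to verify against the API reference.

```python
import os
import requests

# Placeholder deployment ID; list your deployments first to find it.
DEPLOYMENT_ID = "11111111-2222-3333-4444-555555555555"
API_URL = (
    "https://api.scaleway.com/llm-inference/v1beta1"
    f"/regions/fr-par/deployments/{DEPLOYMENT_ID}"
)

# Irreversible: permanently deletes the deployment and its resources.
response = requests.delete(
    API_URL,
    headers={"X-Auth-Token": os.environ["SCW_SECRET_KEY"]},
)
response.raise_for_status()
```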
