How to query language models

Reviewed on August 22, 2025

Scaleway's Generative APIs service allows users to interact with powerful language models hosted on the platform.

There are several ways to interact with language models:

Via the Scaleway console, which provides a complete playground for testing models, adjusting parameters, and observing in real time how these changes affect the output.
Via the Chat Completions API or the Responses API.

Before you start

To complete the actions presented below, you must have:

A Scaleway account logged into the console
Owner status or IAM permissions allowing you to perform actions in the intended Organization
A valid API key for API authentication
Python 3.7+ installed on your system

Accessing the Playground

Scaleway provides a web playground for instruct-based models hosted on Generative APIs.

Navigate to Generative APIs under the AI section of the Scaleway console side menu. The list of models you can query displays.
Click the name of the chat model you want to try. Alternatively, click the More icon next to the chat model, then click Try model in the menu.

The web playground displays.

Using the playground

Enter a prompt at the bottom of the page, or use one of the suggested prompts in the conversation area.
Edit the hyperparameters listed in the right-hand column. For example, change the default temperature to get more or less randomness in the outputs.
Switch models at the top of the page to compare the capabilities of the chat models offered via Generative APIs.
Click View code to get code snippets configured according to your settings in the playground.

Querying language models via API

You can query the models programmatically using your favorite tools or languages. The examples that follow use the OpenAI Python client.

Chat Completions API or Responses API?

Both the Chat Completions API and the Responses API are OpenAI-compatible REST APIs that can be used for generating and manipulating conversations. The Chat Completions API is focused on generating conversational responses, while the Responses API is a more general REST API for chat, structured outputs, tool use, and multimodal inputs.

The Chat Completions API was released in 2023 and has become an industry standard for building AI applications. It is designed specifically for multi-turn conversations. It is stateless: the client manages the conversation history by appending each new message to the list of messages it sends with every request. Messages in the conversation can include text, images, and audio extracts. The API supports function calling, which lets developers define functions that the model can choose to call. When the model does so, the API returns the function name and arguments; the developer's code must then execute the function and feed the result back into the conversation.
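The function-calling round trip described above can be sketched as follows. This is a minimal illustration, not Scaleway's reference implementation: it assumes the model you query (here llama-3.1-8b-instruct) supports tool calling, that your API secret key is stored in an environment variable named SCW_SECRET_KEY, and the get_weather function and its schema are hypothetical placeholders. Note how the conversation history is resent in full on the second request, since the API is stateless.

```python
import json
import os

def get_weather(city):
    """Hypothetical local function the model may ask us to call (placeholder data)."""
    return {"city": city, "forecast": "sunny", "temperature_c": 22}

# Description of the hypothetical function, in the Chat Completions "tools" format.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

LOCAL_FUNCTIONS = {"get_weather": get_weather}

def run_tool_call(name, arguments_json):
    """Execute a function requested by the model; return its result as a JSON string."""
    args = json.loads(arguments_json)  # the model sends arguments as a JSON string
    return json.dumps(LOCAL_FUNCTIONS[name](**args))

def chat_with_tools(client, prompt, model="llama-3.1-8b-instruct"):
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(model=model, messages=messages, tools=TOOLS)
    message = response.choices[0].message
    if not message.tool_calls:
        return message.content  # the model answered without calling a function
    # Append the assistant turn that requested the calls...
    messages.append({
        "role": "assistant",
        "content": message.content,
        "tool_calls": [
            {"id": call.id, "type": "function",
             "function": {"name": call.function.name, "arguments": call.function.arguments}}
            for call in message.tool_calls
        ],
    })
    # ...then one "tool" message per call, carrying the function's result.
    for call in message.tool_calls:
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": run_tool_call(call.function.name, call.function.arguments),
        })
    # Resend the whole history so the model can use the results in its answer.
    final = client.chat.completions.create(model=model, messages=messages, tools=TOOLS)
    return final.choices[0].message.content

# Only contact the API when a key is available.
if __name__ == "__main__" and os.environ.get("SCW_SECRET_KEY"):
    from openai import OpenAI
    client = OpenAI(base_url="https://api.scaleway.ai/v1", api_key=os.environ["SCW_SECRET_KEY"])
    print(chat_with_tools(client, "What is the weather like in Paris?"))
```

Keeping the dispatch in a LOCAL_FUNCTIONS dictionary means only functions you explicitly registered can ever be executed, whatever name the model returns.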

The Responses API was released in 2025 and is designed to combine the simplicity of Chat Completions with support for more agentic tasks and reasoning. It supports statefulness, so it can maintain context without the client resending the entire conversation history. It also offers built-in tools (for example, web search or file search) that the model can execute itself while generating a response.

Note

Scaleway's support for the Responses API is currently in beta. Support for the full feature set will be added incrementally: statefulness and tools other than function calling are currently not supported.

Most supported Generative APIs models can be used with both the Chat Completions API and the Responses API. For the gpt-oss-120b model, the Responses API is recommended, as it gives access to all of the model's features, especially tool calling.

For full details on the differences between these APIs, see the official OpenAI documentation.

Installing the OpenAI SDK

Install the OpenAI SDK using pip:

pip install openai

Initializing the client

Initialize the OpenAI client with your base URL and API key:

from openai import OpenAI

# Initialize the client with your base URL and API key
client = OpenAI(
    base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
    api_key="<SCW_SECRET_KEY>"  # Your unique API secret key from Scaleway
)
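Rather than hard-coding the secret key in source code, you may prefer to read it from an environment variable. The sketch below assumes a variable named SCW_SECRET_KEY (a naming choice for this example, not something the SDK requires) and keeps the same base URL as the snippet above.

```python
import os

SCW_BASE_URL = "https://api.scaleway.ai/v1"  # Scaleway's Generative APIs service URL

def client_settings(env=None):
    """Return keyword arguments for OpenAI(), reading the key from SCW_SECRET_KEY."""
    env = os.environ if env is None else env
    api_key = env.get("SCW_SECRET_KEY")
    if not api_key:
        raise RuntimeError("Set SCW_SECRET_KEY to your Scaleway API secret key")
    return {"base_url": SCW_BASE_URL, "api_key": api_key}

# Only build a real client when the key is available.
if __name__ == "__main__" and os.environ.get("SCW_SECRET_KEY"):
    from openai import OpenAI
    client = OpenAI(**client_settings())
```

Failing early with an explicit message is usually easier to debug than an authentication error returned later by the API.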

Generating a chat completion

You can now create a chat completion using either the Chat Completions API or the Responses API.
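The following sketch sends one request with each API. It assumes your API secret key is stored in an environment variable named SCW_SECRET_KEY, and that your installed openai package is recent enough to expose client.responses. The model names are the ones mentioned on this page: llama-3.1-8b-instruct for Chat Completions and gpt-oss-120b, for which the Responses API is recommended.

```python
import os

MODEL_CHAT = "llama-3.1-8b-instruct"
MODEL_RESPONSES = "gpt-oss-120b"

def chat_completions_request(prompt):
    """Arguments for client.chat.completions.create(): a list of role/content messages."""
    return {"model": MODEL_CHAT, "messages": [{"role": "user", "content": prompt}]}

def responses_request(prompt):
    """Arguments for client.responses.create(): the prompt goes in the `input` field."""
    return {"model": MODEL_RESPONSES, "input": prompt}

# Only contact the API when a key is available.
if __name__ == "__main__" and os.environ.get("SCW_SECRET_KEY"):
    from openai import OpenAI
    client = OpenAI(base_url="https://api.scaleway.ai/v1", api_key=os.environ["SCW_SECRET_KEY"])

    # Chat Completions API: the reply is in the first choice's message.
    completion = client.chat.completions.create(**chat_completions_request("Describe a futuristic city."))
    print(completion.choices[0].message.content)

    # Responses API (beta on Scaleway): output_text joins the text parts of the output.
    response = client.responses.create(**responses_request("Describe a futuristic city."))
    print(response.output_text)
```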

A conversation can include a default system prompt that sets the model's behavior. To set it, make the first message in the list a message with the role system. For example:

[
  {
    "role": "system",
    "content": "You are Xavier Niel."
  },
  {
    "role": "user",
    "content": "Hello, what is your name?"
  }
]

Adding such a system prompt can also help if you receive responses such as "I'm not sure what tools are available to me. Can you please provide a library of tools that I can use to generate a response?"
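In Python, the system prompt is simply the first entry in the messages list passed to the Chat Completions API. A minimal sketch, assuming your API secret key is stored in an environment variable named SCW_SECRET_KEY and reusing the llama-3.1-8b-instruct model from this page:

```python
import os

def with_system_prompt(system_prompt, user_prompt):
    """Return a message list whose first message sets the system prompt."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

# Only contact the API when a key is available.
if __name__ == "__main__" and os.environ.get("SCW_SECRET_KEY"):
    from openai import OpenAI
    client = OpenAI(base_url="https://api.scaleway.ai/v1", api_key=os.environ["SCW_SECRET_KEY"])
    completion = client.chat.completions.create(
        model="llama-3.1-8b-instruct",
        messages=with_system_prompt("You are Xavier Niel.", "Hello, what is your name?"),
    )
    print(completion.choices[0].message.content)
```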

Model parameters and their effects

The following parameters will influence the output of the model:
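As an illustration, the sketch below sets three sampling parameters that are standard in OpenAI-compatible Chat Completions APIs, assuming the model you query accepts them: temperature (lower values give more deterministic output, higher values more varied output), top_p (nucleus sampling threshold), and max_tokens (upper bound on the number of generated tokens). The values are arbitrary examples, not recommendations, and the sketch assumes your API secret key is stored in an environment variable named SCW_SECRET_KEY.

```python
import os

PROMPT = "Suggest a name for a coffee shop."

def request_with(temperature, max_tokens=50, top_p=1.0):
    """Build Chat Completions arguments with explicit sampling parameters."""
    return {
        "model": "llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": PROMPT}],
        "temperature": temperature,  # randomness of token sampling
        "top_p": top_p,              # sample only from the most probable tokens summing to top_p
        "max_tokens": max_tokens,    # stop after at most this many generated tokens
    }

# Only contact the API when a key is available.
if __name__ == "__main__" and os.environ.get("SCW_SECRET_KEY"):
    from openai import OpenAI
    client = OpenAI(base_url="https://api.scaleway.ai/v1", api_key=os.environ["SCW_SECRET_KEY"])
    # Send the same prompt at a low and a high temperature to compare the outputs.
    for temperature in (0.1, 1.0):
        completion = client.chat.completions.create(**request_with(temperature))
        print(f"temperature={temperature}: {completion.choices[0].message.content}")
```

Running the loop several times usually shows near-identical answers at the low temperature and more varied ones at the high temperature.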

If you encounter an error such as "Forbidden 403", refer to the API documentation for troubleshooting tips.
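In the OpenAI Python SDK, HTTP error responses are raised as subclasses of openai.APIStatusError (for example openai.PermissionDeniedError for 403 and openai.AuthenticationError for 401), whose status_code attribute holds the HTTP status. The sketch below catches them to print a short hint; the hint wording is illustrative and does not replace the API documentation. It assumes your API secret key is stored in an environment variable named SCW_SECRET_KEY.

```python
import os

def explain_status(status_code):
    """Map common HTTP error codes to a short troubleshooting hint (illustrative wording)."""
    hints = {
        401: "Unauthorized: check that the API secret key is correct.",
        403: "Forbidden: check the IAM permissions attached to your API key.",
    }
    return hints.get(status_code, f"Request failed with HTTP {status_code}.")

# Only contact the API when a key is available.
if __name__ == "__main__" and os.environ.get("SCW_SECRET_KEY"):
    import openai
    from openai import OpenAI
    client = OpenAI(base_url="https://api.scaleway.ai/v1", api_key=os.environ["SCW_SECRET_KEY"])
    try:
        client.chat.completions.create(
            model="llama-3.1-8b-instruct",
            messages=[{"role": "user", "content": "Hello"}],
        )
    except openai.APIStatusError as err:
        # err.status_code is the HTTP status returned by the API.
        print(explain_status(err.status_code))
```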

Streaming

By default, the outputs are returned to the client only after the generation process is complete. However, a common alternative is to stream the results back to the client as they are generated. This is particularly useful in chat applications, where it allows the client to view the results incrementally as each token is produced.

The following example uses the Chat Completions API; the stream parameter can be set in the same way with the Responses API.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
    api_key="<SCW_SECRET_KEY>"  # Your unique API secret key from Scaleway
)

# stream=True returns an iterator of chunks instead of a single response
response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{
        "role": "user",
        "content": "Sing me a song",
    }],
    stream=True,
)

# Print each piece of text as it arrives; some chunks carry no content
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
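With the Responses API, stream=True yields a sequence of typed events rather than chunks with choices; generated text arrives in events of type response.output_text.delta. A minimal sketch, assuming Scaleway's beta emits these standard events, that your openai package exposes client.responses, and that your API secret key is stored in an environment variable named SCW_SECRET_KEY:

```python
import os

def collect_text(events):
    """Print text deltas from a Responses API event stream as they arrive; return the full text."""
    parts = []
    for event in events:
        # Other event types (response.created, response.completed, ...) mark lifecycle steps.
        if getattr(event, "type", None) == "response.output_text.delta":
            print(event.delta, end="", flush=True)
            parts.append(event.delta)
    print()
    return "".join(parts)

# Only contact the API when a key is available.
if __name__ == "__main__" and os.environ.get("SCW_SECRET_KEY"):
    from openai import OpenAI
    client = OpenAI(base_url="https://api.scaleway.ai/v1", api_key=os.environ["SCW_SECRET_KEY"])
    stream = client.responses.create(model="gpt-oss-120b", input="Sing me a song", stream=True)
    collect_text(stream)
```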

Async

The service also supports asynchronous mode for any chat completion.
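With the OpenAI Python SDK, asynchronous requests use the AsyncOpenAI client, whose methods are awaited. The sketch below sends several chat completions concurrently with asyncio.gather; it assumes your API secret key is stored in an environment variable named SCW_SECRET_KEY and reuses the llama-3.1-8b-instruct model from this page.

```python
import asyncio
import os

async def ask(client, prompt):
    """Send one chat completion request without blocking the event loop."""
    completion = await client.chat.completions.create(
        model="llama-3.1-8b-instruct",
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content

async def ask_many(client, prompts):
    """Run several requests concurrently; answers come back in prompt order."""
    return await asyncio.gather(*(ask(client, p) for p in prompts))

# Only contact the API when a key is available.
if __name__ == "__main__" and os.environ.get("SCW_SECRET_KEY"):
    from openai import AsyncOpenAI
    client = AsyncOpenAI(base_url="https://api.scaleway.ai/v1", api_key=os.environ["SCW_SECRET_KEY"])
    for answer in asyncio.run(ask_many(client, ["Sing me a song", "Tell me a joke"])):
        print(answer)
```

Because the requests run concurrently, the total wait is roughly that of the slowest request rather than the sum of all of them.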