Generative APIs
What are Scaleway Generative APIs?Link to this anchor
Scaleway’s Generative APIs provide access to pre-configured, serverless endpoints of leading AI models, hosted in European data centers. This allows you to integrate advanced AI capabilities into your applications without managing underlying infrastructure.
Which models are supported by Generative APIs?Link to this anchor
Our Generative APIs support a range of popular models, including:
- Chat / Text Generation models: Refer to our dedicated documentation for a list of supported chat models.
- Vision models: Refer to our dedicated documentation for a list of supported vision models.
- Embedding models: Refer to our dedicated documentation for a list of supported embedding models.
How does the free tier work?Link to this anchor
The free tier allows you to process up to 1,000,000 tokens without incurring any costs. After reaching this limit, you will be charged per million tokens processed. Free tier usage is calculated by adding all input and output tokens consumed from all models used. For more information, refer to our pricing page.
How can I monitor my token consumption?Link to this anchor
You can see your token consumption in Scaleway Cockpit. You can access it from the Scaleway console under the Metrics tab. Note that:
- Cockpits are isolated by Projects, hence you first need to select the right project in the Scaleway console before accessing Cockpit to see your token consumption for this Project (you can see the
project_id
in the Cockpit URL:https://{project_id}.dashboard.obs.fr-par.scw.cloud/
. - Cockpit graphs can take up to 1 hour to update token consumption, see Troubleshooting for further details.
How can I access and use the Generative APIs?Link to this anchor
Access is open to all Scaleway customers. You can start by using the Generative APIs Playground in the Scaleway console to experiment with different models. For integration into applications, you can use the OpenAI-compatible APIs provided by Scaleway. Detailed instructions are available in our Quickstart guide.
Where are the inference servers located?Link to this anchor
All models are currently hosted in a secure data center located in Paris, France, operated by OPCORE. This ensures low latency for European users and compliance with European data privacy regulations.
Where can I find the privacy policy regarding Generative APIs?Link to this anchor
You can find the privacy policy applicable to all use of Generative APIs here.
Can I use OpenAI libraries and APIs with Scaleway’s Generative APIs?Link to this anchor
Yes, Scaleway’s Generative APIs are designed to be compatible with OpenAI libraries and SDKs, including the OpenAI Python client library and LangChain SDKs. This allows for seamless integration with existing workflows.
What is the difference between Generative APIs and Managed Inference?Link to this anchor
- Generative APIs: A serverless service providing access to pre-configured AI models via API, billed per token usage.
- Managed Inference: Allows deployment of curated or custom models with chosen quantization and instances, offering predictable throughput and enhanced security features like private network isolation and access control. Managed Inference is billed by hourly usage, whether provisioned capacity is receiving traffic or not.
How do I get started with Generative APIs?Link to this anchor
To get started, explore the Generative APIs Playground in the Scaleway console. For application integration, refer to our Quickstart guide, which provides step-by-step instructions on accessing, configuring, and using a Generative APIs endpoint.
Are there any rate limits for API usage?Link to this anchor
Yes, API rate limits define the maximum number of requests a user can make within a specific time frame to ensure fair access and resource allocation between users. If you require increased rate limits (by a factor from 2 to 5 times), you can request them by creating a ticket. If you require even higher rate limits, especially to absorb infrequent peak loads, we recommend using Managed Inference instead with dedicated provisioned capacity. Refer to our dedicated documentation for more information on rate limits.
What is the model lifecycle for Generative APIs?Link to this anchor
Scaleway is dedicated to updating and offering the latest versions of generative AI models, ensuring improvements in capabilities, accuracy, and safety. As new versions of models are introduced, you can explore them through the Scaleway console. Learn more in our dedicated documentation.
What are the SLAs applicable to Generative APIs?Link to this anchor
We are currently working on defining our SLAs for Generative APIs. We will provide more information on this topic soon.
What are the performance guarantees (vs Managed Inference)?Link to this anchor
We are currently working on defining our performance guarantees for Generative APIs. We will provide more information on this topic soon.
Do model licenses apply when using Generative APIs?Link to this anchor
Yes, you need to comply with model licenses when using Generative APIs. Applicable licenses are available for each model in our documentation and in Console Playground.