Rate limits

Reviewed on 09 December 2024
Published on 27 August 2024

What are the limits?

Every model served through Scaleway Generative APIs is rate-limited by:

  • Tokens per minute
  • Queries per minute
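
The two limits above can be mirrored on the client side, so that an application holds back requests that would exceed either budget. The following is a minimal sketch, assuming a sliding one-minute window. The class name `MinuteWindowLimiter` and its parameters are hypothetical and not part of any Scaleway SDK, and this page does not specify how the service itself measures its windows.

```python
import time
from collections import deque


class MinuteWindowLimiter:
    """Hypothetical client-side limiter for requests and tokens per window."""

    def __init__(self, max_requests, max_tokens, window=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.window = window          # window length in seconds (assumed: 60)
        self.clock = clock            # injectable clock, useful for testing
        self.events = deque()         # (timestamp, tokens) per admitted request
        self.tokens_in_window = 0

    def _evict(self, now):
        # Forget admitted requests that have left the sliding window.
        while self.events and now - self.events[0][0] >= self.window:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def try_acquire(self, tokens):
        """Return True and record the request if it fits both budgets."""
        now = self.clock()
        self._evict(now)
        if len(self.events) + 1 > self.max_requests:
            return False
        if self.tokens_in_window + tokens > self.max_tokens:
            return False
        self.events.append((now, tokens))
        self.tokens_in_window += tokens
        return True
```

A caller would create one limiter per model, for example `MinuteWindowLimiter(max_requests=300, max_tokens=200_000)`, and call `try_acquire(estimated_tokens)` before each request, waiting and retrying when it returns `False`.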

Base limits apply if you have registered a valid payment method. These limits are increased automatically if you also validate your identity.

Tip

If you created a Scaleway account but did not register a valid payment method, stricter limits apply so that usage stays within the Free Tier.

How can I increase the rate limits?

We actively monitor usage and will raise limits based on feedback. If you need higher rate limits:

  • Verify your identity to increase your rate limits automatically, as shown in the tables below.
  • Contact our support team with details of the model used and your specific use case to request a further increase. Note that for increases of 5x to 10x in volume, we highly recommend using dedicated deployments with Managed Inference, which provides exactly the same features and API compatibility.

Chat models

| Model string | Additional steps required | Requests per minute | Total tokens per minute |
|---|---|---|---|
| llama-3.1-8b-instruct | None | 300 | 200K |
| llama-3.1-70b-instruct | None | 300 | 200K |
| llama-3.3-70b-instruct | None | 300 | 200K |
| llama-3.3-70b-instruct | Identity verified | 600 | 400K |
| mistral-nemo-instruct-2407 | None | 300 | 200K |
| pixtral-12b-2409 | None | 300 | 200K |
| qwen2.5-32b-instruct | None | 300 | 200K |
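
As a planning aid, the chat model limits can be encoded as data and checked against an expected workload. This is a rough sketch: the helper names `limits_for` and `fits` are hypothetical, and it assumes that models without an "Identity verified" row keep the base limits after verification, which is how the table reads but is not stated explicitly.

```python
# Values copied from the chat models table: (requests/min, total tokens/min).
BASE_LIMITS = (300, 200_000)
CHAT_MODELS = {
    "llama-3.1-8b-instruct",
    "llama-3.1-70b-instruct",
    "llama-3.3-70b-instruct",
    "mistral-nemo-instruct-2407",
    "pixtral-12b-2409",
    "qwen2.5-32b-instruct",
}
# Only this model has an "Identity verified" row in the table.
VERIFIED_LIMITS = {"llama-3.3-70b-instruct": (600, 400_000)}


def limits_for(model, identity_verified=False):
    """Return (requests_per_minute, tokens_per_minute) for a chat model."""
    if model not in CHAT_MODELS:
        raise KeyError(f"unknown chat model: {model}")
    if identity_verified and model in VERIFIED_LIMITS:
        return VERIFIED_LIMITS[model]
    return BASE_LIMITS


def fits(model, requests_per_minute, avg_tokens_per_request, identity_verified=False):
    """Check whether a steady workload stays within both limits."""
    max_rpm, max_tpm = limits_for(model, identity_verified)
    tokens_per_minute = requests_per_minute * avg_tokens_per_request
    return requests_per_minute <= max_rpm and tokens_per_minute <= max_tpm
```

At the base limits, a client that sends the full 300 requests per minute can average at most about 666 tokens per request (200,000 / 300) before the token limit becomes the binding constraint.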

Embedding models

| Model string | Requests per minute | Input tokens per minute |
|---|---|---|
| bge-multilingual-gemma2 | 300 | 400K |
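
For batch embedding jobs, the two embedding limits give a rough estimate of how many one-minute windows a job needs. The sketch below is only an approximation: the function name `min_windows` is hypothetical, and it assumes requests can be packed perfectly into each window, which real workloads rarely achieve.

```python
import math

# Values copied from the embedding models table for bge-multilingual-gemma2.
EMBED_REQUESTS_PER_MINUTE = 300
EMBED_INPUT_TOKENS_PER_MINUTE = 400_000


def min_windows(total_requests, total_input_tokens):
    """Estimate the number of one-minute windows needed for an embedding job."""
    by_requests = math.ceil(total_requests / EMBED_REQUESTS_PER_MINUTE)
    by_tokens = math.ceil(total_input_tokens / EMBED_INPUT_TOKENS_PER_MINUTE)
    # Whichever limit is tighter determines the estimate.
    return max(by_requests, by_tokens)
```

For example, 300 requests carrying 2,000,000 input tokens in total are bounded by the token limit and need about five windows, even though the request limit alone would allow them in one.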

Why do we set rate limits?

These limits safeguard against abuse or misuse of Scaleway Generative APIs and help ensure fair access to the API with consistent performance for all users.
