
Rate limits

Reviewed on 30 October 2024
Published on 27 August 2024

What are the limits?

Every model served through Scaleway Generative APIs is limited by:

  • Tokens per minute
  • Requests per minute
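Because both limits apply at once, a client that wants to stay under them has to track two budgets over the same window. The following is a minimal client-side sketch, not Scaleway's own accounting: the class name `SlidingWindowLimiter`, the rolling 60-second window, and the way tokens are counted per request are all assumptions made for illustration, since this page does not describe how the service measures usage.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical client-side tracker for a request budget and a token budget.

    Assumes a rolling window; the server-side accounting may differ.
    """

    def __init__(self, max_requests, max_tokens, window_seconds=60.0,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.window = window_seconds
        self.clock = clock
        self.events = deque()      # (timestamp, tokens) per admitted request
        self.tokens_in_window = 0  # sum of tokens over self.events

    def _evict_expired(self, now):
        # Drop requests that are older than the window and refund their tokens.
        while self.events and now - self.events[0][0] >= self.window:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def try_acquire(self, tokens):
        """Admit the request and return True if it fits both budgets, else False."""
        now = self.clock()
        self._evict_expired(now)
        if len(self.events) + 1 > self.max_requests:
            return False
        if self.tokens_in_window + tokens > self.max_tokens:
            return False
        self.events.append((now, tokens))
        self.tokens_in_window += tokens
        return True
```

A caller would check `limiter.try_acquire(estimated_tokens)` before sending a request and wait or queue the request when it returns `False`.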

Chat models

Model string                  Requests per minute   Tokens per minute
llama-3.1-8b-instruct         300                   100K
llama-3.1-70b-instruct        300                   100K
mistral-nemo-instruct-2407    300                   100K
pixtral-12b-2409              300                   100K

Embedding models

Model string                  Requests per minute   Tokens per minute
sentence-t5-xxl               600                   1M
bge-multilingual-gemma2       600                   1M
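Which of the two limits you reach first depends on how many tokens a typical request uses. As a rough illustration, the sketch below copies the values from the tables above into a dictionary and computes the sustainable request rate for a given average request size. The names `RATE_LIMITS` and `max_requests_per_minute` are hypothetical, and the calculation assumes that every token of a request counts toward the tokens-per-minute limit, which this page does not specify.

```python
# Limits copied from the tables above: (requests per minute, tokens per minute).
RATE_LIMITS = {
    "llama-3.1-8b-instruct": (300, 100_000),
    "llama-3.1-70b-instruct": (300, 100_000),
    "mistral-nemo-instruct-2407": (300, 100_000),
    "pixtral-12b-2409": (300, 100_000),
    "sentence-t5-xxl": (600, 1_000_000),
    "bge-multilingual-gemma2": (600, 1_000_000),
}


def max_requests_per_minute(model, avg_tokens_per_request):
    """Return the request rate allowed by whichever limit is reached first."""
    if avg_tokens_per_request <= 0:
        raise ValueError("avg_tokens_per_request must be positive")
    requests_limit, tokens_limit = RATE_LIMITS[model]
    # The token budget caps the rate at tokens_limit // avg_tokens_per_request.
    return min(requests_limit, tokens_limit // avg_tokens_per_request)


# Requests averaging 1,000 tokens are capped by the token limit.
print(max_requests_per_minute("llama-3.1-8b-instruct", 1000))  # 100
# Requests averaging 200 tokens are capped by the request limit.
print(max_requests_per_minute("llama-3.1-8b-instruct", 200))   # 300
```

Under these assumptions, the chat models reach the token limit first once requests average more than about 333 tokens (100,000 / 300).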

Why do we set rate limits?

These limits safeguard against abuse or misuse of Scaleway Generative APIs, helping to ensure fair access to the API with consistent performance.

How can I increase the rate limits?

We actively monitor usage and will adjust rate limits based on feedback. If you need higher rate limits, contact our support team and provide details on the model you use and your specific use case.
