
Fixing common issues with Generative APIs

Reviewed on 16 January 2025 · Published on 16 January 2025

Below are common issues that you may encounter when using Generative APIs, their causes, and recommended solutions.

429: Too Many Requests - You exceeded your current quota of requests/tokens per minute

Cause

  • You performed too many API requests within a given minute
  • Your API requests consumed too many tokens (input and output) within a given minute

Solution

  • Ask our support team to raise your quota
  • Smooth out your API request rate by limiting the number of requests you perform in parallel, and by retrying with backoff when you receive a 429 response (see the sketch below)
  • Reduce the number of input or output tokens processed by your API requests
  • Use Managed Inference, where these quotas do not apply (your throughput is only limited by the size of the Inference Deployment you provision)
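
As an illustration, here is a minimal Python sketch of retrying with exponential backoff on 429 responses. It assumes the OpenAI-compatible Python client; the base URL, API key placeholder, and model name are examples to replace with your own values.

import time

import openai
from openai import OpenAI

# Example values: replace the base URL, key, and model with your own.
client = OpenAI(
    base_url="https://api.scaleway.ai/v1",
    api_key="<SCW_SECRET_KEY>",
)

def chat_with_backoff(messages, max_retries=5):
    # Retry on 429 with exponential backoff (1s, 2s, 4s, ...) to smooth
    # out the request rate instead of hammering the API in parallel.
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="llama-3.1-8b-instruct",
                messages=messages,
            )
        except openai.RateLimitError:
            time.sleep(2 ** attempt)
    raise RuntimeError("Still rate limited after retries")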

504: Gateway Timeout

Cause

  • The query takes too long to process (even when the context length stays within the model's supported context window and maximum token limits)
  • The model goes into an infinite loop while processing the input (which is a known structural issue with several AI models)

Solution

  • Set a stricter maximum token limit to prevent overly long responses (see the example below).
  • Reduce the size of the input tokens, or split the input into multiple API requests.
  • Use Managed Inference, where no query timeout is enforced.
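
For example, a request with a hard cap on output length could look like the following sketch (the client setup and model name are illustrative, assuming the OpenAI-compatible Python client):

from openai import OpenAI

client = OpenAI(base_url="https://api.scaleway.ai/v1", api_key="<SCW_SECRET_KEY>")

# Capping max_tokens bounds the response length, so a model stuck in a
# generation loop cannot run until the gateway timeout.
response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Summarize the following document: ..."}],
    max_tokens=512,
)
print(response.choices[0].message.content)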

Structured output (e.g., JSON) is not working correctly

Cause

  • Incorrect field naming in the request, such as using "format" instead of the correct "response_format" field.
  • Lack of a JSON schema, which can lead to ambiguity in the output structure.

Solution

  • Ensure the proper field "response_format" is used in the query.
  • Provide a JSON schema in the request to guide the model's structured output (see the sketch below).
  • Refer to the documentation on structured outputs for examples and additional guidance.
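
As a sketch, a structured-output request can look like the following with the OpenAI-compatible Python client. The "json_schema" envelope shown follows the OpenAI convention; check the structured outputs documentation for the exact format supported, and treat the model name and schema as placeholders.

from openai import OpenAI

client = OpenAI(base_url="https://api.scaleway.ai/v1", api_key="<SCW_SECRET_KEY>")

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Extract the name and age from: Alice is 30."}],
    # "response_format" (not "format") selects structured output;
    # the schema removes ambiguity about the expected fields.
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                },
                "required": ["name", "age"],
            },
        },
    },
)
print(response.choices[0].message.content)  # JSON string matching the schema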

Multiple successive "role": "user" messages

Cause

  • Successive messages with "role": "user" are sent in the API request instead of alternating between "role": "user" and "role": "assistant".

Solution

  • Ensure the "messages" array alternates between "role": "user" and "role": "assistant".
  • If multiple "role": "user" messages need to be sent, concatenate them into a single "role": "user" message or intersperse them with appropriate "role": "assistant" responses (a helper sketch follows).
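
A minimal helper along these lines, in plain Python (the function name is ours):

def merge_consecutive_user_messages(messages):
    # Concatenate successive "user" messages so that roles alternate
    # user/assistant/user/... as the API requires.
    merged = []
    for msg in messages:
        if merged and msg["role"] == "user" and merged[-1]["role"] == "user":
            merged[-1]["content"] += "\n" + msg["content"]
        else:
            merged.append(dict(msg))
    return merged

messages = [
    {"role": "user", "content": "Here is some context."},
    {"role": "user", "content": "Now answer my question."},
]
print(merge_consecutive_user_messages(messages))
# [{'role': 'user', 'content': 'Here is some context.\nNow answer my question.'}]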

Example error message (for Mistral models)

{
  "object": "error",
  "message": "After the optional system message, conversation roles must alternate user/assistant/user/assistant/...",
  "type": "BadRequestError",
  "param": null,
  "code": 400
}

Best practices for optimizing model performance

Input size management

  • Avoid overly long input sequences; break them into smaller chunks if needed (see the chunking sketch after this list).
  • Use summarization techniques for large inputs to reduce token count while maintaining relevance.
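
A simple character-based chunker, as a rough sketch (character counts only approximate token counts, and the limits shown are arbitrary):

def chunk_text(text, max_chars=4000, overlap=200):
    # Split a long input into overlapping chunks that each fit
    # comfortably in the context window. The overlap preserves
    # context across chunk boundaries.
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap
    return chunks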

Use proper parameter configuration

  • Double-check parameters like "temperature", "max_tokens", and "top_p" to ensure they align with your use case (see the example below).
  • For structured output, always include a "response_format" and, if possible, a detailed JSON schema.
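
For instance, a request tuned for a moderately creative task could look like this (the values, client setup, and model name are illustrative, assuming the OpenAI-compatible Python client):

from openai import OpenAI

client = OpenAI(base_url="https://api.scaleway.ai/v1", api_key="<SCW_SECRET_KEY>")

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Write a short product description."}],
    temperature=0.7,  # lower for deterministic tasks, higher for creative ones
    top_p=0.9,        # nucleus sampling: keep the top 90% probability mass
    max_tokens=300,   # hard cap on output length
)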

Debugging silent errors

  • For cases where no explicit error is returned:
    • Verify all fields in the API request are correctly named and formatted.
    • Test the request with smaller and simpler inputs to isolate potential issues (a minimal smoke test follows).
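
A minimal smoke test against the raw HTTP endpoint lets you inspect the status code and body directly; the URL, key placeholder, and model name below are examples:

import requests

resp = requests.post(
    "https://api.scaleway.ai/v1/chat/completions",
    headers={"Authorization": "Bearer <SCW_SECRET_KEY>"},
    json={
        "model": "llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 5,
    },
)
print(resp.status_code)  # anything other than 200 points at the request itself
print(resp.json())       # inspect the raw body for error details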