
Supported models in Managed Inference

Reviewed on 08 April 2025 | Published on 08 April 2025

Scaleway Managed Inference allows you to deploy various AI models, either from:

  • Scaleway catalog: a curated set of ready-to-deploy models available through the Scaleway console or the Managed Inference models API (see the example after this list).
  • Custom models: models that you import, typically from sources like Hugging Face.
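
For example, you can list the catalog models programmatically. The sketch below is a minimal illustration, not a definitive call: the endpoint path, API version, and region are assumptions, so check the Managed Inference API reference for the exact route.

```python
# Hypothetical sketch: list catalog models via the Managed Inference API.
# The route and API version below are assumptions; authentication uses a
# Scaleway IAM secret key in the X-Auth-Token header.
import os
import requests

SCW_SECRET_KEY = os.environ["SCW_SECRET_KEY"]
REGION = "fr-par"  # example region

resp = requests.get(
    f"https://api.scaleway.com/inference/v1beta1/regions/{REGION}/models",
    headers={"X-Auth-Token": SCW_SECRET_KEY},
)
resp.raise_for_status()
for model in resp.json().get("models", []):
    print(model["name"])
```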

Scaleway catalog

Multimodal models (chat + vision)

More details to be added.

Chat models

| Provider | Model identifier | Documentation | License |
|----------|------------------|---------------|---------|
| Allen AI | molmo-72b-0924 | View Details | Apache 2.0 license |
| Deepseek | deepseek-r1-distill-llama-70b | View Details | MIT license |
| Deepseek | deepseek-r1-distill-llama-8b | View Details | MIT license |
| Meta | llama-3-70b-instruct | View Details | Llama 3 license |
| Meta | llama-3-8b-instruct | View Details | Llama 3 license |
| Meta | llama-3.1-70b-instruct | View Details | Llama 3.1 community license |
| Meta | llama-3.1-8b-instruct | View Details | Llama 3.1 license |
| Meta | llama-3.3-70b-instruct | View Details | Llama 3.3 license |
| Nvidia | llama-3.1-nemotron-70b-instruct | View Details | Llama 3.1 community license |
| Mistral | mixtral-8x7b-instruct-v0.1 | View Details | Apache 2.0 license |
| Mistral | mistral-7b-instruct-v0.3 | View Details | Apache 2.0 license |
| Mistral | mistral-nemo-instruct-2407 | View Details | Apache 2.0 license |
| Mistral | mistral-small-24b-instruct-2501 | View Details | Apache 2.0 license |
| Mistral | pixtral-12b-2409 | View Details | Apache 2.0 license |
| Qwen | qwen2.5-coder-32b-instruct | View Details | Apache 2.0 license |

Vision models

More details to be added.

Embedding models

| Provider | Model identifier | Documentation | License |
|----------|------------------|---------------|---------|
| BAAI | bge-multilingual-gemma2 | View Details | Gemma Terms of Use |
| Sentence Transformers | sentence-t5-xxl | View Details | Apache 2.0 license |

Custom models

Note

Custom model support is currently in beta. If you encounter issues or limitations, please report them via our Slack community channel or customer support.

Prerequisites

Tip

We recommend starting with a variation of a supported model from the Scaleway catalog. For example, you can deploy a quantized (4-bit) version of Llama 3.3. If deploying a fine-tuned version of Llama 3.3, make sure your file structure matches the example linked above.

To deploy a custom model via Hugging Face, ensure the following:

Access requirements

  • You must have access to the model using your Hugging Face credentials.
  • For gated models, request access through your Hugging Face account.
  • Credentials are not stored, but we recommend using read or fine-grained access tokens.

Required files

Your model repository must include:

  • A config.json file containing (see the example after this list):
    • An architectures array (see supported architectures for the exact list of supported values).
    • max_position_embeddings
  • Model weights in the .safetensors format
  • A chat template included in either:
    • tokenizer_config.json as a chat_template field, or
    • chat_template.json as a chat_template field
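
For illustration, a minimal config.json for a Llama-style model might look like the sketch below. The values shown are examples only; your model's actual architecture and context size will differ.

```json
{
  "architectures": ["LlamaForCausalLM"],
  "max_position_embeddings": 8192,
  "model_type": "llama"
}
```

The chat template itself lives in tokenizer_config.json (or chat_template.json) under a chat_template key, alongside this file and the .safetensors weights.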

Supported model types

Your model must be one of the following types:

  • chat
  • vision
  • multimodal (chat + vision)
  • embedding
Important

Security Notice
Models using formats that allow arbitrary code execution, such as Python pickle, are not supported.

API support

Depending on the model type, specific endpoints and features will be supported.

Chat models

The Chat API is exposed for this model under the /v1/chat/completions endpoint. Structured outputs and function calling are not yet supported for custom models.
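
Since this route follows the OpenAI Chat Completions format, an OpenAI-compatible client can call it. A minimal sketch, where the deployment URL, API key, and model name are placeholders to replace with your own values:

```python
# Minimal chat completion sketch; base_url and api_key are placeholders
# for your deployment's endpoint URL and a Scaleway IAM API key.
from openai import OpenAI

client = OpenAI(
    base_url="https://<deployment-id>.ifr.fr-par.scaleway.com/v1",  # placeholder
    api_key="<SCW_IAM_API_KEY>",  # placeholder
)

response = client.chat.completions.create(
    model="<model-name>",  # the model served by your deployment
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```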

Vision models

The Chat API is exposed for this model under the /v1/chat/completions endpoint. Structured outputs and function calling are not yet supported for custom models.
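
Vision requests go through the same Chat API, with the image passed as part of the message content. A sketch, where the endpoint, key, model name, and image URL are placeholders:

```python
# Vision chat sketch: sends text plus an image URL in one user message.
# base_url, api_key, model, and the image URL are all placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://<deployment-id>.ifr.fr-par.scaleway.com/v1",  # placeholder
    api_key="<SCW_IAM_API_KEY>",  # placeholder
)

response = client.chat.completions.create(
    model="<model-name>",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```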

Multimodal models

These models are treated as both chat and vision models, and support the endpoints described above.

Embedding models

The Embeddings API is exposed for this model under the /v1/embeddings endpoint.
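
A minimal sketch for the embeddings route, again with placeholder endpoint, key, and model name:

```python
# Embeddings sketch: returns one vector per input string.
from openai import OpenAI

client = OpenAI(
    base_url="https://<deployment-id>.ifr.fr-par.scaleway.com/v1",  # placeholder
    api_key="<SCW_IAM_API_KEY>",  # placeholder
)

result = client.embeddings.create(
    model="<model-name>",  # placeholder
    input="Managed Inference supports embedding models.",
)
print(len(result.data[0].embedding))  # dimensionality of the returned vector
```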

Custom model lifecycle

Custom model deployments are currently considered valid for the long term, and we will ensure that updates or changes to Managed Inference do not impact existing deployments. In the case of breaking changes that end support for some custom models, we will notify you at least three months in advance.

Licensing

When deploying custom models, you remain responsible for complying with any license requirements from the model provider, just as you would when running the model on a GPU you provision yourself.

Supported model architectures

Custom models must conform to one of the architectures listed below (a sketch for checking a model's declared architecture follows the list).

Custom model deployment currently supports the following model architectures:

  • AquilaModel
  • AquilaForCausalLM
  • ArcticForCausalLM
  • BaiChuanForCausalLM
  • BaichuanForCausalLM
  • BloomForCausalLM
  • CohereForCausalLM
  • Cohere2ForCausalLM
  • DbrxForCausalLM
  • DeciLMForCausalLM
  • DeepseekForCausalLM
  • DeepseekV2ForCausalLM
  • DeepseekV3ForCausalLM
  • ExaoneForCausalLM
  • FalconForCausalLM
  • Fairseq2LlamaForCausalLM
  • GemmaForCausalLM
  • Gemma2ForCausalLM
  • GlmForCausalLM
  • GPT2LMHeadModel
  • GPTBigCodeForCausalLM
  • GPTJForCausalLM
  • GPTNeoXForCausalLM
  • GraniteForCausalLM
  • GraniteMoeForCausalLM
  • GritLM
  • InternLMForCausalLM
  • InternLM2ForCausalLM
  • InternLM2VEForCausalLM
  • InternLM3ForCausalLM
  • JAISLMHeadModel
  • JambaForCausalLM
  • LlamaForCausalLM
  • LLaMAForCausalLM
  • MambaForCausalLM
  • FalconMambaForCausalLM
  • MiniCPMForCausalLM
  • MiniCPM3ForCausalLM
  • MistralForCausalLM
  • MixtralForCausalLM
  • QuantMixtralForCausalLM
  • MptForCausalLM
  • MPTForCausalLM
  • NemotronForCausalLM
  • OlmoForCausalLM
  • Olmo2ForCausalLM
  • OlmoeForCausalLM
  • OPTForCausalLM
  • OrionForCausalLM
  • PersimmonForCausalLM
  • PhiForCausalLM
  • Phi3ForCausalLM
  • Phi3SmallForCausalLM
  • PhiMoEForCausalLM
  • Qwen2ForCausalLM
  • Qwen2MoeForCausalLM
  • RWForCausalLM
  • StableLMEpochForCausalLM
  • StableLmForCausalLM
  • Starcoder2ForCausalLM
  • SolarForCausalLM
  • TeleChat2ForCausalLM
  • XverseForCausalLM
  • BartModel
  • BartForConditionalGeneration
  • Florence2ForConditionalGeneration
  • BertModel
  • RobertaModel
  • RobertaForMaskedLM
  • XLMRobertaModel
  • DeciLMForCausalLM
  • Gemma2Model
  • GlmForCausalLM
  • GritLM
  • InternLM2ForRewardModel
  • JambaForSequenceClassification
  • LlamaModel
  • MistralModel
  • Phi3ForCausalLM
  • Qwen2Model
  • Qwen2ForCausalLM
  • Qwen2ForRewardModel
  • Qwen2ForProcessRewardModel
  • TeleChat2ForCausalLM
  • LlavaNextForConditionalGeneration
  • Phi3VForCausalLM
  • Qwen2VLForConditionalGeneration
  • Qwen2ForSequenceClassification
  • BertForSequenceClassification
  • RobertaForSequenceClassification
  • XLMRobertaForSequenceClassification
  • AriaForConditionalGeneration
  • Blip2ForConditionalGeneration
  • ChameleonForConditionalGeneration
  • ChatGLMModel
  • ChatGLMForConditionalGeneration
  • DeepseekVLV2ForCausalLM
  • FuyuForCausalLM
  • H2OVLChatModel
  • InternVLChatModel
  • Idefics3ForConditionalGeneration
  • LlavaForConditionalGeneration
  • LlavaNextForConditionalGeneration
  • LlavaNextVideoForConditionalGeneration
  • LlavaOnevisionForConditionalGeneration
  • MantisForConditionalGeneration
  • MiniCPMO
  • MiniCPMV
  • MolmoForCausalLM
  • NVLM_D
  • PaliGemmaForConditionalGeneration
  • Phi3VForCausalLM
  • PixtralForConditionalGeneration
  • QWenLMHeadModel
  • Qwen2VLForConditionalGeneration
  • Qwen2_5_VLForConditionalGeneration
  • Qwen2AudioForConditionalGeneration
  • UltravoxModel
  • MllamaForConditionalGeneration
  • WhisperForConditionalGeneration
  • EAGLEModel
  • MedusaModel
  • MLPSpeculatorPreTrainedModel
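
Before uploading a custom model, you can verify locally that its config.json declares one of these architectures. A sketch, with the SUPPORTED set abbreviated for brevity (populate it from the full list above):

```python
# Sketch: check a local model repository's declared architecture against
# the supported list before attempting a deployment.
import json

# Abbreviated for the example; populate from the full list above.
SUPPORTED = {"LlamaForCausalLM", "MistralForCausalLM", "Qwen2ForCausalLM"}

with open("config.json") as f:
    config = json.load(f)

declared = set(config.get("architectures", []))
matches = declared & SUPPORTED
if matches:
    print(f"Supported architecture(s): {sorted(matches)}")
else:
    print(f"No supported architecture among {sorted(declared)}")
```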