Custom model support is currently in beta. If you encounter issues or limitations, please report them via our Slack community channel or customer support.
Supported models in Managed Inference
Scaleway Managed Inference allows you to deploy various AI models, either from:
- Scaleway catalog: A curated set of ready-to-deploy models available through the Scaleway console or the Managed Inference models API
- Custom models: Models that you import, typically from sources like Hugging Face.
Scaleway catalog
Multimodal models (chat + vision)
More details to be added.
Chat models
| Provider | Model identifier | Documentation | License |
|---|---|---|---|
| Allen AI | `molmo-72b-0924` | View Details | Apache 2.0 license |
| Deepseek | `deepseek-r1-distill-llama-70b` | View Details | MIT license |
| Deepseek | `deepseek-r1-distill-llama-8b` | View Details | MIT license |
| Meta | `llama-3-70b-instruct` | View Details | Llama 3 license |
| Meta | `llama-3-8b-instruct` | View Details | Llama 3 license |
| Meta | `llama-3.1-70b-instruct` | View Details | Llama 3.1 community license |
| Meta | `llama-3.1-8b-instruct` | View Details | Llama 3.1 community license |
| Meta | `llama-3.3-70b-instruct` | View Details | Llama 3.3 license |
| Nvidia | `llama-3.1-nemotron-70b-instruct` | View Details | Llama 3.1 community license |
| Mistral | `mixtral-8x7b-instruct-v0.1` | View Details | Apache 2.0 license |
| Mistral | `mistral-7b-instruct-v0.3` | View Details | Apache 2.0 license |
| Mistral | `mistral-nemo-instruct-2407` | View Details | Apache 2.0 license |
| Mistral | `mistral-small-24b-instruct-2501` | View Details | Apache 2.0 license |
| Mistral | `pixtral-12b-2409` | View Details | Apache 2.0 license |
| Qwen | `qwen2.5-coder-32b-instruct` | View Details | Apache 2.0 license |
Vision models
More details to be added.
Embedding models
| Provider | Model identifier | Documentation | License |
|---|---|---|---|
| BAAI | `bge-multilingual-gemma2` | View Details | Gemma Terms of Use |
| Sentence Transformers | `sentence-t5-xxl` | View Details | Apache 2.0 license |
Custom models
Prerequisites
We recommend starting with a variation of a supported model from the Scaleway catalog. For example, you can deploy a quantized (4-bit) version of Llama 3.3. If deploying a fine-tuned version of Llama 3.3, make sure your file structure matches the example linked above.
To deploy a custom model via Hugging Face, ensure the following:
Access requirements
- You must have access to the model using your Hugging Face credentials.
- For gated models, request access through your Hugging Face account.
- Credentials are not stored, but we recommend using read or fine-grained access tokens.
Required files
Your model repository must include:
- A `config.json` file containing:
  - An `architectures` array (see supported architectures for the exact list of supported values)
  - A `max_position_embeddings` value
- Model weights in the `.safetensors` format
- A chat template included in either:
  - `tokenizer_config.json` as a `chat_template` field, or
  - `chat_template.json` as a `chat_template` field
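The file checks above can be run locally before importing. The following is an illustrative sketch, not an official Scaleway tool; the helper name `check_model_repo` is hypothetical, and the chat-template check applies to chat-type models only:

```python
import json
import tempfile
from pathlib import Path

def check_model_repo(repo: Path) -> list[str]:
    """Return a list of problems that would block a custom-model import.

    Mirrors the required-files list above: a config.json with an
    `architectures` array and a `max_position_embeddings` value,
    weights in .safetensors format, and a chat template in either
    tokenizer_config.json or chat_template.json.
    """
    problems = []
    config_path = repo / "config.json"
    if not config_path.is_file():
        problems.append("missing config.json")
    else:
        config = json.loads(config_path.read_text())
        if not isinstance(config.get("architectures"), list):
            problems.append("config.json: missing `architectures` array")
        if "max_position_embeddings" not in config:
            problems.append("config.json: missing `max_position_embeddings`")
    if not list(repo.glob("*.safetensors")):
        problems.append("no .safetensors weight files found")
    has_template = any(
        (repo / name).is_file()
        and "chat_template" in json.loads((repo / name).read_text())
        for name in ("tokenizer_config.json", "chat_template.json")
    )
    if not has_template:
        problems.append("no chat_template found")
    return problems

# Demonstrate on a synthetic repository layout.
with tempfile.TemporaryDirectory() as d:
    repo = Path(d)
    (repo / "config.json").write_text(json.dumps({
        "architectures": ["LlamaForCausalLM"],
        "max_position_embeddings": 8192,
    }))
    (repo / "model-00001-of-00001.safetensors").write_bytes(b"")
    (repo / "tokenizer_config.json").write_text(json.dumps({
        "chat_template": "{{ messages }}",
    }))
    print(check_model_repo(repo))  # → []
```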
Supported model types
Your model must be one of the following types:
- `chat`
- `vision`
- `multimodal` (chat + vision)
- `embedding`
Security notice
Models using formats that allow arbitrary code execution, such as Python `pickle`, are not supported.
API support
The supported endpoints and features depend on the model type.
Chat models
The Chat API will be exposed for this model under the `/v1/chat/completions` endpoint.
Structured outputs and function calling are not yet supported for custom models.
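As a sketch, a request body for this endpoint can be built as follows. The base URL, model name, and authentication are placeholders to fill in from your own deployment, not Scaleway specifics:

```python
import json

# Placeholder; substitute your deployment's actual endpoint URL.
BASE_URL = "https://<your-deployment-endpoint>"

def chat_request_body(messages, model="my-custom-model", max_tokens=256):
    """Build the JSON body for a POST to f"{BASE_URL}/v1/chat/completions".

    Follows the OpenAI-compatible chat-completions shape; the model
    name here is a hypothetical example.
    """
    return {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
    }

body = chat_request_body([{"role": "user", "content": "Hello!"}])
print(json.dumps(body, indent=2))
```

Send this body with your usual HTTP client, adding your authentication header as required by your deployment.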
Vision models
The Chat API will be exposed for this model under the `/v1/chat/completions` endpoint.
Structured outputs and function calling are not yet supported for custom models.
Multimodal models
These models are treated as both Chat and Vision models.
Embedding models
The Embeddings API will be exposed for this model under the `/v1/embeddings` endpoint.
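Similarly, an embeddings request body can be sketched as below; the model name is a hypothetical placeholder and the field shape follows the OpenAI-compatible Embeddings API:

```python
import json

def embeddings_request_body(inputs, model="my-custom-embedding-model"):
    """Build the JSON body for a POST to <endpoint>/v1/embeddings.

    `inputs` may be a single string or a list of strings to embed.
    """
    return {"model": model, "input": inputs}

body = embeddings_request_body(["Hello world"])
print(json.dumps(body, indent=2))
```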
Custom model lifecycle
Currently, custom model deployments are considered valid for the long term, and we will ensure that updates or changes to Managed Inference do not impact existing deployments. In case of breaking changes that end support for certain custom models, we will notify you at least three months in advance.
Licensing
When deploying custom models, you remain responsible for complying with any license requirements from the model provider, just as you would when running the model on self-provisioned GPU infrastructure.
Supported model architectures
Custom model deployment currently supports the following model architectures:
- `AquilaModel`
- `AquilaForCausalLM`
- `ArcticForCausalLM`
- `BaiChuanForCausalLM`
- `BaichuanForCausalLM`
- `BloomForCausalLM`
- `CohereForCausalLM`
- `Cohere2ForCausalLM`
- `DbrxForCausalLM`
- `DeciLMForCausalLM`
- `DeepseekForCausalLM`
- `DeepseekV2ForCausalLM`
- `DeepseekV3ForCausalLM`
- `ExaoneForCausalLM`
- `FalconForCausalLM`
- `Fairseq2LlamaForCausalLM`
- `GemmaForCausalLM`
- `Gemma2ForCausalLM`
- `GlmForCausalLM`
- `GPT2LMHeadModel`
- `GPTBigCodeForCausalLM`
- `GPTJForCausalLM`
- `GPTNeoXForCausalLM`
- `GraniteForCausalLM`
- `GraniteMoeForCausalLM`
- `GritLM`
- `InternLMForCausalLM`
- `InternLM2ForCausalLM`
- `InternLM2VEForCausalLM`
- `InternLM3ForCausalLM`
- `JAISLMHeadModel`
- `JambaForCausalLM`
- `LlamaForCausalLM`
- `LLaMAForCausalLM`
- `MambaForCausalLM`
- `FalconMambaForCausalLM`
- `MiniCPMForCausalLM`
- `MiniCPM3ForCausalLM`
- `MistralForCausalLM`
- `MixtralForCausalLM`
- `QuantMixtralForCausalLM`
- `MptForCausalLM`
- `MPTForCausalLM`
- `NemotronForCausalLM`
- `OlmoForCausalLM`
- `Olmo2ForCausalLM`
- `OlmoeForCausalLM`
- `OPTForCausalLM`
- `OrionForCausalLM`
- `PersimmonForCausalLM`
- `PhiForCausalLM`
- `Phi3ForCausalLM`
- `Phi3SmallForCausalLM`
- `PhiMoEForCausalLM`
- `Qwen2ForCausalLM`
- `Qwen2MoeForCausalLM`
- `RWForCausalLM`
- `StableLMEpochForCausalLM`
- `StableLmForCausalLM`
- `Starcoder2ForCausalLM`
- `SolarForCausalLM`
- `TeleChat2ForCausalLM`
- `XverseForCausalLM`
- `BartModel`
- `BartForConditionalGeneration`
- `Florence2ForConditionalGeneration`
- `BertModel`
- `RobertaModel`
- `RobertaForMaskedLM`
- `XLMRobertaModel`
- `DeciLMForCausalLM`
- `Gemma2Model`
- `GlmForCausalLM`
- `GritLM`
- `InternLM2ForRewardModel`
- `JambaForSequenceClassification`
- `LlamaModel`
- `MistralModel`
- `Phi3ForCausalLM`
- `Qwen2Model`
- `Qwen2ForCausalLM`
- `Qwen2ForRewardModel`
- `Qwen2ForProcessRewardModel`
- `TeleChat2ForCausalLM`
- `LlavaNextForConditionalGeneration`
- `Phi3VForCausalLM`
- `Qwen2VLForConditionalGeneration`
- `Qwen2ForSequenceClassification`
- `BertForSequenceClassification`
- `RobertaForSequenceClassification`
- `XLMRobertaForSequenceClassification`
- `AriaForConditionalGeneration`
- `Blip2ForConditionalGeneration`
- `ChameleonForConditionalGeneration`
- `ChatGLMModel`
- `ChatGLMForConditionalGeneration`
- `DeepseekVLV2ForCausalLM`
- `FuyuForCausalLM`
- `H2OVLChatModel`
- `InternVLChatModel`
- `Idefics3ForConditionalGeneration`
- `LlavaForConditionalGeneration`
- `LlavaNextForConditionalGeneration`
- `LlavaNextVideoForConditionalGeneration`
- `LlavaOnevisionForConditionalGeneration`
- `MantisForConditionalGeneration`
- `MiniCPMO`
- `MiniCPMV`
- `MolmoForCausalLM`
- `NVLM_D`
- `PaliGemmaForConditionalGeneration`
- `Phi3VForCausalLM`
- `PixtralForConditionalGeneration`
- `QWenLMHeadModel`
- `Qwen2VLForConditionalGeneration`
- `Qwen2_5_VLForConditionalGeneration`
- `Qwen2AudioForConditionalGeneration`
- `UltravoxModel`
- `MllamaForConditionalGeneration`
- `WhisperForConditionalGeneration`
- `EAGLEModel`
- `MedusaModel`
- `MLPSpeculatorPreTrainedModel`