
Managed Inference model catalog

Reviewed on 18 April 2025 · Published on 18 April 2024

A quick overview of the models available in Scaleway’s catalog and their core attributes. Each model entry below includes usage examples, curl commands, and detailed capabilities.

Models technical summary

| Model name | Provider | Maximum context length (tokens) | Modalities | Instances | License |
|---|---|---|---|---|---|
| gemma-3-27b-it | Google | 40k | Text, Vision | H100, H100-2 | Gemma |
| llama-3.3-70b-instruct | Meta | 128k | Text | H100, H100-2 | Llama 3.3 community |
| llama-3.1-70b-instruct | Meta | 128k | Text | H100, H100-2 | Llama 3.1 community |
| llama-3.1-8b-instruct | Meta | 128k | Text | L4, L40S, H100, H100-2 | Llama 3.1 community |
| llama-3-70b-instruct | Meta | 8k | Text | H100, H100-2 | Llama 3 community |
| llama-3.1-nemotron-70b-instruct | Nvidia | 128k | Text | H100, H100-2 | Llama 3.1 community |
| deepseek-r1-distill-llama-70b | DeepSeek | 128k | Text | H100, H100-2 | MIT and Llama 3.3 community |
| deepseek-r1-distill-llama-8b | DeepSeek | 128k | Text | L4, L40S, H100, H100-2 | MIT and Llama 3.1 community |
| mistral-7b-instruct-v0.3 | Mistral | 32k | Text | L4, L40S, H100, H100-2 | Apache 2.0 |
| mistral-small-3.1-24b-instruct-2503 | Mistral | 128k | Text, Vision | H100, H100-2 | Apache 2.0 |
| mistral-small-24b-instruct-2501 | Mistral | 32k | Text | L40S, H100, H100-2 | Apache 2.0 |
| mistral-nemo-instruct-2407 | Mistral | 128k | Text | L40S, H100, H100-2 | Apache 2.0 |
| mixtral-8x7b-instruct-v0.1 | Mistral | 32k | Text | H100-2 | Apache 2.0 |
| moshiko-0.1-8b | Kyutai | 4k | Audio to Audio | L4, H100 | CC-BY-4.0 |
| moshika-0.1-8b | Kyutai | 4k | Audio to Audio | L4, H100 | CC-BY-4.0 |
| pixtral-12b-2409 | Mistral | 128k | Text, Vision | L40S, H100, H100-2 | Apache 2.0 |
| molmo-72b-0924 | Allen AI | 50k | Text, Vision | H100-2 | Apache 2.0 and Tongyi Qianwen license |
| qwen2.5-coder-32b-instruct | Qwen | 32k | Code | H100, H100-2 | Apache 2.0 |
| bge-multilingual-gemma2 | BAAI | 4k | Embeddings | L4, L40S, H100, H100-2 | Gemma |
| sentence-t5-xxl | Sentence transformers | 512 | Embeddings | L4 | Apache 2.0 |

Models feature summary

| Model name | Structured output supported | Function calling | Supported languages |
|---|---|---|---|
| gemma-3-27b-it | Yes | Partial | English, Chinese, Japanese, Korean, and 31 additional languages |
| llama-3.3-70b-instruct | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
| llama-3.1-70b-instruct | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
| llama-3.1-8b-instruct | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
| llama-3-70b-instruct | Yes | Yes | English |
| llama-3.1-nemotron-70b-instruct | Yes | Yes | English |
| deepseek-r1-distill-llama-70b | Yes | Yes | English, Chinese |
| deepseek-r1-distill-llama-8b | Yes | Yes | English, Chinese |
| mistral-7b-instruct-v0.3 | Yes | Yes | English |
| mistral-small-3.1-24b-instruct-2503 | Yes | Yes | English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi |
| mistral-small-24b-instruct-2501 | Yes | Yes | English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, Polish, and additional languages |
| mistral-nemo-instruct-2407 | Yes | Yes | English, French, German, Spanish, Italian, Portuguese, Russian, Chinese, Japanese |
| mixtral-8x7b-instruct-v0.1 | Yes | Yes | English, French, German, Italian, Spanish |
| moshiko-0.1-8b | No | No | English |
| moshika-0.1-8b | No | No | English |
| pixtral-12b-2409 | Yes | Yes | English |
| molmo-72b-0924 | Yes | No | English |
| qwen2.5-coder-32b-instruct | Yes | Yes | English, French, Spanish, Portuguese, German, Italian, Russian, Chinese, Japanese, Korean, Vietnamese, Thai, Arabic, and 16 additional languages |
| bge-multilingual-gemma2 | No | No | English, French, Chinese, Japanese, Korean |
| sentence-t5-xxl | No | No | English |
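
Structured output constrains a model’s reply to valid JSON, optionally matching a schema you supply. A minimal sketch using curl, assuming the OpenAI-compatible `response_format` convention; the endpoint URL and `$SCW_SECRET_KEY` are placeholders for your deployment’s endpoint and your IAM API key:

```bash
# Ask for a reply that conforms to a JSON schema (hypothetical "location" schema).
curl https://<your-deployment-endpoint>/v1/chat/completions \
  -H "Authorization: Bearer $SCW_SECRET_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta/llama-3.1-8b-instruct:fp8",
    "messages": [{"role": "user", "content": "Extract the city and country: Scaleway is headquartered in Paris, France."}],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "location",
        "schema": {
          "type": "object",
          "properties": {
            "city": {"type": "string"},
            "country": {"type": "string"}
          },
          "required": ["city", "country"]
        }
      }
    }
  }'
```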

Model details

Note

Despite our efforts to ensure accuracy, generated text may contain inaccuracies or hallucinations. Always verify generated content independently.

Multimodal models (Text and Vision)

Note

Vision models can understand and analyze images, not generate them. They are accessed through the /v1/chat/completions endpoint.
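
For example, an image can be passed alongside text in the messages array. A minimal sketch using curl, assuming an OpenAI-compatible payload; the endpoint URL, image URL, and `$SCW_SECRET_KEY` are placeholders for your deployment’s endpoint, your image, and your IAM API key:

```bash
# Send one text part and one image part to a vision-capable model.
curl https://<your-deployment-endpoint>/v1/chat/completions \
  -H "Authorization: Bearer $SCW_SECRET_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemma-3-27b-it:bf16",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image in one sentence."},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
      ]
    }]
  }'
```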

Gemma-3-27b-it

Gemma-3-27b-it is a model developed by Google for text processing and image analysis in many languages. The model was not specifically trained to output function or tool call tokens; function calling is therefore supported, but its reliability remains limited.

Model name

google/gemma-3-27b-it:bf16

Mistral-small-3.1-24b-instruct-2503

Mistral-small-3.1-24b-instruct-2503 is a model developed by Mistral for text processing and image analysis in many languages. It is optimized for dense knowledge and fast token throughput relative to its size.

Model name

mistral/mistral-small-3.1-24b-instruct-2503:bf16

Pixtral-12b-2409

Pixtral is a vision-language model introducing a novel architecture: a 12B-parameter multimodal decoder paired with a 400M-parameter vision encoder. It can analyze images and offer insights from visual content alongside text. This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. Pixtral is open-weight and distributed under the Apache 2.0 license.

Model name

mistral/pixtral-12b-2409:bf16

Molmo-72b-0924

Molmo 72B is the most capable model in the Molmo family of multimodal models developed by the Allen Institute for AI. Vision-language models like Molmo can analyze an image and offer insights from visual content alongside text. This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension.

Model name

allenai/molmo-72b-0924:fp8

Text models

Llama-3.3-70b-instruct

Released December 6, 2024, Meta’s Llama 3.3 70B is a fine-tune of the Llama 3.1 70B model. It remains text-only (text in, text out), but Llama 3.3 was designed to approach the performance of Llama 3.1 405B on some applications.

Model names

meta/llama-3.3-70b-instruct:fp8
meta/llama-3.3-70b-instruct:bf16
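
Llama 3.3 supports function calling (see the feature summary above): you declare tools in an OpenAI-style `tools` array, and the model may respond with a call to one of them. A minimal sketch; the `get_weather` function, endpoint URL, and `$SCW_SECRET_KEY` are hypothetical placeholders:

```bash
# Declare a tool the model may choose to call instead of answering directly.
curl https://<your-deployment-endpoint>/v1/chat/completions \
  -H "Authorization: Bearer $SCW_SECRET_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta/llama-3.3-70b-instruct:fp8",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'
```

If the model decides to use the tool, the response contains a `tool_calls` entry with the function name and JSON arguments rather than a plain text answer.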

Llama-3.1-70b-instruct

Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family. Llama 3.1 was designed to match the best proprietary models and outperform many available open-source models on common industry benchmarks.

Model names

meta/llama-3.1-70b-instruct:fp8
meta/llama-3.1-70b-instruct:bf16

Llama-3.1-8b-instruct

Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family. Llama 3.1 was designed to match the best proprietary models and outperform many available open-source models on common industry benchmarks.

Model names

meta/llama-3.1-8b-instruct:fp8
meta/llama-3.1-8b-instruct:bf16

Llama-3-70b-instruct

Meta’s Llama 3 is an iteration of the open-access Llama family. Llama 3 was designed to match the best proprietary models, enhanced by community feedback for greater utility, while responsibly spearheading the deployment of LLMs. With a commitment to open-source principles, this release marked the beginning of a multilingual, multimodal future for the Llama family, pushing the boundaries of reasoning and coding capabilities.

Model name

meta/llama-3-70b-instruct:fp8

Llama-3.1-Nemotron-70b-instruct

Introduced October 14, 2024, NVIDIA’s Nemotron 70B Instruct is a specialized version of the Llama 3.1 model designed to follow complex instructions. NVIDIA employed Reinforcement Learning from Human Feedback (RLHF) to fine-tune the model’s ability to generate relevant and informative responses.

Model name

nvidia/llama-3.1-nemotron-70b-instruct:fp8

DeepSeek-R1-Distill-Llama-70B

Released January 21, 2025, DeepSeek’s R1 Distill Llama 70B is a Llama model distilled from DeepSeek R1. It is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks.

Model names

deepseek/deepseek-r1-distill-llama-70b:fp8
deepseek/deepseek-r1-distill-llama-70b:bf16
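
Like other R1-family models, the distilled variants typically emit their chain-of-thought between `<think>` and `</think>` tags before the final answer, so allow a generous token budget and strip those tags if you only want the conclusion. A minimal sketch, with placeholder endpoint and key:

```bash
# Reasoning models spend tokens "thinking" before answering.
curl https://<your-deployment-endpoint>/v1/chat/completions \
  -H "Authorization: Bearer $SCW_SECRET_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-r1-distill-llama-70b:bf16",
    "messages": [{"role": "user", "content": "What is the sum of the first 50 odd numbers?"}],
    "max_tokens": 2048
  }'
```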

DeepSeek-R1-Distill-Llama-8B

Released January 21, 2025, DeepSeek’s R1 Distill Llama 8B is a Llama model distilled from DeepSeek R1. It is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks.

Model names

deepseek/deepseek-r1-distill-llama-8b:fp8
deepseek/deepseek-r1-distill-llama-8b:bf16

Mixtral-8x7b-instruct-v0.1

Mixtral-8x7b-instruct-v0.1, developed by Mistral, is tailored for instructional platforms and virtual assistants. Trained on vast instructional datasets, it provides clear and concise instructions across various domains, enhancing user learning experiences.

Model names

mistral/mixtral-8x7b-instruct-v0.1:fp8
mistral/mixtral-8x7b-instruct-v0.1:bf16

Mistral-7b-instruct-v0.3

The first dense model released by Mistral AI, perfect for experimentation, customization, and quick iteration. At the time of the release, it matched the capabilities of models up to 30B parameters. This model is open-weight and distributed under the Apache 2.0 license.

Model name

mistral/mistral-7b-instruct-v0.3:bf16

Mistral-small-24b-instruct-2501

Mistral Small 24B Instruct is a state-of-the-art transformer model of 24B parameters, built by Mistral. This model is open-weight and distributed under the Apache 2.0 license.

Model names

mistral/mistral-small-24b-instruct-2501:fp8
mistral/mistral-small-24b-instruct-2501:bf16

Mistral-nemo-instruct-2407

Mistral Nemo is a state-of-the-art transformer model of 12B parameters, built by Mistral in collaboration with NVIDIA. This model is open-weight and distributed under the Apache 2.0 license. It was trained on a large proportion of multilingual and code data.

Model name

mistral/mistral-nemo-instruct-2407:fp8

Moshiko-0.1-8b

Kyutai’s Moshi is a speech-text foundation model for real-time dialogue. Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity. While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model. Moshiko is the variant of Moshi with a male voice in English.

Model names

kyutai/moshiko-0.1-8b:bf16
kyutai/moshiko-0.1-8b:fp8

Moshika-0.1-8b

Kyutai’s Moshi is a speech-text foundation model for real-time dialogue. Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity. While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model. Moshika is the variant of Moshi with a female voice in English.

Model names

kyutai/moshika-0.1-8b:bf16
kyutai/moshika-0.1-8b:fp8

Code models

Qwen2.5-coder-32b-instruct

Qwen2.5-coder is your intelligent programming assistant familiar with more than 40 programming languages. With Qwen2.5-coder deployed at Scaleway, your company can benefit from code generation, AI-assisted code repair, and code reasoning.

Model name

qwen/qwen2.5-coder-32b-instruct:int8
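
Like the other chat models above, it is served through the /v1/chat/completions endpoint. A minimal code-generation sketch, with placeholder endpoint and key:

```bash
# Ask the coder model to generate code; the target language is up to your prompt.
curl https://<your-deployment-endpoint>/v1/chat/completions \
  -H "Authorization: Bearer $SCW_SECRET_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen2.5-coder-32b-instruct:int8",
    "messages": [{"role": "user", "content": "Write a Python function that reverses a string, with a short doctest."}]
  }'
```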

Embeddings models

Bge-multilingual-gemma2

BGE-Multilingual-Gemma2 tops the MTEB leaderboard, scoring the number one spot in French and Polish, and number seven in English (as of Q4 2024). As its name suggests, the model’s training data spans a broad range of languages, including English, Chinese, Polish, French, and more.

| Attribute | Value |
|---|---|
| Embedding dimensions | 3584 |
| Matryoshka embedding | No |
Note

Matryoshka embeddings are embeddings trained at multiple dimensionalities, so the most meaningful dimensions come first in the resulting vector. For example, a 3584-dimension vector can be truncated to its first 768 dimensions and used directly.

Model name

baai/bge-multilingual-gemma2:fp32
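
Embedding models are queried through the /v1/embeddings endpoint rather than /v1/chat/completions. A minimal sketch, assuming an OpenAI-compatible payload, with placeholder endpoint and key:

```bash
# Each input string yields one 3584-dimension vector (see the attributes above).
curl https://<your-deployment-endpoint>/v1/embeddings \
  -H "Authorization: Bearer $SCW_SECRET_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "baai/bge-multilingual-gemma2:fp32",
    "input": "Managed Inference lets you deploy open-weight models on dedicated GPUs."
  }'
```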

Sentence-t5-xxl

The Sentence-T5-XXL model represents a significant evolution in sentence embeddings, building on the robust foundation of the Text-To-Text Transfer Transformer (T5) architecture. Designed for performance across a range of language processing tasks, Sentence-T5-XXL leverages T5’s encoder-decoder structure to generate high-dimensional vectors that encapsulate rich semantic information. The model has been tuned for tasks such as text classification, semantic similarity, and clustering, making it a useful tool in Retrieval-Augmented Generation (RAG) pipelines. It excels at sentence similarity tasks, but its performance on semantic search tasks is weaker.

| Attribute | Value |
|---|---|
| Embedding dimensions | 768 |
| Matryoshka embedding | No |

Model name

sentence-transformers/sentence-t5-xxl:fp32