Models on kluster.ai#
kluster.ai offers a wide variety of open-source models for both real-time and batch inferences, with more being constantly added.
This page covers all the models the API supports, with the API request limits for each.
Model names#
Each model supported by kluster.ai has a unique name that must be used when defining the model
in the request.
Model | Model API name |
---|---|
DeepSeek-R1-0528 | deepseek-ai/DeepSeek-R1-0528 |
DeepSeek-V3-0324 | deepseek-ai/DeepSeek-V3-0324 |
Gemma 3 27B | google/gemma-3-27b-it |
Magistral Small | mistralai/Magistral-Small-2506 |
Meta Llama 3.1 8B | klusterai/Meta-Llama-3.1-8B-Instruct-Turbo |
Meta Llama 3.3 70B | klusterai/Meta-Llama-3.3-70B-Instruct-Turbo |
Meta Llama 4 Maverick | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 |
Meta Llama 4 Scout | meta-llama/Llama-4-Scout-17B-16E-Instruct |
Mistral NeMo | mistralai/Mistral-Nemo-Instruct-2407 |
Mistral Small | mistralai/Mistral-Small-24B-Instruct-2501 |
Qwen2.5-VL 7B | Qwen/Qwen2.5-VL-7B-Instruct |
Qwen3-235B-A22B | Qwen/Qwen3-235B-A22B-FP8 |
kluster reliability check | klusterai/verify-reliability |
Model comparison table#
Model | Description | Real-time inference |
Batch inference |
Tools | Fine-tuning | Image analysis |
---|---|---|---|---|---|---|
DeepSeek-R1-0528 | Mathematical problem-solving code generation complex data analysis. |
|||||
DeepSeek-V3-0324 | Natural language generation open-ended text creation contextually rich writing. |
|||||
Gemma 3 27B | Multilingual applications extended-context tasks image analysis and complex reasoning. |
|||||
Magistral Small | Reasoning Natural language generation open-ended text creation contextually rich writing. |
|||||
Llama 3.1 8B | Low-latency or simple tasks cost-efficient inference. |
|||||
Llama 3.3 70B | General-purpose AI balanced cost-performance. |
|||||
Llama 4 Maverick | A state-of-the-art multimodal model with integrated vision and language understanding, optimized for complex reasoning, coding, and perception tasks |
|||||
Llama 4 Scout | General-purpose multimodal AI extended context tasks and balanced cost-performance across text and vision. |
|||||
Mistral NeMo | Natural language generation open-ended text creation contextually rich writing. |
|||||
Mistral Small | Fast conversational agents local inference on consumer hardware domain-specific applications and low-latency function calling. |
|||||
Qwen2.5-VL 7B | Visual question answering document analysis image-based reasoning multimodal chat. |
|||||
Qwen3-235B-A22B | Qwen3's flagship 235 billion parameter model optimized with 8-bit quantization |
|||||
kluster reliability check | kluster.ai Verify is an advanced AI-powered fact-checking tool designed to identify inaccuracies and hallucinations in AI-generated text |
API request limits#
The following limits apply to API requests based on your plan:
Model | Context size [tokens] |
Max output [tokens] |
Max batch requests |
Concurrent requests |
Requests per minute |
Hosted fine-tuned models |
---|---|---|---|---|---|---|
DeepSeek-R1-0528 | 32k | 4k | 1000 | 20 | 30 | 1 |
DeepSeek-V3-0324 | 32k | 4k | 1000 | 20 | 30 | 1 |
Gemma 3 27B | 32k | 4k | 1000 | 20 | 30 | 1 |
Magistral Small | 32k | 4k | 1000 | 20 | 30 | 1 |
Meta Llama 3.1 8B | 32k | 4k | 1000 | 20 | 30 | 1 |
Meta Llama 3.3 70B | 32k | 4k | 1000 | 20 | 30 | 1 |
Meta Llama 4 Maverick | 32k | 4k | 1000 | 20 | 30 | 1 |
Meta Llama 4 Scout | 32k | 4k | 1000 | 20 | 30 | 1 |
Mistral NeMo | 32k | 4k | 1000 | 20 | 30 | 1 |
Mistral Small | 32k | 4k | 1000 | 20 | 30 | 1 |
Qwen2.5-VL 7B | 32k | 4k | 1000 | 20 | 30 | 1 |
Qwen3-235B-A22B | 32k | 4k | 1000 | 20 | 30 | 1 |
kluster reliability check | 32k | 4k | 1000 | 20 | 30 | 1 |
Model | Context size [tokens] |
Max output [tokens] |
Max batch requests |
Concurrent requests |
Requests per minute |
Hosted fine-tuned models |
---|---|---|---|---|---|---|
DeepSeek-R1-0528 | 163k | 163k | 100k | 100 | 600 | 10 |
DeepSeek-V3-0324 | 163k | 163k | 100k | 100 | 600 | 10 |
Gemma 3 27B | 64k | 8k | 100k | 100 | 600 | 10 |
Magistral Small | 40k | 40k | 100k | 100 | 600 | 10 |
Meta Llama 3.1 8B | 131k | 131k | 100k | 100 | 600 | 10 |
Meta Llama 3.3 70B | 131k | 131k | 100k | 100 | 600 | 10 |
Meta Llama 4 Maverick | 1M | 1M | 100k | 100 | 600 | 10 |
Meta Llama 4 Scout | 131k | 131k | 100k | 100 | 600 | 10 |
Mistral NeMo | 131k | 131k | 100k | 100 | 600 | 10 |
Mistral Small | 32k | 32k | 100k | 100 | 600 | 10 |
Qwen2.5-VL 7B | 32k | 32k | 100k | 100 | 600 | 10 |
Qwen3-235B-A22B | 40k | 40k | 100k | 100 | 600 | 10 |
kluster reliability check | 100k | 0 | 100k | 100 | 600 | 10 |
Model | Context size [tokens] |
Max output [tokens] |
Max batch requests |
Concurrent requests |
Requests per minute |
Hosted fine-tuned models |
---|---|---|---|---|---|---|
DeepSeek-R1-0528 | 163k | 163k | 500k | 100 | 1200 | 25 |
DeepSeek-V3-0324 | 163k | 163k | 500k | 100 | 1200 | 25 |
Gemma 3 27B | 64k | 8k | 500k | 100 | 1200 | 25 |
Magistral Small | 40k | 40k | 500k | 100 | 1200 | 25 |
Meta Llama 3.1 8B | 131k | 131k | 500k | 100 | 1200 | 25 |
Meta Llama 3.3 70B | 131k | 131k | 500k | 100 | 1200 | 25 |
Meta Llama 4 Maverick | 1M | 1M | 500k | 100 | 1200 | 25 |
Meta Llama 4 Scout | 131k | 131k | 500k | 100 | 1200 | 25 |
Mistral NeMo | 131k | 131k | 500k | 100 | 1200 | 25 |
Mistral Small | 32k | 32k | 500k | 100 | 1200 | 25 |
Qwen2.5-VL 7B | 32k | 32k | 500k | 100 | 1200 | 25 |
Qwen3-235B-A22B | 40k | 40k | 500k | 100 | 1200 | 25 |
kluster reliability check | 100k | 0 | 500k | 100 | 1200 | 25 |
Model | Context size [tokens] |
Max output [tokens] |
Max batch requests |
Concurrent requests |
Requests per minute |
Hosted fine-tuned models |
---|---|---|---|---|---|---|
DeepSeek-R1-0528 | 163k | 163k | Unlimited | 100 | Unlimited | Unlimited |
DeepSeek-V3-0324 | 163k | 163k | Unlimited | 100 | Unlimited | Unlimited |
Gemma 3 27B | 64k | 8k | Unlimited | 100 | Unlimited | Unlimited |
Magistral Small | 40k | 40k | Unlimited | 100 | Unlimited | Unlimited |
Meta Llama 3.1 8B | 131k | 131k | Unlimited | 100 | Unlimited | Unlimited |
Meta Llama 3.3 70B | 131k | 131k | Unlimited | 100 | Unlimited | Unlimited |
Meta Llama 4 Maverick | 1M | 1M | Unlimited | 100 | Unlimited | Unlimited |
Meta Llama 4 Scout | 131k | 131k | Unlimited | 100 | Unlimited | Unlimited |
Mistral NeMo | 131k | 131k | Unlimited | 100 | Unlimited | Unlimited |
Mistral Small | 32k | 32k | Unlimited | 100 | Unlimited | Unlimited |
Qwen2.5-VL 7B | 32k | 32k | Unlimited | 100 | Unlimited | Unlimited |
Qwen3-235B-A22B | 40k | 40k | Unlimited | 100 | Unlimited | Unlimited |
kluster reliability check | 100k | 0 | Unlimited | 100 | Unlimited | Unlimited |