Models on kluster.ai#
kluster.ai offers a wide variety of open-source models for both real-time and batch inference, with more added regularly.
This page lists every model the API supports, along with the API request limits for each.
Model names#
Each model supported by kluster.ai has a unique name that must be used when defining the model in the request.
Model | Model API name |
---|---|
DeepSeek-R1 | deepseek-ai/DeepSeek-R1 |
DeepSeek-R1-0528 | deepseek-ai/DeepSeek-R1-0528 |
DeepSeek-V3-0324 | deepseek-ai/DeepSeek-V3-0324 |
Gemma 3 27B | google/gemma-3-27b-it |
Meta Llama 3.1 8B | klusterai/Meta-Llama-3.1-8B-Instruct-Turbo |
Meta Llama 3.3 70B | klusterai/Meta-Llama-3.3-70B-Instruct-Turbo |
Meta Llama 4 Maverick | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 |
Meta Llama 4 Scout | meta-llama/Llama-4-Scout-17B-16E-Instruct |
Mistral NeMo | mistralai/Mistral-Nemo-Instruct-2407 |
Qwen2.5-VL 7B | Qwen/Qwen2.5-VL-7B-Instruct |
Qwen3-235B-A22B | Qwen/Qwen3-235B-A22B-FP8 |
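For example, the request body references the model by its API name from the table above. The sketch below builds a minimal chat completions request body with the Python standard library; the exact endpoint and any extra fields depend on your client setup.

```python
import json

# Build a chat completions request body. The "model" field must use the
# exact API name from the table above -- e.g. Meta Llama 3.1 8B is
# "klusterai/Meta-Llama-3.1-8B-Instruct-Turbo".
payload = {
    "model": "klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
    "messages": [
        {"role": "user", "content": "What is an LLM?"},
    ],
}

# Serialize for sending as the POST body (an Authorization header with
# your API key would accompany the actual request).
body = json.dumps(payload)
print(body)
```

Using a name that does not match the "Model API name" column exactly will cause the request to be rejected.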
Model comparison table#
Model | Description | Real-time inference support | Batch inference support | Fine-tuning support | Image analysis |
---|---|---|---|---|---|
DeepSeek-R1 | Mathematical problem-solving, code generation, complex data analysis. | | | | |
DeepSeek-R1-0528 | Mathematical problem-solving, code generation, complex data analysis. | | | | |
DeepSeek-V3-0324 | Natural language generation, open-ended text creation, contextually rich writing. | | | | |
Gemma 3 27B | Multilingual applications, extended-context tasks, image analysis, and complex reasoning. | | | | |
Llama 3.1 8B | Low-latency or simple tasks, cost-efficient inference. | | | | |
Llama 3.3 70B | General-purpose AI, balanced cost-performance. | | | | |
Llama 4 Maverick | A state-of-the-art multimodal model with integrated vision and language understanding, optimized for complex reasoning, coding, and perception tasks. | | | | |
Llama 4 Scout | General-purpose multimodal AI, extended-context tasks, and balanced cost-performance across text and vision. | | | | |
Mistral NeMo | Natural language generation, open-ended text creation, contextually rich writing. | | | | |
Qwen2.5-VL 7B | Visual question answering, document analysis, image-based reasoning, multimodal chat. | | | | |
Qwen3-235B-A22B | Qwen3's flagship 235-billion-parameter model, optimized with 8-bit quantization. | | | | |
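For the models with image analysis support, such as Qwen2.5-VL 7B, an image can be included in the request. The sketch below assumes OpenAI-style multimodal message parts with an `image_url` content block carrying a base64 data URL; the exact content format is an assumption to illustrate the idea, not a confirmed kluster.ai schema.

```python
import base64
import json

# Placeholder bytes standing in for a real image file; in practice you
# would read and encode an actual PNG or JPEG.
fake_png = base64.b64encode(b"\x89PNG placeholder bytes").decode()

# Image-analysis request for a vision-capable model, using OpenAI-style
# multimodal content parts (format assumed, not confirmed here).
payload = {
    "model": "Qwen/Qwen2.5-VL-7B-Instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{fake_png}"},
                },
            ],
        }
    ],
}
print(json.dumps(payload)[:60])
```

Text-only models ignore or reject image content parts, so check the image analysis column above before sending multimodal requests.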
API request limits#
The following limits apply to API requests based on your plan. Each table below corresponds to one plan tier, ordered from the lowest limits to the highest.
Model | Context size [tokens] | Max output [tokens] | Max batch requests | Concurrent requests | Requests per minute | Hosted fine-tuned models |
---|---|---|---|---|---|---|
DeepSeek-R1 | 32k | 4k | 1000 | 20 | 30 | 1 |
DeepSeek-R1-0528 | 32k | 4k | 1000 | 20 | 30 | 1 |
DeepSeek-V3-0324 | 32k | 4k | 1000 | 20 | 30 | 1 |
Gemma 3 27B | 32k | 4k | 1000 | 20 | 30 | 1 |
Meta Llama 3.1 8B | 32k | 4k | 1000 | 20 | 30 | 1 |
Meta Llama 3.3 70B | 32k | 4k | 1000 | 20 | 30 | 1 |
Meta Llama 4 Maverick | 32k | 4k | 1000 | 20 | 30 | 1 |
Meta Llama 4 Scout | 32k | 4k | 1000 | 20 | 30 | 1 |
Mistral NeMo | 32k | 4k | 1000 | 20 | 30 | 1 |
Qwen2.5-VL 7B | 32k | 4k | 1000 | 20 | 30 | 1 |
Qwen3-235B-A22B | 32k | 4k | 1000 | 20 | 30 | 1 |
Model | Context size [tokens] | Max output [tokens] | Max batch requests | Concurrent requests | Requests per minute | Hosted fine-tuned models |
---|---|---|---|---|---|---|
DeepSeek-R1 | 163k | 163k | 100k | 100 | 600 | 10 |
DeepSeek-R1-0528 | 163k | 163k | 100k | 100 | 600 | 10 |
DeepSeek-V3-0324 | 163k | 163k | 100k | 100 | 600 | 10 |
Gemma 3 27B | 64k | 8k | 100k | 100 | 600 | 10 |
Meta Llama 3.1 8B | 131k | 131k | 100k | 100 | 600 | 10 |
Meta Llama 3.3 70B | 131k | 131k | 100k | 100 | 600 | 10 |
Meta Llama 4 Maverick | 1M | 1M | 100k | 100 | 600 | 10 |
Meta Llama 4 Scout | 131k | 131k | 100k | 100 | 600 | 10 |
Mistral NeMo | 131k | 131k | 100k | 100 | 600 | 10 |
Qwen2.5-VL 7B | 32k | 32k | 100k | 100 | 600 | 10 |
Qwen3-235B-A22B | 40k | 40k | 100k | 100 | 600 | 10 |
Model | Context size [tokens] | Max output [tokens] | Max batch requests | Concurrent requests | Requests per minute | Hosted fine-tuned models |
---|---|---|---|---|---|---|
DeepSeek-R1 | 163k | 163k | 500k | 100 | 1200 | 25 |
DeepSeek-R1-0528 | 163k | 163k | 500k | 100 | 1200 | 25 |
DeepSeek-V3-0324 | 163k | 163k | 500k | 100 | 1200 | 25 |
Gemma 3 27B | 64k | 8k | 500k | 100 | 1200 | 25 |
Meta Llama 3.1 8B | 131k | 131k | 500k | 100 | 1200 | 25 |
Meta Llama 3.3 70B | 131k | 131k | 500k | 100 | 1200 | 25 |
Meta Llama 4 Maverick | 1M | 1M | 500k | 100 | 1200 | 25 |
Meta Llama 4 Scout | 131k | 131k | 500k | 100 | 1200 | 25 |
Mistral NeMo | 131k | 131k | 500k | 100 | 1200 | 25 |
Qwen2.5-VL 7B | 32k | 32k | 500k | 100 | 1200 | 25 |
Qwen3-235B-A22B | 40k | 40k | 500k | 100 | 1200 | 25 |
Model | Context size [tokens] | Max output [tokens] | Max batch requests | Concurrent requests | Requests per minute | Hosted fine-tuned models |
---|---|---|---|---|---|---|
DeepSeek-R1 | 163k | 163k | Unlimited | 100 | Unlimited | Unlimited |
DeepSeek-R1-0528 | 163k | 163k | Unlimited | 100 | Unlimited | Unlimited |
DeepSeek-V3-0324 | 163k | 163k | Unlimited | 100 | Unlimited | Unlimited |
Gemma 3 27B | 64k | 8k | Unlimited | 100 | Unlimited | Unlimited |
Meta Llama 3.1 8B | 131k | 131k | Unlimited | 100 | Unlimited | Unlimited |
Meta Llama 3.3 70B | 131k | 131k | Unlimited | 100 | Unlimited | Unlimited |
Meta Llama 4 Maverick | 1M | 1M | Unlimited | 100 | Unlimited | Unlimited |
Meta Llama 4 Scout | 131k | 131k | Unlimited | 100 | Unlimited | Unlimited |
Mistral NeMo | 131k | 131k | Unlimited | 100 | Unlimited | Unlimited |
Qwen2.5-VL 7B | 32k | 32k | Unlimited | 100 | Unlimited | Unlimited |
Qwen3-235B-A22B | 40k | 40k | Unlimited | 100 | Unlimited | Unlimited |
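To stay under the requests-per-minute cap for your tier, it can help to throttle on the client side rather than relying on server-side rejections. The following is an illustrative sliding-window limiter, not part of any kluster.ai SDK; the `rpm` value would come from the tables above (for example, 30 on the lowest tier).

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Client-side sliding-window limiter for a requests-per-minute cap.

    Illustrative sketch only: the per-plan caps come from the tables
    above, but this class is not part of the kluster.ai API.
    """

    def __init__(self, rpm: int, clock=time.monotonic):
        self.rpm = rpm
        self.clock = clock   # injectable clock, useful for testing
        self.sent = deque()  # timestamps of requests in the last 60 s

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0 if ready)."""
        now = self.clock()
        # Drop timestamps older than the 60-second window.
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()
        if len(self.sent) < self.rpm:
            return 0.0
        # Window is full: wait until the oldest request ages out.
        return 60.0 - (now - self.sent[0])

    def record(self) -> None:
        """Call after each request is actually sent."""
        self.sent.append(self.clock())
```

In a request loop, you would call `wait_time()` before each request, `time.sleep()` for that long if it is nonzero, then `record()` after sending.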