# Models on kluster.ai

kluster.ai offers a wide variety of open-source models for both real-time and batch inference, with more being added regularly.

This page covers all the models the API supports, along with the API request limits for each.

## Model names

Each model supported by kluster.ai has a unique API name that must be used when specifying the model in a request, as shown in the example after the table.

| Model | Model API name |
|-------|----------------|
| DeepSeek-R1 | `deepseek-ai/DeepSeek-R1` |
| DeepSeek-R1-0528 | `deepseek-ai/DeepSeek-R1-0528` |
| DeepSeek-V3-0324 | `deepseek-ai/DeepSeek-V3-0324` |
| Gemma 3 27B | `google/gemma-3-27b-it` |
| Meta Llama 3.1 8B | `klusterai/Meta-Llama-3.1-8B-Instruct-Turbo` |
| Meta Llama 3.3 70B | `klusterai/Meta-Llama-3.3-70B-Instruct-Turbo` |
| Meta Llama 4 Maverick | `meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8` |
| Meta Llama 4 Scout | `meta-llama/Llama-4-Scout-17B-16E-Instruct` |
| Mistral NeMo | `mistralai/Mistral-Nemo-Instruct-2407` |
| Qwen2.5-VL 7B | `Qwen/Qwen2.5-VL-7B-Instruct` |
| Qwen3-235B-A22B | `Qwen/Qwen3-235B-A22B-FP8` |
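
For example, a real-time chat completion request selects a model by its API name. The sketch below assumes kluster.ai's OpenAI-compatible endpoint at `https://api.kluster.ai/v1` and an API key stored in a `KLUSTER_API_KEY` environment variable; adjust both to match your setup.

```python
# Minimal sketch of a real-time chat completion request.
# Assumptions: OpenAI-compatible endpoint and KLUSTER_API_KEY env variable.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.kluster.ai/v1",  # assumed kluster.ai endpoint
    api_key=os.environ["KLUSTER_API_KEY"],
)

response = client.chat.completions.create(
    # The model value must match one of the API names in the table above.
    model="klusterai/Meta-Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "What is batch inference?"}],
)
print(response.choices[0].message.content)
```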

## Model comparison table

| Model | Description | Real-time inference support | Batch inference support | Fine-tuning support | Image analysis |
|-------|-------------|:---:|:---:|:---:|:---:|
| DeepSeek-R1 | Mathematical problem-solving, code generation, complex data analysis | ✅ | ✅ | ❌ | ❌ |
| DeepSeek-R1-0528 | Mathematical problem-solving, code generation, complex data analysis | ✅ | ✅ | ❌ | ❌ |
| DeepSeek-V3-0324 | Natural language generation, open-ended text creation, contextually rich writing | ✅ | ✅ | ❌ | ❌ |
| Gemma 3 27B | Multilingual applications, extended-context tasks, image analysis, and complex reasoning | ✅ | ✅ | ❌ | ✅ |
| Llama 3.1 8B | Low-latency or simple tasks, cost-efficient inference | ✅ | ✅ | ✅ | ❌ |
| Llama 3.3 70B | General-purpose AI, balanced cost-performance | ✅ | ✅ | ✅ | ❌ |
| Llama 4 Maverick | A state-of-the-art multimodal model with integrated vision and language understanding, optimized for complex reasoning, coding, and perception tasks | ✅ | ✅ | ❌ | ✅ |
| Llama 4 Scout | General-purpose multimodal AI, extended-context tasks, and balanced cost-performance across text and vision | ✅ | ✅ | ❌ | ✅ |
| Mistral NeMo | Natural language generation, open-ended text creation, contextually rich writing | ✅ | ✅ | ❌ | ❌ |
| Qwen2.5-VL 7B | Visual question answering, document analysis, image-based reasoning, multimodal chat | ✅ | ✅ | ❌ | ✅ |
| Qwen3-235B-A22B | Qwen3's flagship 235-billion-parameter model, served with 8-bit (FP8) quantization | ✅ | ✅ | ❌ | ❌ |
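
Models with image analysis support accept image inputs alongside text in a chat completion. The sketch below assumes the same OpenAI-compatible endpoint and API key as above, and uses the standard `image_url` content-part format; the image URL is a placeholder.

```python
# Hedged sketch: sending an image to a vision-capable model
# (e.g. Qwen2.5-VL 7B) via the OpenAI-compatible chat format.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.kluster.ai/v1",  # assumed kluster.ai endpoint
    api_key=os.environ["KLUSTER_API_KEY"],
)

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-7B-Instruct",  # a model with image analysis support
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is in this image."},
                # Placeholder URL; replace with a publicly reachable image.
                {"type": "image_url", "image_url": {"url": "https://example.com/sample.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```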

## API request limits

The following limits apply to API requests based on your plan. Each table below corresponds to a plan tier, ordered from the most restrictive to the least; a batch inference sketch that works within these limits follows the tables.

| Model | Context size [tokens] | Max output [tokens] | Max batch requests | Concurrent requests | Requests per minute | Hosted fine-tuned models |
|-------|----------------------|---------------------|--------------------|---------------------|---------------------|--------------------------|
| DeepSeek-R1 | 32k | 4k | 1000 | 20 | 30 | 1 |
| DeepSeek-R1-0528 | 32k | 4k | 1000 | 20 | 30 | 1 |
| DeepSeek-V3-0324 | 32k | 4k | 1000 | 20 | 30 | 1 |
| Gemma 3 27B | 32k | 4k | 1000 | 20 | 30 | 1 |
| Meta Llama 3.1 8B | 32k | 4k | 1000 | 20 | 30 | 1 |
| Meta Llama 3.3 70B | 32k | 4k | 1000 | 20 | 30 | 1 |
| Meta Llama 4 Maverick | 32k | 4k | 1000 | 20 | 30 | 1 |
| Meta Llama 4 Scout | 32k | 4k | 1000 | 20 | 30 | 1 |
| Mistral NeMo | 32k | 4k | 1000 | 20 | 30 | 1 |
| Qwen2.5-VL 7B | 32k | 4k | 1000 | 20 | 30 | 1 |
| Qwen3-235B-A22B | 32k | 4k | 1000 | 20 | 30 | 1 |

| Model | Context size [tokens] | Max output [tokens] | Max batch requests | Concurrent requests | Requests per minute | Hosted fine-tuned models |
|-------|----------------------|---------------------|--------------------|---------------------|---------------------|--------------------------|
| DeepSeek-R1 | 163k | 163k | 100k | 100 | 600 | 10 |
| DeepSeek-R1-0528 | 163k | 163k | 100k | 100 | 600 | 10 |
| DeepSeek-V3-0324 | 163k | 163k | 100k | 100 | 600 | 10 |
| Gemma 3 27B | 64k | 8k | 100k | 100 | 600 | 10 |
| Meta Llama 3.1 8B | 131k | 131k | 100k | 100 | 600 | 10 |
| Meta Llama 3.3 70B | 131k | 131k | 100k | 100 | 600 | 10 |
| Meta Llama 4 Maverick | 1M | 1M | 100k | 100 | 600 | 10 |
| Meta Llama 4 Scout | 131k | 131k | 100k | 100 | 600 | 10 |
| Mistral NeMo | 131k | 131k | 100k | 100 | 600 | 10 |
| Qwen2.5-VL 7B | 32k | 32k | 100k | 100 | 600 | 10 |
| Qwen3-235B-A22B | 40k | 40k | 100k | 100 | 600 | 10 |

| Model | Context size [tokens] | Max output [tokens] | Max batch requests | Concurrent requests | Requests per minute | Hosted fine-tuned models |
|-------|----------------------|---------------------|--------------------|---------------------|---------------------|--------------------------|
| DeepSeek-R1 | 163k | 163k | 500k | 100 | 1200 | 25 |
| DeepSeek-R1-0528 | 163k | 163k | 500k | 100 | 1200 | 25 |
| DeepSeek-V3-0324 | 163k | 163k | 500k | 100 | 1200 | 25 |
| Gemma 3 27B | 64k | 8k | 500k | 100 | 1200 | 25 |
| Meta Llama 3.1 8B | 131k | 131k | 500k | 100 | 1200 | 25 |
| Meta Llama 3.3 70B | 131k | 131k | 500k | 100 | 1200 | 25 |
| Meta Llama 4 Maverick | 1M | 1M | 500k | 100 | 1200 | 25 |
| Meta Llama 4 Scout | 131k | 131k | 500k | 100 | 1200 | 25 |
| Mistral NeMo | 131k | 131k | 500k | 100 | 1200 | 25 |
| Qwen2.5-VL 7B | 32k | 32k | 500k | 100 | 1200 | 25 |
| Qwen3-235B-A22B | 40k | 40k | 500k | 100 | 1200 | 25 |

| Model | Context size [tokens] | Max output [tokens] | Max batch requests | Concurrent requests | Requests per minute | Hosted fine-tuned models |
|-------|----------------------|---------------------|--------------------|---------------------|---------------------|--------------------------|
| DeepSeek-R1 | 163k | 163k | Unlimited | 100 | Unlimited | Unlimited |
| DeepSeek-R1-0528 | 163k | 163k | Unlimited | 100 | Unlimited | Unlimited |
| DeepSeek-V3-0324 | 163k | 163k | Unlimited | 100 | Unlimited | Unlimited |
| Gemma 3 27B | 64k | 8k | Unlimited | 100 | Unlimited | Unlimited |
| Meta Llama 3.1 8B | 131k | 131k | Unlimited | 100 | Unlimited | Unlimited |
| Meta Llama 3.3 70B | 131k | 131k | Unlimited | 100 | Unlimited | Unlimited |
| Meta Llama 4 Maverick | 1M | 1M | Unlimited | 100 | Unlimited | Unlimited |
| Meta Llama 4 Scout | 131k | 131k | Unlimited | 100 | Unlimited | Unlimited |
| Mistral NeMo | 131k | 131k | Unlimited | 100 | Unlimited | Unlimited |
| Qwen2.5-VL 7B | 32k | 32k | Unlimited | 100 | Unlimited | Unlimited |
| Qwen3-235B-A22B | 40k | 40k | Unlimited | 100 | Unlimited | Unlimited |
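
The "Max batch requests" column caps how many requests a single batch job can contain, while "Max output [tokens]" bounds the `max_completion_tokens` you can ask for. The sketch below assumes kluster.ai follows the OpenAI-compatible Batch API (a JSONL file of requests uploaded with the Files API, then processed by a batch job); the file name, prompts, and model choice are illustrative only.

```python
# Hedged sketch of a batch inference job, assuming an OpenAI-compatible
# Batch API: upload a JSONL file of requests, then create a batch job
# that processes them asynchronously.
import json
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.kluster.ai/v1",  # assumed kluster.ai endpoint
    api_key=os.environ["KLUSTER_API_KEY"],
)

# One JSON object per line; the number of lines must stay within the
# "Max batch requests" limit for your plan.
requests = [
    {
        "custom_id": f"request-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "deepseek-ai/DeepSeek-V3-0324",
            "messages": [{"role": "user", "content": prompt}],
            "max_completion_tokens": 1000,  # must not exceed the plan's max output
        },
    }
    for i, prompt in enumerate(
        ["Summarize the theory of relativity.", "Explain tokenization in one paragraph."]
    )
]

with open("batch_input.jsonl", "w") as f:
    for item in requests:
        f.write(json.dumps(item) + "\n")

# Upload the request file and start the batch job.
batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
batch_job = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch_job.id, batch_job.status)
```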