Skip to content

Models on kluster.ai

kluster.ai offers a wide variety of open-source models for both real-time and batch inferences, with more being constantly added.

This page covers all the models the API supports, with the API request limits for each.

Model names

Each model supported by kluster.ai has a unique name that must be used when defining the model in the request.

Model Model API name
DeepSeek R1 deepseek-ai/DeepSeek-R1
DeepSeek V3 deepseek-ai/DeepSeek-V3
DeepSeek V3 0324 deepseek-ai/DeepSeek-V3-0324
Gemma 3 27B google/gemma-3-27b-it
Llama 3.1 8B klusterai/Meta-Llama-3.1-8B-Instruct-Turbo
Llama 3.1 405B klusterai/Meta-Llama-3.1-405B-Instruct-Turbo
Llama 3.3 70B klusterai/Meta-Llama-3.3-70B-Instruct-Turbo
Llama 4 Maverick 17B 128E meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
Llama 4 Scout 17B 16E meta-llama/Llama-4-Scout-17B-16E-Instruct
Llama 3.3 70B klusterai/Meta-Llama-3.3-70B-Instruct-Turbo
Qwen 2.5 7B Qwen/Qwen2.5-VL-7B-Instruct

Model comparison table

Model Main
use case
Real-time
inference support
Batch
inference support
Fine-tuning
support
Image
analysis
Function
calling
DeepSeek R1 Code generation
Complex data analysis
✅ ✅ ❌ ❌ ❌
DeepSeek V3 Natural language generation
Contextually rich writing
✅ ✅ ❌ ❌ ❌
DeepSeek V3 0324 Natural language generation
Contextually rich writing
✅ ✅ ❌ ❌ ❌
Gemma 3 27B Multilingual applications
Image analysis
Complex reasoning
✅ ✅ ❌ ✅ ❌
Llama 3.1 8B Low-latency or simple tasks
Cost-efficient inference
✅ ✅ ✅ ❌ ✅
Llama 3.1 405B Detailed analysis
Maximum accuracy
✅ ✅ ❌ ❌ ✅
Llama 3.3 70B General-purpose AI
Balanced cost-performance
✅ ✅ ✅ ❌ ✅
Llama 4 Maverick 17B 128E Advanced multimodal reasoning
Long-context, high-accuracy tasks
✅ ✅ ❌ ✅ ❌
Llama 4 Scout 17B 16E Efficient multimodal performance
Extended context, general tasks
✅ ✅ ❌ ✅ ❌
Qwen 2.5 7B Document analysis
Image-based reasoning
Multimodal chat
✅ ✅ ❌ ✅ ❌

API request limits

The following limits apply to API requests based on your plan:

Model Context size
[tokens]
Max output
[tokens]
Max batch
requests
Concurrent
requests
Requests
per minute
Hosted fine-tuned
models
DeepSeek R1 32k 4k 1000 20 30 1
DeepSeek V3 32k 4k 1000 20 30 1
DeepSeek V3 0324 32k 4k 1000 20 30 1
Gemma 3 27B 32k 4k 1000 20 30 1
Llama 3.1 8B 32k 4k 1000 20 30 1
Llama 3.1 405B 32k 4k 1000 20 30 1
Llama 3.3 70B 32k 4k 1000 20 30 1
Llama 4 Maverick 17B 128E 32k 4k 1000 20 30 1
Llama 4 Scout 17B 16E 32k 4k 1000 20 30 1
Qwen 2.5 7B 32k 4k 1000 20 30 1
Model Context size
[tokens]
Max output
[tokens]
Max batch
requests
Concurrent
requests
Requests
per minute
Hosted fine-tuned
models
DeepSeek R1 164k 164k 100k 100 600 10
DeepSeek V3 164k 164k 100k 100 600 10
DeepSeek V3 0324 164k 164k 100k 100 600 10
Gemma 3 27B 64k 8k 100k 100 600 10
Llama 3.1 8B 131k 131k 100k 100 600 10
Llama 3.1 405B 131k 131k 100k 100 600 10
Llama 3.3 70B 131k 131k 100k 100 600 10
Llama 4 Maverick 17B 128E 1M 1M 100k 100 600 10
Llama 4 Scout 17B 16E 131k 131k 100k 100 600 10
Qwen 2.5 7B 32k 32k 100k 100 600 10
Model Context size
[tokens]
Max output
[tokens]
Max batch
requests
Concurrent
requests
Requests
per minute
Hosted fine-tuned
models
DeepSeek R1 164k 164k 500k 100 1200 25
DeepSeek V3 164k 164k 500k 100 1200 25
DeepSeek V3 0324 164k 164k 500k 100 1200 25
Gemma 3 27B 64k 8k 500k 100 1200 25
Llama 3.1 8B 131k 131k 500k 100 1200 25
Llama 3.1 405B 131k 131k 500k 100 1200 25
Llama 3.3 70B 131k 131k 500k 100 1200 25
Llama 4 Maverick 17B 128E 1M 1M 500k 100 1200 25
Llama 4 Scout 17B 16E 131k 131k 500k 100 1200 25
Qwen 2.5 7B 32k 32k 500k 100 1200 25
Model Context size
[tokens]
Max output
[tokens]
Max batch
requests
Concurrent
requests
Requests
per minute
Hosted fine-tuned
models
DeepSeek R1 164k 164k Unlimited 100 Unlimited Unlimited
DeepSeek V3 164k 164k Unlimited 100 Unlimited Unlimited
DeepSeek V3 0324 164k 164k Unlimited 100 Unlimited Unlimited
Gemma 3 27B 64k 8k Unlimited 100 Unlimited Unlimited
Llama 3.1 8B 131k 131k Unlimited 100 Unlimited Unlimited
Llama 3.1 405B 131k 131k Unlimited 100 Unlimited Unlimited
Llama 3.3 70B 131k 131k Unlimited 100 Unlimited Unlimited
Llama 4 Maverick 17B 128E 1M 1M Unlimited 100 Unlimited Unlimited
Llama 4 Scout 17B 16E 131k 131k Unlimited 100 Unlimited Unlimited
Qwen 2.5 7B 32k 32k Unlimited 100 Unlimited Unlimited