# Models on kluster.ai

kluster.ai offers a wide variety of open-source models for both real-time and batch inference, with more added regularly. This page covers every model the API supports, along with the API request limits for each.
## Model names

Each model supported by kluster.ai has a unique name that must be used when specifying the model in a request.
| Model | Model API name |
|---|---|
| DeepSeek R1 | `deepseek-ai/DeepSeek-R1` |
| DeepSeek V3 | `deepseek-ai/DeepSeek-V3` |
| DeepSeek V3 0324 | `deepseek-ai/DeepSeek-V3-0324` |
| Gemma 3 27B | `google/gemma-3-27b-it` |
| Llama 3.1 8B | `klusterai/Meta-Llama-3.1-8B-Instruct-Turbo` |
| Llama 3.1 405B | `klusterai/Meta-Llama-3.1-405B-Instruct-Turbo` |
| Llama 3.3 70B | `klusterai/Meta-Llama-3.3-70B-Instruct-Turbo` |
| Llama 4 Maverick 17B 128E | `meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8` |
| Llama 4 Scout 17B 16E | `meta-llama/Llama-4-Scout-17B-16E-Instruct` |
| Qwen 2.5 7B | `Qwen/Qwen2.5-VL-7B-Instruct` |
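As a sketch of how a model API name is used in practice, the snippet below builds a chat-completion request body. It assumes an OpenAI-style `messages` request shape; the helper name `build_chat_payload` and the `max_tokens` default are illustrative, not part of the kluster.ai API.

```python
import json

def build_chat_payload(model: str, user_message: str, max_tokens: int = 256) -> dict:
    """Build a chat-completion request body using the exact model API name."""
    return {
        # Must match the "Model API name" column above, character for character.
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": user_message}],
    }

payload = build_chat_payload("deepseek-ai/DeepSeek-R1", "Summarize batch inference.")
print(json.dumps(payload, indent=2))
```

Requests that use a display name (for example `DeepSeek R1`) instead of the API name will not resolve to a model.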
## Model comparison table

| Model | Main use case | Real-time inference support | Batch inference support | Fine-tuning support | Image analysis | Function calling |
|---|---|---|---|---|---|---|
| DeepSeek R1 | Code generation, complex data analysis | | | | | |
| DeepSeek V3 | Natural language generation, contextually rich writing | | | | | |
| DeepSeek V3 0324 | Natural language generation, contextually rich writing | | | | | |
| Gemma 3 27B | Multilingual applications, image analysis, complex reasoning | | | | | |
| Llama 3.1 8B | Low-latency or simple tasks, cost-efficient inference | | | | | |
| Llama 3.1 405B | Detailed analysis, maximum accuracy | | | | | |
| Llama 3.3 70B | General-purpose AI, balanced cost-performance | | | | | |
| Llama 4 Maverick 17B 128E | Advanced multimodal reasoning; long-context, high-accuracy tasks | | | | | |
| Llama 4 Scout 17B 16E | Efficient multimodal performance; extended context, general tasks | | | | | |
| Qwen 2.5 7B | Document analysis, image-based reasoning, multimodal chat | | | | | |
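For the models listed with image analysis as a use case, a request pairs text with an image in a single user message. The sketch below assumes OpenAI-style multimodal content parts (`text` and `image_url`); the helper name and example URL are illustrative, not part of the kluster.ai API.

```python
def build_image_payload(model: str, prompt: str, image_url: str) -> dict:
    """Build a chat request combining a text prompt with an image reference."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                # A list of content parts mixes modalities in one message.
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_image_payload(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    "Describe this diagram.",
    "https://example.com/diagram.png",
)
```

Text-only models reject image content parts, so check the image-analysis column before sending multimodal requests.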
## API request limits

The following limits apply to API requests based on your plan:
| Model | Context size [tokens] | Max output [tokens] | Max batch requests | Concurrent requests | Requests per minute | Hosted fine-tuned models |
|---|---|---|---|---|---|---|
| DeepSeek R1 | 32k | 4k | 1000 | 20 | 30 | 1 |
| DeepSeek V3 | 32k | 4k | 1000 | 20 | 30 | 1 |
| DeepSeek V3 0324 | 32k | 4k | 1000 | 20 | 30 | 1 |
| Gemma 3 27B | 32k | 4k | 1000 | 20 | 30 | 1 |
| Llama 3.1 8B | 32k | 4k | 1000 | 20 | 30 | 1 |
| Llama 3.1 405B | 32k | 4k | 1000 | 20 | 30 | 1 |
| Llama 3.3 70B | 32k | 4k | 1000 | 20 | 30 | 1 |
| Llama 4 Maverick 17B 128E | 32k | 4k | 1000 | 20 | 30 | 1 |
| Llama 4 Scout 17B 16E | 32k | 4k | 1000 | 20 | 30 | 1 |
| Qwen 2.5 7B | 32k | 4k | 1000 | 20 | 30 | 1 |
| Model | Context size [tokens] | Max output [tokens] | Max batch requests | Concurrent requests | Requests per minute | Hosted fine-tuned models |
|---|---|---|---|---|---|---|
| DeepSeek R1 | 164k | 164k | 100k | 100 | 600 | 10 |
| DeepSeek V3 | 164k | 164k | 100k | 100 | 600 | 10 |
| DeepSeek V3 0324 | 164k | 164k | 100k | 100 | 600 | 10 |
| Gemma 3 27B | 64k | 8k | 100k | 100 | 600 | 10 |
| Llama 3.1 8B | 131k | 131k | 100k | 100 | 600 | 10 |
| Llama 3.1 405B | 131k | 131k | 100k | 100 | 600 | 10 |
| Llama 3.3 70B | 131k | 131k | 100k | 100 | 600 | 10 |
| Llama 4 Maverick 17B 128E | 1M | 1M | 100k | 100 | 600 | 10 |
| Llama 4 Scout 17B 16E | 131k | 131k | 100k | 100 | 600 | 10 |
| Qwen 2.5 7B | 32k | 32k | 100k | 100 | 600 | 10 |
| Model | Context size [tokens] | Max output [tokens] | Max batch requests | Concurrent requests | Requests per minute | Hosted fine-tuned models |
|---|---|---|---|---|---|---|
| DeepSeek R1 | 164k | 164k | 500k | 100 | 1200 | 25 |
| DeepSeek V3 | 164k | 164k | 500k | 100 | 1200 | 25 |
| DeepSeek V3 0324 | 164k | 164k | 500k | 100 | 1200 | 25 |
| Gemma 3 27B | 64k | 8k | 500k | 100 | 1200 | 25 |
| Llama 3.1 8B | 131k | 131k | 500k | 100 | 1200 | 25 |
| Llama 3.1 405B | 131k | 131k | 500k | 100 | 1200 | 25 |
| Llama 3.3 70B | 131k | 131k | 500k | 100 | 1200 | 25 |
| Llama 4 Maverick 17B 128E | 1M | 1M | 500k | 100 | 1200 | 25 |
| Llama 4 Scout 17B 16E | 131k | 131k | 500k | 100 | 1200 | 25 |
| Qwen 2.5 7B | 32k | 32k | 500k | 100 | 1200 | 25 |
| Model | Context size [tokens] | Max output [tokens] | Max batch requests | Concurrent requests | Requests per minute | Hosted fine-tuned models |
|---|---|---|---|---|---|---|
| DeepSeek R1 | 164k | 164k | Unlimited | 100 | Unlimited | Unlimited |
| DeepSeek V3 | 164k | 164k | Unlimited | 100 | Unlimited | Unlimited |
| DeepSeek V3 0324 | 164k | 164k | Unlimited | 100 | Unlimited | Unlimited |
| Gemma 3 27B | 64k | 8k | Unlimited | 100 | Unlimited | Unlimited |
| Llama 3.1 8B | 131k | 131k | Unlimited | 100 | Unlimited | Unlimited |
| Llama 3.1 405B | 131k | 131k | Unlimited | 100 | Unlimited | Unlimited |
| Llama 3.3 70B | 131k | 131k | Unlimited | 100 | Unlimited | Unlimited |
| Llama 4 Maverick 17B 128E | 1M | 1M | Unlimited | 100 | Unlimited | Unlimited |
| Llama 4 Scout 17B 16E | 131k | 131k | Unlimited | 100 | Unlimited | Unlimited |
| Qwen 2.5 7B | 32k | 32k | Unlimited | 100 | Unlimited | Unlimited |
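One way to stay under a requests-per-minute quota client-side is a sliding-window limiter that tracks recent request timestamps. The sketch below is a minimal illustration, not part of any kluster.ai SDK; the quota value passed in should match your plan's "Requests per minute" column above.

```python
import time
from collections import deque

class MinuteRateLimiter:
    """Client-side sliding-window limiter for a per-minute request quota."""

    def __init__(self, max_per_minute: int, clock=time.monotonic):
        self.max_per_minute = max_per_minute
        self.clock = clock      # injectable clock, useful for testing
        self.sent = deque()     # timestamps of requests within the last 60 s

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0.0 if none)."""
        now = self.clock()
        # Drop timestamps that have aged out of the 60-second window.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) < self.max_per_minute:
            return 0.0
        # Window is full: wait until the oldest request leaves the window.
        return 60 - (now - self.sent[0])

    def record(self) -> None:
        """Call after each request is actually sent."""
        self.sent.append(self.clock())
```

Before each API call, sleep for `wait_time()` seconds if it is nonzero, then call `record()`; the concurrent-requests limit would need separate handling (for example a semaphore).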